Python Opencv/cuda/Opengl/imshow #18553 #18588

I am trying to display high-resolution imagery held in a cv2.cuda_GpuMat() from Python, using cv2.imshow with a namedWindow created with the cv2.WINDOW_OPENGL flag. This does not seem to work: the console reports that the feature has not been implemented and tells me to download explicitly using cuda_GpuMat.download(). However, in researching this problem I came across:

Both indicating that cv2.imshow is able to process image mats directly from CUDA/GPU memory.

Am I crazy? If not, what am I missing here?
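For reference, the fallback the console message points to can be sketched roughly as below. This is only an illustration, not the OpenCV implementation: `show_frame` is a hypothetical helper, the display function is passed in (with OpenCV it would be cv2.imshow), and it assumes the failure surfaces as a Python exception.

```python
def show_frame(window_name, frame, imshow):
    """Display `frame`, falling back to a host copy if direct display fails.

    `imshow` is passed in (for example cv2.imshow) so this sketch has no hard
    dependency on an OpenCV build with CUDA support. `frame` is assumed to
    expose a download() method, as cv2.cuda_GpuMat does.
    """
    try:
        imshow(window_name, frame)       # succeeds only if GPU frames are accepted
        return "direct"
    except Exception:                    # cv2.error derives from Exception
        host_frame = frame.download()    # explicit GPU -> CPU copy
        imshow(window_name, host_frame)
        return "downloaded"
```

With OpenCV this would be called as show_frame("out", gpu_mat, cv2.imshow). The download is exactly the copy I am trying to avoid, so this only restates the problem in code.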

Does this work from C++ (build a small C++ demo file and run it)? If yes, it’s a Python bindings issue. If that doesn’t work either, maybe the feature itself doesn’t quite work.

I’m unable to do so. I’m trying to fix an issue on a GPU-heavy production media system that’s running a 4K LED wall (and lagging while downloading the frames to a buffer on the CPU). I don’t have the local resources to perform the test.

Perhaps I can find someone to run a quick test for me, before the client decides to murder me haha.

Actually, looking at opencv/highgui.hpp at 4.7.0 · opencv/opencv · GitHub, I don’t see the changes mentioned in the enhancement in this file.

Alternatively this is interesting: opencv/highgui.hpp at 725e440d278aca07d35a5e8963ef990572b07316 · opencv/opencv · GitHub

As @crackwitz alluded to, this is a C++ issue, not a Python one. I raised an issue on this a while back; see
imshow no longer works for cv::cuda::GpuMat · Issue #22328 · opencv/opencv · GitHub.

In short, it looks like when OpenCV moved highgui to a plugin architecture, support for GpuMat was not implemented. This can be overcome in C++ by using

cv::imshow("window_name", cv::ogl::Texture2D(gpuMat));

However, from a quick inspection, the cv::ogl::Texture2D API is not exposed to Python. Additionally, if Texture2D is wrapped (CV_EXPORTS_W), it still can’t be passed to any OpenCV functions, as they can’t deal with that type.


Aha! That makes perfect sense! I guess that puts me on the search for an alternative method. Unfortunately, I haven’t been able to locate a different solution. Any ideas?

It depends. If you’re processing a single video stream in a single thread, you could try using CUDA streams and displaying the output with a 1 frame delay, as I demonstrated in the guide below
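The general shape of the 1 frame delay idea can be sketched in plain Python as follows. This is not the code from the guide, only a minimal illustration: `one_frame_delayed`, `launch` and `finish` are hypothetical names. In an OpenCV CUDA setting I would expect `launch` to enqueue the upload, processing and download on a cv2.cuda_Stream, and `finish` to call waitForCompletion() on that stream before handing the host frame to the display, but treat that mapping as an assumption.

```python
def one_frame_delayed(frames, launch, finish):
    """Yield the result for frame N-1 while frame N is already in flight.

    launch(frame) -> handle  : start (ideally asynchronous) work, return at once
    finish(handle) -> result : block until that work is done, return its output
    """
    pending = None
    for frame in frames:
        handle = launch(frame)        # enqueue work for the current frame
        if pending is not None:
            yield finish(pending)     # collect the previous frame meanwhile
        pending = handle
    if pending is not None:
        yield finish(pending)         # flush the last frame
```

The point of the ordering is that the work for the current frame is queued before the previous frame is waited on, so the wait overlaps with work that is already running. The cost is that the display always lags by one frame.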

I was just reading about that. Yeah, it’s a single output stream going to an LED wall

I’m using this in tandem with a Luxonis DepthAI RGBD capture device (OAK-D Pro)

Are you sure it’s the download which is taking the time? I would time each operation to see where the slowdown is.
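A minimal way to do that timing, using only the standard library, might look like the sketch below. `StageTimer` and the stage names are hypothetical. One caveat: GPU calls are often asynchronous, so you need to synchronize (for example torch.cuda.synchronize() on the torch side, or waiting on the CUDA stream on the OpenCV side) inside the timed function, otherwise the cost shows up in whichever later stage happens to block.

```python
import time
from collections import defaultdict


class StageTimer:
    """Accumulate wall-clock time per named pipeline stage."""

    def __init__(self):
        self.totals = defaultdict(float)   # seconds spent per stage
        self.counts = defaultdict(int)     # number of calls per stage

    def time(self, name, fn, *args, **kwargs):
        """Run fn(*args, **kwargs), record its duration under `name`."""
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        self.totals[name] += time.perf_counter() - start
        self.counts[name] += 1
        return result

    def report(self):
        """Return the mean time per call in milliseconds for each stage."""
        return {name: 1000.0 * self.totals[name] / self.counts[name]
                for name in self.totals}
```

In the frame loop each step gets wrapped, e.g. timer.time("download", gpu_frame.download), and timer.report() is printed every few hundred frames.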

If it’s the decode, then you could try using cv2.cudacodec.VideoReader().

I have isolated the delay to two locations. The upload is hurting me, but that’s torch, and cv is actually way faster. On the download side I’m losing out because I download to the CPU, where the image is then resized to the display. I have the option of uploading back to the GPU, but that’s just as bad. What I want is to convert the memory map from torch to cv and then pipe it directly to the output without getting tangled up on the CPU

The really wild thing is that if I use 2k it’s instant, like zero delay and 30+ fps, but 4K is a brutal half a second delay and 7 fps
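As a rough sanity check on the data volume, here is the arithmetic in code. The resolutions and pixel format are assumptions on my part: 4K as 3840x2160, "2k" as 1920x1080, and 8-bit 3-channel frames. `frame_bytes` and `transfer_rate_mb_s` are hypothetical helpers.

```python
def frame_bytes(width, height, channels=3, bytes_per_channel=1):
    """Size of one uncompressed frame in bytes."""
    return width * height * channels * bytes_per_channel


def transfer_rate_mb_s(width, height, fps, channels=3, bytes_per_channel=1):
    """Data moved per second (in MB, 1 MB = 1e6 bytes) for one copy per frame."""
    return frame_bytes(width, height, channels, bytes_per_channel) * fps / 1e6


uhd = frame_bytes(3840, 2160)   # 24,883,200 bytes per frame
fhd = frame_bytes(1920, 1080)   # 6,220,800 bytes per frame
print(uhd / fhd)                # 4.0
print(transfer_rate_mb_s(3840, 2160, 30))  # about 746.5 MB/s for one copy at 30 fps
```

Under those assumptions 4K is four times the pixels, which is at least in the same ballpark as going from 30 fps to about 7 fps if the per-frame cost scales with pixel count. Every extra copy or CPU resize per frame multiplies that figure again.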

I also think it’s important to note that the host system in this case is an NVIDIA AGX Orin, so the behavior might be a little different.

I would like to thank you, your article was able to provide me with a series of solutions that overcame my bottleneck and satisfied my client. :handshake:


Nice, glad I could help!