Why don't some Mats need to be uploaded when using the GPU?


I’m a newbie researching and experimenting with OpenCV’s GPU (CUDA) interface. I notice that some Mat arguments to cv::cuda routines need to be uploaded to the GPU while others don’t. Why is that?

In cv::cuda::warpPerspective(), for example, the first two arguments (the input and output images) need to be uploaded to the GPU, but the third argument (the homography Mat) does not.

Can anyone explain this nuance? How do the GPU routines handle the Mats that aren’t uploaded?


Using your example:

void cv::cuda::warpPerspective(InputArray src, OutputArray dst, InputArray M, Size dsize,
    int flags = INTER_LINEAR, int borderMode = BORDER_CONSTANT,
    Scalar borderValue = Scalar(), Stream& stream = Stream::Null())

In a nutshell, src and dst are the data you are going to work on, whereas M is just a set of parameters telling the function how to do that work, in the same way that dsize, flags, borderMode and borderValue are. M holds only 9 coefficients; src and dst hold every pixel of the image.

Now, I can’t find any official documentation in the CUDA programming guide that justifies passing arguments in host rather than device memory, so take the next paragraph with a pinch of salt.

The consensus seems to be that launching a kernel carries a fixed latency, call it N, because the host has to communicate with the device. That communication is small and does not saturate the available host-device bandwidth, so a small amount of extra data can be sent at the same time without increasing N. A few extra parameters (function arguments on top of the kernel launch overhead) can therefore be sent without any noticeable penalty, and there would be no advantage in first copying those arguments from the host to the device before launching the kernel. Conversely, if you were to pass the image data itself as a host argument, it would saturate the available host-device bandwidth and significantly increase N, which is what you can experience when using managed memory.
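To put rough numbers on the "small parameters ride along with the launch" argument, here is a plain C++ sketch of the host-side work a warp routine can do with M before launching a kernel: invert it (the kernel maps each destination pixel back to a source pixel) and pack the 9 coefficients into a fixed-size struct that could be passed by value as a kernel argument (CUDA has historically capped kernel parameters at 4 KB). As far as I can tell, OpenCV's CUDA warpPerspective does something similar on the host, but WarpParams, makeInverseWarpParams and imageBytes are hypothetical names for illustration, not OpenCV API.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <stdexcept>

// Hypothetical parameter block: a 3x3 homography as 9 row-major floats,
// small enough to be passed by value as a kernel argument.
struct WarpParams {
    float m[9];
};

// Inverts a row-major 3x3 matrix on the host (adjugate / determinant)
// and packs the result as floats for the device.
WarpParams makeInverseWarpParams(const std::array<double, 9>& h)
{
    const double a = h[0], b = h[1], c = h[2];
    const double d = h[3], e = h[4], f = h[5];
    const double g = h[6], k = h[7], l = h[8];

    // Cofactors of the first row, reused for the determinant.
    const double A = e * l - f * k;
    const double B = -(d * l - f * g);
    const double C = d * k - e * g;
    const double det = a * A + b * B + c * C;
    if (std::fabs(det) < 1e-12)
        throw std::invalid_argument("singular homography");

    // Transposed cofactor matrix divided by the determinant.
    const double inv[9] = {
        A / det, -(b * l - c * k) / det,  (b * f - c * e) / det,
        B / det,  (a * l - c * g) / det, -(a * f - c * d) / det,
        C / det, -(a * k - b * g) / det,  (a * e - b * d) / det};

    WarpParams p{};
    for (int n = 0; n < 9; ++n)
        p.m[n] = static_cast<float>(inv[n]);
    return p;
}

// Bytes an 8-bit image of the given size occupies, i.e. what an upload
// would have to push across the host-device link.
constexpr std::size_t imageBytes(std::size_t width, std::size_t height,
                                 std::size_t channels)
{
    return width * height * channels;
}
```

For example, makeInverseWarpParams({2.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 1.0}) gives p.m[0] == 0.5f and p.m[4] == 0.25f. The whole struct is 36 bytes, while imageBytes(1920, 1080, 3) is 6,220,800 bytes, which is why the image has to be uploaded explicitly but the homography can simply travel with the call.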