Why don't some Mats need to be uploaded when using the GPU?

fijoy · May 9, 2022, 7:56pm

Hello,

I’m a newbie researching and experimenting with using OpenCV’s GPU interface. I notice that some Mat arguments to cv::cuda routines need to be uploaded to GPU while some others don’t. Why is that?

In cv::cuda::warpPerspective() for example, the first 2 arguments (the input and output images) need to be uploaded to the GPU. But the 3rd argument (the homography Mat) need not be uploaded.

Can anyone explain this nuance? How do the GPU routines handle the Mats that aren’t uploaded?

Thanks.

cudawarped · May 10, 2022, 9:08am

Using your example:

cv::cuda::warpPerspective(InputArray src, OutputArray dst, InputArray M, Size dsize, int flags = INTER_LINEAR,
    int borderMode = BORDER_CONSTANT, Scalar borderValue = Scalar(), Stream& stream = Stream::Null())

In a nutshell, src and dst are the data you are going to work on, whereas M is just a collection of function arguments/parameters, in the same way that dsize, flags, borderMode and borderValue are.

Now, I can’t find any official documentation in the CUDA programming guide that justifies passing arguments in host rather than device memory, so take the next paragraph with a pinch of salt.

The consensus seems to be that launching a kernel incurs some latency, say N ms, because it requires communication between the host and the device. That communication is small and does not saturate the available bandwidth between host and device, so there is room to send a small amount of extra data at the same time without increasing N. A few extra parameters (function arguments on top of the kernel launch overhead) can therefore be sent without any penalty, and there would be no advantage in first copying your function arguments from the host to the device before launching the kernel. Conversely, if you were to pass the image data itself from host memory, the transfer would saturate the available bandwidth between host and device and significantly increase N, which is what you can experience when using managed memory.
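To make the size difference concrete, here is a minimal sketch in plain C++ (no OpenCV or CUDA needed). It is not how OpenCV implements warpPerspective internally. The names Homography, Point2f, applyHomography and imageBytes are made up for illustration. It only shows that the nine coefficients of M form a tiny, trivially copyable value that can be handed over by value like dsize or flags, whereas an image is a large buffer.

```cpp
#include <array>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for the 3x3 homography M: nine floats, row-major.
struct Homography {
    std::array<float, 9> c;
};

// Trivially copyable, so it can be passed by value like any other small parameter.
static_assert(std::is_trivially_copyable<Homography>::value,
              "Homography should be a plain value type");

// Hypothetical 2D point type used only in this sketch.
struct Point2f {
    float x;
    float y;
};

// Maps one coordinate through the homography. The coefficients arrive by value;
// in a GPU version every thread would read the same nine numbers.
Point2f applyHomography(Homography h, Point2f p) {
    const float w = h.c[6] * p.x + h.c[7] * p.y + h.c[8];
    return Point2f{(h.c[0] * p.x + h.c[1] * p.y + h.c[2]) / w,
                   (h.c[3] * p.x + h.c[4] * p.y + h.c[5]) / w};
}

// Size in bytes of an 8-bit image with the given dimensions and channel count.
constexpr std::size_t imageBytes(std::size_t width, std::size_t height,
                                 std::size_t channels) {
    return width * height * channels;
}
```

On a typical compiler sizeof(Homography) is 36 bytes, while imageBytes(1920, 1080, 3) is 6220800 bytes, so the image is roughly five orders of magnitude larger than the matrix. That is why src and dst have to live in device memory (GpuMat) while M can stay an ordinary host-side Mat.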
