What can the function in cv::cuda do?

six · March 15, 2024, 2:21am

I want to calculate the discrete Fourier transform (DFT) of several images and then calculate the conjugate product of two corresponding matrices. There are many images (more than 10,000 per group), but each image is very small (a 100×100 matrix), so I need to use the OpenCV CUDA module for efficiency. I want to process each image matrix in its own thread. However, cv::cuda::dft is a host function and cannot be called from device code.

I wonder whether cv::cuda::dft is simply a function that operates on a GpuMat the way cv::dft operates on a cv::Mat on the CPU, or whether it runs in parallel when called.

Since each matrix is small, parallelizing the work within a single matrix gives little benefit. How can I run these tasks in parallel on the GPU? Do I need to write a DFT function in device code myself?
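For reference, the per-pair computation described in this question (a DFT of each image followed by the conjugate product of the two spectra) can be sketched on the CPU in plain C++. This is only a slow correctness baseline against which GPU results could be compared, not how OpenCV or cuFFT implement it; the names `Image`, `dft2d` and `conjugateProduct` are hypothetical, and the transform is an unscaled naive DFT.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;
using Image = std::vector<Complex>;  // row-major storage, rows * cols elements

// Naive, unscaled 2D DFT. Cost is O((rows * cols)^2), so it is only
// suitable as a reference for checking small matrices.
Image dft2d(const Image& src, std::size_t rows, std::size_t cols) {
    const double pi = std::acos(-1.0);
    Image dst(rows * cols);
    for (std::size_t u = 0; u < rows; ++u) {
        for (std::size_t v = 0; v < cols; ++v) {
            Complex sum(0.0, 0.0);
            for (std::size_t y = 0; y < rows; ++y) {
                for (std::size_t x = 0; x < cols; ++x) {
                    const double angle = -2.0 * pi *
                        (static_cast<double>(u * y) / rows +
                         static_cast<double>(v * x) / cols);
                    sum += src[y * cols + x] * std::polar(1.0, angle);
                }
            }
            dst[u * cols + v] = sum;
        }
    }
    return dst;
}

// Element-wise product a[i] * conj(b[i]) of two spectra of equal size.
Image conjugateProduct(const Image& a, const Image& b) {
    Image out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = a[i] * std::conj(b[i]);
    }
    return out;
}
```

Every output element depends only on its own inputs, which is why the work maps naturally onto data-parallel hardware once many small matrices are processed together.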

cudawarped · March 15, 2024, 5:35am

Do you mean a device thread? If so forget it, that’s not how CUDA works.

You can’t, the GPU is data parallel.

I would start by timing the execution on a single small matrix, then see how it scales as you increase the number of matrices you process one after the other; you may get some kernel overlap. If your matrices are on the CPU, you would need to copy them to the GPU efficiently, either beforehand or while processing, so there is no delay from the memory transfer.
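The timing experiment suggested above could be sketched roughly as follows. This is a generic host-side harness, not OpenCV code: `timeBatchMs` and `perItemCostMs` are hypothetical names, `work` stands for whatever is done per matrix, and `sync` stands for whatever waits for the GPU to finish (with OpenCV streams, cv::cuda::Stream::waitForCompletion() could plausibly play that role). Because GPU calls queued on a stream are asynchronous, the clock is stopped only after `sync` returns.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Runs `work(i)` for i in [0, count) back to back, calls `sync` once,
// and returns the elapsed wall-clock time in milliseconds.
double timeBatchMs(std::size_t count,
                   const std::function<void(std::size_t)>& work,
                   const std::function<void()>& sync) {
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < count; ++i) {
        work(i);
    }
    sync();  // wait for all queued work before stopping the clock
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Times several batch sizes and returns the average cost per item (ms)
// for each, so the scaling with batch size can be compared.
std::vector<double> perItemCostMs(const std::vector<std::size_t>& counts,
                                  const std::function<void(std::size_t)>& work,
                                  const std::function<void()>& sync) {
    std::vector<double> result;
    for (std::size_t n : counts) {
        const double total = timeBatchMs(n, work, sync);
        result.push_back(n == 0 ? 0.0 : total / static_cast<double>(n));
    }
    return result;
}
```

If the per-item cost stays roughly flat as the count grows, the kernels are probably running one after another; a falling per-item cost would hint at overlap or amortized launch overhead.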

Additionally, I would investigate whether there are any batch-based DFT functions in the CUDA libraries.

crackwitz · March 15, 2024, 7:41am

Yeah, definitely not a task for OpenCV but for bare CUDA or its associated libraries. DFT is a basic building block for a lot of algorithms. They will have something you can use.

six · May 29, 2024, 1:15pm

Thanks for your helpful reply.
I found a batch-based DFT function named cufftPlanMany in the CUDA libraries. I have not used it, because I assumed that assigning several tasks to each stream and letting the streams run in parallel would be enough. However, the time cost is larger than expected. Any suggestions on how to find the reason or improve the strategy?

// This is the main part of the code
// One stream per image pair
for (size_t i = 0; i < cudaStreams.size(); ++i)
{
    fun_single_stream(mat1_ls[i], mat2_ls[i], cudaStreams[i]);
}
void fun_single_stream(cv::cuda::GpuMat& img_mat1, cv::cuda::GpuMat& img_mat2, cv::cuda::Stream& stream_local)
{
    // Forward DFT of both images, queued on this stream
    cv::cuda::dft(img_mat1, tmp_gpu_mat1, regionSize, cv::DFT_SCALE, stream_local);
    cv::cuda::dft(img_mat2, tmp_gpu_mat2, regionSize, cv::DFT_SCALE, stream_local);
    // Conjugate product of the two spectra (conjB = true)
    cv::cuda::mulSpectrums(tmp_gpu_mat1, tmp_gpu_mat2, tmp_gpu_mat3, cv::DFT_COMPLEX_OUTPUT, true, stream_local);
    // Inverse DFT of the product back to a real-valued result
    cv::cuda::dft(tmp_gpu_mat3, tmp_gpu_mat2, regionSize, cv::DFT_REAL_OUTPUT | cv::DFT_INVERSE | cv::DFT_COMPLEX_INPUT, stream_local);
    ...
}
