Slow DFT with CUDA

AaronB · March 23, 2022, 1:40am

The non-CUDA code runs much faster than the CUDA code:

		cv::dft(sourceComplexImage, dft, cv::DftFlags::DFT_COMPLEX_INPUT, 0);

vs

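		cv::cuda::Stream stream;
		cv::cuda::GpuMat gpudft;
		// Upload the complex (CV_32FC2) input to the GPU; this constructor copies synchronously.
		cv::cuda::GpuMat gpusourceComplexImage(sourceComplexImage);
		// Forward DFT over the full image: the size argument is the DFT size, here the input size.
		cv::cuda::dft(gpusourceComplexImage, gpudft, gpusourceComplexImage.size(), cv::DftFlags::DFT_COMPLEX_INPUT, stream);

		// Wait for the asynchronous DFT to finish, then copy the result back to host memory.
		stream.waitForCompletion();
		gpudft.download(dft);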

Any thoughts on why? I would expect the CUDA code to run much faster. The DFT image size is 640x360.

The GPU is an RTX 2060.

The slowness is in the DFT call, not in the transfer to or from the CPU.
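To check that the time really goes into the DFT call, and whether it is a one-time or a per-call cost, it helps to time the first call separately from repeated calls. Below is a minimal sketch using only `std::chrono`; `timeCall` and `TimingResult` are hypothetical names, not OpenCV API.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Timings for one operation, in milliseconds.
struct TimingResult {
    double firstCallMs;  // first call: includes any one-time setup cost
    double medianMs;     // median of the repeated calls that follow
};

// Hypothetical helper (not part of OpenCV): runs `op` once, then `iterations`
// more times, and reports the first call separately from the steady state.
// For GPU work, `op` must block until the device is done (for example by
// ending with stream.waitForCompletion()), otherwise only the launch is timed.
TimingResult timeCall(const std::function<void()>& op, int iterations) {
    assert(iterations > 0);
    using Clock = std::chrono::steady_clock;
    auto msSince = [](Clock::time_point start) {
        return std::chrono::duration<double, std::milli>(Clock::now() - start).count();
    };

    auto start = Clock::now();
    op();  // first (cold) call
    const double firstMs = msSince(start);

    std::vector<double> samples;
    samples.reserve(static_cast<std::size_t>(iterations));
    for (int i = 0; i < iterations; ++i) {
        start = Clock::now();
        op();  // warm calls
        samples.push_back(msSince(start));
    }

    // Median via nth_element (the upper middle element when the count is even).
    auto mid = samples.begin() + static_cast<std::ptrdiff_t>(samples.size() / 2);
    std::nth_element(samples.begin(), mid, samples.end());
    return {firstMs, *mid};
}
```

For example, the CPU variant could be wrapped as `[&] { cv::dft(sourceComplexImage, dft, cv::DftFlags::DFT_COMPLEX_INPUT, 0); }` and the GPU variant as a lambda that calls `cv::cuda::dft(...)` followed by `stream.waitForCompletion()`. If the GPU median stays close to its first-call time, the cost is being paid on every call rather than once.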

Aaron

cudawarped · March 23, 2022, 10:13am

It looks like Nvidia changed something in the most recent versions of CUDA, see … and …

When I profile the function, the kernels are super quick; however, it spends >10 ms in calls to cuModuleLoadData, cuModuleUnload, etc.

You could try reverting to CUDA 11.0 to see if that improves your performance.
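Since the profile points at setup work (module loading) rather than the FFT kernels, another option besides changing the CUDA version may be to pay that setup once and reuse it across frames. The sketch below only shows the create-once/reuse pattern; `FakePlan` and `PlanCache` are hypothetical stand-ins and do not call cuFFT or OpenCV.

```cpp
#include <cassert>
#include <map>
#include <tuple>

// Hypothetical stand-in for an expensive resource such as an FFT plan.
// In the real case, creating one is where one-time setup (e.g. module
// loading) would be paid.
struct FakePlan {
    int width;
    int height;
    int flags;
};

// Hypothetical cache: creates one plan per (width, height, flags) on first
// use and hands back the same plan on every later call.
class PlanCache {
public:
    const FakePlan& get(int width, int height, int flags) {
        assert(width > 0 && height > 0);
        const auto key = std::make_tuple(width, height, flags);
        auto it = plans_.find(key);
        if (it == plans_.end()) {
            ++created_;  // expensive path, taken only on a cache miss
            it = plans_.emplace(key, FakePlan{width, height, flags}).first;
        }
        return it->second;  // std::map nodes are stable, so this reference stays valid
    }

    int created() const { return created_; }

private:
    std::map<std::tuple<int, int, int>, FakePlan> plans_;
    int created_ = 0;
};
```

In OpenCV, the analogous step would be to keep the `cv::Ptr<cv::cuda::DFT>` returned by `cv::cuda::createDFT(dft_size, flags)` and call its `compute(src, dst, stream)` method for each frame, instead of calling the free function `cv::cuda::dft`, which as far as I know builds a new DFT object on every call. Whether this avoids the module-loading cost depends on that cost sitting in plan creation, which is an assumption here.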

I just ran the performance test

opencv_perf_cudaarithm.exe --gtest_filter=Sz_Flags_Dft.Dft/0

compiled against CUDA 11.0 and 11.4. Performance with CUDA 11.0 was about 22x better (mean 1.18 ms vs. 25.80 ms); results below:

CUDA 11.0
[ RUN ] Sz_Flags_Dft.Dft/0, where GetParam() = (1280x720, 0)
[ PERFSTAT ] (samples=13 mean=1.18 median=1.19 min=1.10 stddev=0.03 (2.9%))

CUDA 11.4
[ RUN ] Sz_Flags_Dft.Dft/0, where GetParam() = (1280x720, 0)
[ PERFSTAT ] (samples=25 mean=25.80 median=25.73 min=24.68 stddev=0.66 (2.5%))
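The speed-up can be read straight off the two PERFSTAT means above (25.80 ms and 1.18 ms). A small sketch that extracts `mean=` from such a line and takes the ratio; `perfstatMean` and `speedup` are hypothetical helpers, and the parser assumes the line format shown above.

```cpp
#include <cassert>
#include <string>

// Extracts the number following "mean=" in an OpenCV PERFSTAT line.
// Assumes the format shown above; returns -1.0 if "mean=" is absent.
double perfstatMean(const std::string& line) {
    const std::string key = "mean=";
    const auto pos = line.find(key);
    if (pos == std::string::npos) return -1.0;
    // std::stod stops at the first character that is not part of the number.
    return std::stod(line.substr(pos + key.size()));
}

// Ratio of the slower time to the faster one.
double speedup(double slowMs, double fastMs) {
    assert(fastMs > 0.0);
    return slowMs / fastMs;
}
```

On the means this gives roughly 21.9x; on the medians (25.73 ms and 1.19 ms) it is about 21.6x.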

This may not be enough to make the CUDA variant faster than the CPU one on your system at the sizes you're using, but it should bring them closer together.
