What can the function in cv::cuda do?

I want to calculate the Discrete Fourier Transform (DFT) of several images and then calculate the conjugate product of each pair of corresponding matrices. There are many images (more than 10000 per group), but each image is very small (a 100×100 matrix), so I need to use OpenCV's CUDA module for efficiency. I want to process each image matrix in its own thread. However, cv::cuda::dft is a host function and cannot be called from device code.

I wonder whether cv::cuda::dft is a function that operates on a GpuMat the way cv::dft operates on a cv::Mat on the CPU, or whether a call to cv::cuda::dft already runs in parallel.

Since each matrix is small, there is little to gain from parallelizing within a single matrix. How can I run the task in parallel on the GPU? Do I need to write the DFT function in device code myself?
:face_with_monocle:

Do you mean a device thread? If so, forget it; that’s not how CUDA works.

You can’t, the GPU is data parallel.

I would start by timing the execution on a single small matrix, then see how it scales as you increase the number of matrices you process one after the other; you may get some kernel overlap. If your matrices are on the CPU, you would need to copy them to the GPU efficiently, either beforehand or while processing, so that there is no delay from the memory transfer.

Additionally, I would investigate whether there are any batch-based DFT functions in the CUDA libraries.

Yeah, definitely not a task for OpenCV but for bare CUDA or its associated libraries. DFT is a basic building block for a lot of algorithms. They will have something you can use.
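For what it's worth, cuFFT does offer a batched plan through cufftPlanMany, which runs many same-sized transforms in a single call. Below is a minimal sketch of how the whole job might look, not a tuned implementation. It assumes the images are already converted to complex samples and packed back to back in row-major order; the names batchedSpectrumProduct and conjProduct are made up for illustration, and error handling is reduced to a single check helper.

```cuda
#include <cuda_runtime.h>
#include <cufft.h>
#include <stdexcept>
#include <vector>

// Element-wise a * conj(b) across every pixel of every image in the batch.
__global__ void conjProduct(const cufftComplex* a, const cufftComplex* b,
                            cufftComplex* out, int total)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < total) {
        out[i].x = a[i].x * b[i].x + a[i].y * b[i].y;  // real part
        out[i].y = a[i].y * b[i].x - a[i].x * b[i].y;  // imaginary part
    }
}

// Throw if a CUDA runtime or cuFFT call did not succeed.
static void check(bool ok, const char* what)
{
    if (!ok) throw std::runtime_error(what);
}

// Hypothetical helper: a and b each hold `batch` images of rows x cols
// complex samples, stored contiguously one image after another.
// Returns DFT(a_k) * conj(DFT(b_k)) for every image k.
std::vector<cufftComplex> batchedSpectrumProduct(
    const std::vector<cufftComplex>& a, const std::vector<cufftComplex>& b,
    int rows, int cols, int batch)
{
    const int total = rows * cols * batch;
    check(static_cast<int>(a.size()) == total &&
          static_cast<int>(b.size()) == total, "input size mismatch");
    const size_t bytes = sizeof(cufftComplex) * static_cast<size_t>(total);

    cufftComplex *dA = nullptr, *dB = nullptr, *dOut = nullptr;
    check(cudaMalloc(&dA, bytes) == cudaSuccess, "cudaMalloc dA");
    check(cudaMalloc(&dB, bytes) == cudaSuccess, "cudaMalloc dB");
    check(cudaMalloc(&dOut, bytes) == cudaSuccess, "cudaMalloc dOut");
    check(cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess, "copy a");
    check(cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess, "copy b");

    // One plan describes `batch` independent 2D transforms.
    // Passing nullptr for inembed/onembed selects the default packed layout.
    cufftHandle plan;
    int n[2] = {rows, cols};
    check(cufftPlanMany(&plan, 2, n, nullptr, 1, rows * cols,
                        nullptr, 1, rows * cols, CUFFT_C2C, batch) == CUFFT_SUCCESS,
          "cufftPlanMany");

    // In-place forward DFT of every image in both groups.
    check(cufftExecC2C(plan, dA, dA, CUFFT_FORWARD) == CUFFT_SUCCESS, "fft a");
    check(cufftExecC2C(plan, dB, dB, CUFFT_FORWARD) == CUFFT_SUCCESS, "fft b");

    const int threads = 256;
    const int blocks = (total + threads - 1) / threads;
    conjProduct<<<blocks, threads>>>(dA, dB, dOut, total);
    check(cudaGetLastError() == cudaSuccess, "conjProduct launch");

    // The blocking copy also waits for the kernel to finish.
    std::vector<cufftComplex> out(total);
    check(cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost) == cudaSuccess, "copy out");

    cufftDestroy(plan);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dOut);
    return out;
}
```

This would be built with something like nvcc and linked against cuFFT (-lcufft). With 10000 images of 100×100 complex floats, each buffer is roughly 800 MB, so on a smaller GPU you would probably split the group into a few chunks and reuse the same plan. For real-valued input, a CUFFT_R2C plan would halve the work, at the cost of a different output layout.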