I tried the CPU `cv::GaussianBlur` and the CUDA-accelerated Gaussian filter in OpenCV 4.7, with Gaussian kernels of sigma 5 and 50, using the `<chrono>` library for timing.
auto start2 = std::chrono::high_resolution_clock::now();
cv::Mat gause1(img.size(), CV_32FC1);
cv::GaussianBlur(img, gause1, cv::Size(31, 31), 5, 5, cv::BORDER_REPLICATE);
auto end2 = std::chrono::high_resolution_clock::now();
auto duration2 = std::chrono::duration_cast<std::chrono::microseconds>(end2 - start2);
cout << "GaussianBlur elapsed time: " << duration2.count() << "us\n";

auto filter = cv::cuda::createGaussianFilter(CV_32FC1, CV_32FC1, cv::Size(31, 31), 5, 5, cv::BORDER_REPLICATE);
auto start4 = std::chrono::high_resolution_clock::now();
cv::cuda::GpuMat src, dst;
src.upload(img);          // host-to-device copy is included in the measurement
filter->apply(src, dst);
cv::Mat gause1_gpu;
dst.download(gause1_gpu); // device-to-host copy is included as well
auto end4 = std::chrono::high_resolution_clock::now();
auto duration4 = std::chrono::duration_cast<std::chrono::microseconds>(end4 - start4);
cout << "GaussianBlur_gpu elapsed time: " << duration4.count() << "us\n";
GaussianBlur elapsed time: 9793us
GaussianBlur_gpu elapsed time: 171327us
-
Why does the CUDA-accelerated version take longer to run than the CPU version? I understand that CUDA kernels should be timed with CUDA events, but if my algorithm includes CUDA-accelerated steps and I want to measure the duration of the entire algorithm, should I use CPU-side timing?
-
Secondly, why does CUDA's accelerated Gaussian blur limit the kernel size to the range (0, 32]? For a sigma of 50, the kernel size should be around 331. Does this mean CUDA cannot be used to accelerate a Gaussian blur whose sigma exceeds about 5?
-
Finally, if I want to apply Gaussian blurs with sigma 5 and sigma 70 to the same image, is there any way to reduce the runtime to 1-2 ms? My image is 240×340, CV_32FC1.