Blur is not a member of cv::cuda

Apologies, of course I wasn't considering the type of filter. In that case, as you said, it should scale fairly well: probably very fast for small filters, since less data needs to be transferred from global memory, and only slightly slower for large filters, where the transfer grows to a maximum of 4 times the thread-block size and you have to pre-compute the integral image.

This operation (with both the naive and the integral-image approach) should be completely memory bound, meaning that for small filters which fit in shared memory, and therefore require fewer reads from global memory, the naive approach should be as quick as the integral image.
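For reference, this is why the integral-image approach stops caring about filter size: once the summed-area table is built, every box sum is four lookups regardless of the kernel. A minimal CPU sketch (my own illustration, not the NPP implementation):

```cpp
#include <vector>

// Build a summed-area table (integral image) with a 1-pixel zero border,
// so sat[(y)*(w+1)+x] holds the sum of all pixels above and left of (x, y).
std::vector<long long> integralImage(const std::vector<int>& img, int w, int h) {
    std::vector<long long> sat((w + 1) * (h + 1), 0);
    for (int y = 1; y <= h; ++y)
        for (int x = 1; x <= w; ++x)
            sat[y * (w + 1) + x] = img[(y - 1) * w + (x - 1)]
                + sat[(y - 1) * (w + 1) + x]
                + sat[y * (w + 1) + x - 1]
                - sat[(y - 1) * (w + 1) + x - 1];
    return sat;
}

// Sum over the box [x0, x1) x [y0, y1): four reads, independent of box size.
long long boxSum(const std::vector<long long>& sat, int w,
                 int x0, int y0, int x1, int y1) {
    int W = w + 1;
    return sat[y1 * W + x1] - sat[y0 * W + x1]
         - sat[y1 * W + x0] + sat[y0 * W + x0];
}
```

The trade-off the post describes is visible here: the table itself costs a full pass over the image, which is pure overhead for small filters but amortises quickly as the kernel grows.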

Looking at the trace for the NPP functions, I would be 99% sure they are naive implementations, firstly because of the kernel names:

ForEachPixelNaive<float, (int)1, FilterBoxReplicateBorder3x3SharedFunctor
ForEachPixelNaive<float, (int)1, FilterBoxReplicateBorder5x5SharedFunctor

for filter sizes 3 and 5 respectively. As the names suggest, both use shared memory, and are therefore probably the classic tiled approach, which is faster for small filters. Then, for filters of size 7 and above, the kernel is

ForEachPixelNaive<float, (int)1, FilterBoxReplicateBorderFloatFunctor

which doesn't use shared memory and, judging from the timings for large filters, presumably uses the naive approach with global-memory reads for each operation.
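A sketch of what such a naive kernel would compute, written as plain CPU C++ for clarity (the function name and structure are my guesses, not NPP's code): every output pixel reads its full (2r+1) x (2r+1) neighbourhood, so the per-pixel cost grows with the square of the filter size, matching the timings described above. The replicate-border clamping mirrors the `ReplicateBorder` in the kernel names.

```cpp
#include <algorithm>
#include <vector>

// Naive box filter: per-pixel cost is O(k^2) for a k x k kernel,
// with replicate-border handling via coordinate clamping.
std::vector<float> boxFilterNaive(const std::vector<float>& img,
                                  int w, int h, int radius) {
    std::vector<float> out(w * h);
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            float sum = 0.0f;
            int count = 0;
            for (int dy = -radius; dy <= radius; ++dy) {
                for (int dx = -radius; dx <= radius; ++dx) {
                    // Clamp to the image edge (replicate border).
                    int sx = std::min(std::max(x + dx, 0), w - 1);
                    int sy = std::min(std::max(y + dy, 0), h - 1);
                    sum += img[sy * w + sx];
                    ++count;
                }
            }
            out[y * w + x] = sum / count;
        }
    }
    return out;
}
```

In a real CUDA version each (x, y) iteration would be one thread, and without shared memory each of those k^2 reads goes to global memory (mitigated only by the cache), which is consistent with the scaling seen for the large-filter functor.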