Blur is not a member of cv::cuda

AaronB · September 15, 2021, 1:33pm

I am using OpenCV 3.4.9, which I built with CUDA. It works well.

Now when I try to use the cv::cuda::blur function I get “blur is not a member of cv::cuda”.

cv::cuda::threshold works but cv::cuda::blur gives me the error.

I suspect I am missing a header file but can’t seem to find the right one.

On a similar note, if I take two cv::cuda::GpuMat objects and add, subtract or multiply them, the compiler tells me I can’t do that. This also seems like an include issue.

Thank you for the help

berak · September 15, 2021, 1:47pm

blur() is a box filter

please show

AaronB · September 15, 2021, 2:31pm

This is the code fragment for blur and the headers I included.

#include <opencv2/core.hpp>
#include <opencv2/core/cuda.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/opencv.hpp>
#include <opencv2/cudaimgproc.hpp>
#include <opencv2/cudaarithm.hpp>
#include <opencv2/cudafeatures2d.hpp>

	cv::cuda::GpuMat inputImageGPU(inputImage);
	cv::cuda::GpuMat sqImageGPU(sqImage);
	cv::cuda::GpuMat blurredSqImageGPU(blurredSqImage);
	cv::cuda::GpuMat blurredImageGPU(blurredImage);

	cv::cuda::blur(inputImageGPU, blurredImageGPU, _objectFindingWindowSize, defaultPoint, cv::BorderTypes::BORDER_DEFAULT);

	cv::cuda::blur(sqImageGPU, blurredSqImageGPU, _objectFindingWindowSize, defaultPoint, cv::BorderTypes::BORDER_DEFAULT);

Then later in the code I do this and it tells me the + is not valid. All are GpuMat. Similar issue with - and *.

	cv::cuda::GpuMat  threshAndBlurredImage = threshImage + blurredImageGPU;

AaronB · September 15, 2021, 3:08pm

Adding the following did not help.

#include <opencv2/cudafilters.hpp>

berak · September 15, 2021, 3:21pm

those ‘operators’ are using cv::MatExpr and are only defined for CPU Mat.

https://docs.opencv.org/master/d8/d34/group__cudaarithm__elem.html
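In practice that means calling the functions from the cudaarithm module instead of writing +, - or *. The fragment below is a minimal sketch under some assumptions: the wrapper name gpuArithmetic and its parameters are made up for illustration, both inputs are assumed to have the same size and type, and it needs a CUDA-enabled OpenCV build to compile.

```cpp
#include <opencv2/core.hpp>
#include <opencv2/core/cuda.hpp>
#include <opencv2/cudaarithm.hpp>

// Element-wise GpuMat arithmetic through the cudaarithm functions.
// `a` and `b` are hypothetical inputs of identical size and type.
void gpuArithmetic(const cv::cuda::GpuMat& a, const cv::cuda::GpuMat& b,
                   cv::cuda::GpuMat& sum, cv::cuda::GpuMat& diff,
                   cv::cuda::GpuMat& prod)
{
    CV_Assert(a.size() == b.size() && a.type() == b.type());

    cv::cuda::add(a, b, sum);        // used in place of  a + b
    cv::cuda::subtract(a, b, diff);  // used in place of  a - b
    cv::cuda::multiply(a, b, prod);  // per-element product, not a matrix product
}
```

Applied to the line from the question, this would read cv::cuda::add(threshImage, blurredImageGPU, threshAndBlurredImage). Note that on a CPU Mat the * operator is a matrix product, so cv::cuda::multiply corresponds to Mat::mul rather than to *.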

AaronB · September 15, 2021, 3:33pm

Thanks.

That still leaves me with…

	cv::cuda::blur(inputImageGPU, blurredImageGPU, _objectFindingWindowSize, defaultPoint, cv::BorderTypes::BORDER_DEFAULT);

	cv::cuda::blur(sqImageGPU, blurredSqImageGPU, _objectFindingWindowSize, defaultPoint, cv::BorderTypes::BORDER_DEFAULT);

I am moving the code from using cv::blur to cv::cuda::blur. I am still relatively new to OpenCV, so your comment about the box filter does not mean anything to me.

berak · September 15, 2021, 3:41pm

afaik, there is no cv::cuda::blur (where did you find this ?)
so im proposing equivalent functionality
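For readers new to the term: a box filter replaces each pixel with the plain average of the k x k window around it, which is what cv::blur computes on the CPU. The sketch below shows the idea in plain C++ without OpenCV. The function name boxBlur and the clamp-to-edge border handling are assumptions made for illustration; OpenCV's default border mode is a reflecting one, so values near the image edges will generally differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Normalized box filter on a row-major, single-channel float image.
// Each output pixel is the mean of the k x k window centred on it.
// Assumption: coordinates outside the image are clamped to the nearest
// edge pixel (replicate border), which is not OpenCV's default.
std::vector<float> boxBlur(const std::vector<float>& src,
                           int rows, int cols, int k)
{
    assert(k > 0 && k % 2 == 1);
    assert(rows > 0 && cols > 0);
    assert(src.size() == static_cast<std::size_t>(rows) * cols);

    const int r = k / 2;  // window radius
    std::vector<float> dst(src.size());

    for (int y = 0; y < rows; ++y) {
        for (int x = 0; x < cols; ++x) {
            float sum = 0.0f;
            for (int dy = -r; dy <= r; ++dy) {
                for (int dx = -r; dx <= r; ++dx) {
                    const int yy = std::min(std::max(y + dy, 0), rows - 1);
                    const int xx = std::min(std::max(x + dx, 0), cols - 1);
                    sum += src[static_cast<std::size_t>(yy) * cols + xx];
                }
            }
            dst[static_cast<std::size_t>(y) * cols + x] = sum / (k * k);
        }
    }
    return dst;
}
```

So cv::cuda::createBoxFilter followed by apply() is the GPU route to the same averaging that cv::blur does, rather than a different kind of blur.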

AaronB · September 15, 2021, 3:42pm

I saw a reference on the web.

AaronB · September 15, 2021, 5:35pm

So I tried this…

	cv::Ptr<cv::cuda::Filter> blurfilter = cv::cuda::createBoxFilter(inputImageGPU.type(), inputImageGPU.type(), _objectFindingWindowSize, defaultPoint, cv::BorderTypes::BORDER_DEFAULT);

	blurfilter->apply(inputImageGPU, blurredImageGPU);
	blurfilter->apply(sqImageGPU, blurredSqImageGPU);

The code works, but the filtering is very slow, much slower than the old calls to cv::blur. Any suggestions would be appreciated.

It adds over half a second per frame. I have an RTX 2070 and the main CPU is an AMD Ryzen 7 3700X, 8 cores, 4050 MHz.

I must be missing something; the box filter should not be an order of magnitude slower than cv::blur.

berak · September 15, 2021, 7:36pm

are you measuring a single iteration of this ?
(kernels need to be compiled, caches warmed, etc)

are there more gpu ops in your pipeline ?
(up/downloading between cpu/gpu is expensive)

AaronB · September 16, 2021, 1:12am

This link was original…

In there they used the old name…
cv::gpu::blur(gpuImg0, gpuImage0Blurred, cv::Size(7, 7), cv::Point(-1, -1), stream);

AaronB · September 16, 2021, 2:03am

Here are some more details…
The non-CUDA version of cv::blur took ~5ms
The CUDA version (blurfilter->apply) took ~450 ms.

The images are 720x1280 CV_32FC1

crackwitz · September 16, 2021, 7:53am

how exactly do you measure this?

a “simple call” to a CUDA function (that runs on the GPU) is not comparable to a call to code that runs on the CPU.

cudawarped · September 16, 2021, 8:40am

The post refers to a GPU that is three generations older than yours and, back of the envelope, nearly 7 times slower. If you read the comments below it, someone with a faster GPU (a 770) achieved faster times and proposed that memory bandwidth was the issue. Whilst I am not convinced that it is just the memory bandwidth in his case, I would suggest it is the GPU performance.

Anyway, to put that post in context: using his timing of ~12 ms, you can see that, unless something in the codebase has changed for the worse, ~450 ms is way out (even if the image type is different), given that you have a GPU about 7 times faster and an image possibly 9 times smaller.

As @crackwitz mentioned, and as the post you linked to shows (first run 1.7 s vs 12 ms), you are timing the first run (a one-time cost) on the GPU, where initialization, including the creation of the CUDA context, happens. This is always orders of magnitude slower than subsequent operations. Additionally, if you pass an empty GpuMat as the destination, that memory will also get allocated during the call, slowing things down even more.
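One way to keep those one-time costs out of a measurement is to run the operation a few times untimed and then average over many timed runs. The helper below is a minimal sketch in plain C++; the name meanMillis and its parameters are hypothetical and not part of OpenCV.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Mean wall-clock time per call in milliseconds, excluding warm-up runs.
// `op` must block until its work has finished; for asynchronous GPU work
// that means synchronizing inside `op` before it returns.
double meanMillis(const std::function<void()>& op, int warmup, int iters)
{
    assert(warmup >= 0 && iters > 0);

    // Untimed runs: context creation, first-use initialization and
    // destination allocation all land here.
    for (int i = 0; i < warmup; ++i) op();

    const auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) op();
    const auto t1 = std::chrono::steady_clock::now();

    return std::chrono::duration<double, std::milli>(t1 - t0).count() / iters;
}
```

For the filter in this thread, op would wrap the blurfilter->apply(...) call with a destination GpuMat that is allocated once outside the lambda. If an explicit cv::cuda::Stream is passed to apply(), op should also call waitForCompletion() on that stream so the timer covers the actual GPU work.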

AaronB · September 16, 2021, 9:54am

Thank you, everyone, for the ideas. All are appreciated.

The timing I reported was not on the first iterations and did not include the GpuMat allocation, only the running of the filter…

I was looking for the closest equivalent to the “simple blur”; clearly the box filter is not it.

The test data is the same data on GPU vs CPU. I wanted an apples-to-apples comparison.

I know the older link I found was wrong, but I was hoping for similar results with 3.4.9.

cudawarped · September 16, 2021, 10:12am

That is really strange. Have you built opencv with the performance tests?

I just checked on the perf test which uses a 7x7 boxfilter on a 1280x1024 32FC1 image

opencv_perf_cudafilters.exe --gtest_filter=Sz_Type_KernelSz_Blur.Blur/17

and the output was

[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Sz_Type_KernelSz_Blur
[ RUN ] Sz_Type_KernelSz_Blur.Blur/17, where GetParam() = (1280x1024, 32FC1, 7)
[ PERFSTAT ] (samples=100 mean=0.92 median=0.93 min=0.86 stddev=0.04 (4.8%))
[ OK ] Sz_Type_KernelSz_Blur.Blur/17 (185 ms)
[----------] 1 test from Sz_Type_KernelSz_Blur (188 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (190 ms total)
[ PASSED ] 1 test.

This takes 0.92 ms on a GTX 1060. Can you compare the times you get for this to check they are quicker?

AaronB · September 16, 2021, 5:17pm

So I rebuilt opencv and got the following…

[ RUN ] Sz_Type_KernelSz_Blur.Blur/6, where GetParam() = (1280x720, 32FC1, 3)
[ PERFSTAT ] (samples=38 mean=0.15 median=0.15 min=0.14 stddev=0.00 (2.2%))
[ OK ] Sz_Type_KernelSz_Blur.Blur/6 (35 ms)
[ RUN ] Sz_Type_KernelSz_Blur.Blur/7, where GetParam() = (1280x720, 32FC1, 5)
[ PERFSTAT ] (samples=13 mean=0.17 median=0.17 min=0.16 stddev=0.00 (1.1%))
[ OK ] Sz_Type_KernelSz_Blur.Blur/7 (31 ms)
[ RUN ] Sz_Type_KernelSz_Blur.Blur/8, where GetParam() = (1280x720, 32FC1, 7)
[ PERFSTAT ] (samples=13 mean=0.27 median=0.27 min=0.26 stddev=0.00 (1.0%))
[ OK ] Sz_Type_KernelSz_Blur.Blur/8 (32 ms)

So, it is something about the OpenCV library build.

cudawarped · September 16, 2021, 5:19pm

Cool, so is your code also quicker now as well?

AaronB · September 16, 2021, 5:28pm

I have to relink to the new library

AaronB · September 19, 2021, 9:00pm

I tested with opencv_perf_cudafilters and got really good numbers, well below a millisecond.

But when I link against those same libraries, I get hundreds of ms to do the same blur.

Debugging
