Blur is not a member of cv::cuda

I am using OpenCV 3.4.9, which I built with CUDA. It works well.

Now when I try to use the cv::cuda::blur function I get "blur is not a member of cv::cuda".

cv::cuda::threshold works but cv::cuda::blur gives me the error.

I suspect I am missing a header file but can’t seem to find the right one.

On a similar note, if I take two cv::cuda::GpuMat and add, subtract or multiply them, it tells me I can't do that. This also seems like an include issue.

Thank you for the help

blur() is a box filter

please show

This is the code fragment for blur and the headers I included.

#include <opencv2/core.hpp>
#include <opencv2/core/cuda.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/opencv.hpp>
#include <opencv2/cudaimgproc.hpp>
#include <opencv2/cudaarithm.hpp>
#include <opencv2/cudafeatures2d.hpp>

	cv::cuda::GpuMat inputImageGPU(inputImage);
	cv::cuda::GpuMat sqImageGPU(sqImage);
	cv::cuda::GpuMat blurredSqImageGPU(blurredSqImage);
	cv::cuda::GpuMat blurredImageGPU(blurredImage);

	cv::cuda::blur(inputImageGPU, blurredImageGPU, _objectFindingWindowSize, defaultPoint, cv::BorderTypes::BORDER_DEFAULT);

	cv::cuda::blur(sqImageGPU, blurredSqImageGPU, _objectFindingWindowSize, defaultPoint, cv::BorderTypes::BORDER_DEFAULT);

Then later in the code I do this and it tells me the + is not valid. All are GpuMat. Similar issue with - and *.

	cv::cuda::GpuMat  threshAndBlurredImage = threshImage + blurredImageGPU;

Adding the following did not help.

#include <opencv2/cudafilters.hpp>

Those ‘operators’ use cv::MatExpr and are only defined for the CPU cv::Mat.
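The GPU equivalents are explicit functions in the cudaarithm module. A minimal sketch (function and variable names here are illustrative):

```cpp
#include <opencv2/core/cuda.hpp>
#include <opencv2/cudaarithm.hpp>

// Element-wise arithmetic on GpuMat goes through explicit functions,
// not operator overloads (those build a cv::MatExpr, which is CPU-only).
void combineOnGpu(const cv::cuda::GpuMat& a, const cv::cuda::GpuMat& b)
{
    cv::cuda::GpuMat sum, diff, prod;
    cv::cuda::add(a, b, sum);        // replaces a + b
    cv::cuda::subtract(a, b, diff);  // replaces a - b
    cv::cuda::multiply(a, b, prod);  // replaces a.mul(b)
}
```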


That still leaves me with…

	cv::cuda::blur(inputImageGPU, blurredImageGPU, _objectFindingWindowSize, defaultPoint, cv::BorderTypes::BORDER_DEFAULT);

	cv::cuda::blur(sqImageGPU, blurredSqImageGPU, _objectFindingWindowSize, defaultPoint, cv::BorderTypes::BORDER_DEFAULT);

I am moving the code from using cv::blur to cv::cuda::blur. I am still relatively new to OpenCV, so your comment about the box filter does not mean anything to me.

afaik, there is no cv::cuda::blur (where did you find this?)
so I'm proposing equivalent functionality
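Something along these lines (a sketch; the cuda filters are objects you create once and then apply):

```cpp
#include <opencv2/core/cuda.hpp>
#include <opencv2/cudafilters.hpp>

// cv::cuda::createBoxFilter gives the box (averaging) filter that
// cv::blur implements on the CPU. Create the filter object once,
// then call apply() per image.
void boxBlurOnGpu(const cv::cuda::GpuMat& src, cv::cuda::GpuMat& dst,
                  cv::Size ksize)
{
    cv::Ptr<cv::cuda::Filter> box =
        cv::cuda::createBoxFilter(src.type(), src.type(), ksize);
    box->apply(src, dst);
}
```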

I saw a reference on the web.


So I tried this…

	Ptr<Filter> blurfilter = cv::cuda::createBoxFilter(inputImageGPU.type(), inputImageGPU.type(), _objectFindingWindowSize, defaultPoint, cv::BorderTypes::BORDER_DEFAULT);

	blurfilter->apply(inputImageGPU, blurredImageGPU);
	blurfilter->apply(sqImageGPU, blurredSqImageGPU);

The code works but the filtering is very, very slow, much slower than the old calls to cv::blur. Any suggestions would be appreciated.

It adds over half a second per frame. I have an RTX 2070 and the main CPU is an AMD Ryzen 7 3700X, 8 cores at 4050 MHz.

I must be missing something; the box filter should not be an order of magnitude slower than cv::blur.


are you measuring a single iteration of this ?
(kernels need to be compiled, caches warmed, etc)

are there more gpu ops in your pipeline ?
(up/downloading between cpu/gpu is expensive)

This link was the original…

In there they used the old name…
cv::gpu::blur(gpuImg0, gpuImage0Blurred, cv::Size(7, 7), cv::Point(-1, -1), stream);

Here are some more details…
The non-CUDA version of cv::blur took ~5ms
The CUDA version (blurfilter->apply) took ~450 ms.

The images are 720x1280 CV_32FC1

how exactly do you measure this?

a “simple call” to a CUDA function (that runs on the GPU) is not comparable to a call to code that runs on the CPU.

The post refers to a GPU which is three generations older than yours and which, on a back-of-the-envelope estimate, is nearly 7 times slower. If you read the comments below, someone with a faster GPU (a 770) achieved faster times and proposed that memory bandwidth was the issue. Whilst I am not convinced it was just memory bandwidth in his case, I would suggest it is the GPU performance.

Anyway, to put that post in context using his timings (~12 ms), you can see that unless something in the codebase has changed for the worse, ~450 ms is way out (even if the image type is different); that is with a 7 times faster GPU and a possibly 9 times smaller image.

As @crackwitz mentioned, and as the post you linked to shows (first run 1.7 secs vs 12 ms), you are timing the first run on the GPU (a one-time-only cost), where initialization happens, including creation of the CUDA context. This is always orders of magnitude slower than subsequent operations. Additionally, if you pass an empty GpuMat as the destination, that memory will also get allocated during the call, slowing things down even more.
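A rough sketch of a fairer measurement (the function name and iteration count are illustrative): create the filter and pre-allocate the destination once, run a warm-up pass, then average over subsequent iterations only.

```cpp
#include <opencv2/core.hpp>
#include <opencv2/core/cuda.hpp>
#include <opencv2/cudafilters.hpp>

// Time the box filter excluding one-time costs: context/kernel
// initialization (warm-up pass) and destination allocation.
double timeBoxFilterMs(const cv::cuda::GpuMat& src, cv::Size ksize,
                       int iters = 100)
{
    cv::Ptr<cv::cuda::Filter> box =
        cv::cuda::createBoxFilter(src.type(), src.type(), ksize);
    cv::cuda::GpuMat dst(src.size(), src.type()); // pre-allocate destination

    box->apply(src, dst);              // warm-up, not timed

    cv::TickMeter tm;
    tm.start();
    for (int i = 0; i < iters; ++i)
        box->apply(src, dst);
    tm.stop();
    return tm.getTimeMilli() / iters;  // average ms per apply()
}
```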

Everyone, thank you for the ideas. All are appreciated.

The timing I reported was not on the first iterations and did not include the GpuMat allocation, only the running of the filter…

I was looking for the closest equivalent to the “simple blur”; clearly the box filter is not it.

The test data is the same data on GPU vs CPU. I wanted an apples-to-apples comparison.

I know the older link I found was wrong, but I was hoping for similar results with 3.4.9.

That is really strange. Have you built OpenCV with the performance tests?

I just checked the perf test which uses a 7x7 box filter on a 1280x1024 32FC1 image

opencv_perf_cudafilters.exe --gtest_filter=Sz_Type_KernelSz_Blur.Blur/17

and the output was

[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Sz_Type_KernelSz_Blur
[ RUN ] Sz_Type_KernelSz_Blur.Blur/17, where GetParam() = (1280x1024, 32FC1, 7)
[ PERFSTAT ] (samples=100 mean=0.92 median=0.93 min=0.86 stddev=0.04 (4.8%))
[ OK ] Sz_Type_KernelSz_Blur.Blur/17 (185 ms)
[----------] 1 test from Sz_Type_KernelSz_Blur (188 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (190 ms total)
[ PASSED ] 1 test.

This takes 0.92 ms on a GTX 1060. Can you compare the times you get for this to check they are quicker?
