I am using OpenCV 3.4.9, which I built with CUDA support, and it works well.
Now when I try to use the cv::cuda::blur function, I get the error that blur is not a member of cv::cuda.
cv::cuda::threshold works, but cv::cuda::blur gives me this error.
I suspect I am missing a header file but can’t seem to find the right one.
On a similar note, if I take two cv::cuda::GpuMat objects and add, subtract or multiply them, it tells me that I can’t do that. This also seems like an include issue.
I am porting code from cv::blur to cv::cuda::blur. I am still relatively new to OpenCV, so your comment about the box filter does not mean anything to me.
The post refers to a GPU three generations older than yours, which, as a back-of-the-envelope estimate, is nearly 7 times slower. If you read the comments below it, someone with a faster GPU (a 770) achieved faster times and suggested that memory bandwidth was the bottleneck. While I am not convinced that it is only memory bandwidth in his case, I would suggest it is overall GPU performance.
Anyway, to put that post in context: using his timing of ~12 ms, you can see that unless something in the codebase has changed for the worse, ~450 ms is way off (even if the image type is different), given a GPU that is about 7 times faster and an image that is possibly 9 times smaller.
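Written out as a rough estimate, under the simplifying assumption that run time scales linearly with both GPU speed and pixel count:

$$
t_{\text{expected}} \approx \frac{12\ \text{ms}}{7 \times 9} \approx 0.19\ \text{ms},
\qquad
\frac{450\ \text{ms}}{0.19\ \text{ms}} \approx 2.4 \times 10^{3}
$$

Even if either factor is off by a wide margin, ~450 ms is more than three orders of magnitude above this estimate.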
As @crackwitz mentioned, and as the post you linked to shows (first run 1.7 s vs 12 ms), you are timing the first run on the GPU, which carries one-off costs: initialization, including creation of the CUDA context, happens then. This is always orders of magnitude slower than subsequent operations. Additionally, if you pass an empty GpuMat as the destination, its memory will also be allocated during the call, slowing things down even more.
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Sz_Type_KernelSz_Blur
[ RUN ] Sz_Type_KernelSz_Blur.Blur/17, where GetParam() = (1280x1024, 32FC1, 7)
[ PERFSTAT ] (samples=100 mean=0.92 median=0.93 min=0.86 stddev=0.04 (4.8%))
[ OK ] Sz_Type_KernelSz_Blur.Blur/17 (185 ms)
[----------] 1 test from Sz_Type_KernelSz_Blur (188 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (190 ms total)
[ PASSED ] 1 test.
This takes 0.92 ms (the PERFSTAT mean) on a GTX 1060. Can you run the same test and compare the times you get, to check that they are quicker?