Using cv::Mat and/or cv::cuda::GpuMat with custom CUDA code

Hello, I need to implement some image processing and computer vision algorithms in CUDA. I have written image processing and computer vision code with OpenCV, but I have never used CUDA. I am looking for books or tutorials that show how to use OpenCV’s image classes with CUDA. In other words: how do I pass OpenCV’s image classes to CUDA functions, and how do I read an OpenCV image pixel by pixel in CUDA? Also, what are the best practices when combining OpenCV and CUDA? Should I have a main C/C++ file that calls code in .cu files, or a .cu file that calls C/C++ headers/implementations? I am completely new to this and open to all advice. And the new forum looks awesome, by the way.


I made a sample project.
Please refer to the cuda_impl branch of this project.


Thanks for the reply, Dandelion. I am going to check the example you provided. I am also still open to any other documentation.

You can pass a GpuMat data pointer directly to a kernel, while Mat memory has to be copied to device memory first using the `cudaMalloc` and `cudaMemcpy` calls. You can use a regular .cpp file and call functions defined in a .cu file. However, there are few examples, and there are a couple of pitfalls, especially when dealing with larger chunks of memory, that are beyond the scope of a forum post (I cannot write a book here). This is something we would have to discuss by example. If you are interested, let me know.
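To make the GpuMat route a bit more concrete: a GpuMat’s rows are usually padded, so a pixel has to be located with the row stride `step` (in bytes), not with `cols`. Below is a minimal sketch for an 8-bit, single-channel image, assuming you pass `gpuMat.data`, `gpuMat.step`, `gpuMat.cols` and `gpuMat.rows` to the kernel. The names `pixelPtr`, `invertKernel`, `invertHost` and the `HOST_DEVICE` macro are hypothetical; the kernel is only compiled when nvcc is used (`__CUDACC__`), and the host version uses the same indexing so results can be checked on the CPU.

```cpp
#include <cassert>
#include <cstddef>

// Lets the same helper compile for host (plain C++) and device (nvcc).
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Address of pixel (x, y) in an 8-bit, single-channel image whose rows are
// `step` bytes apart. GpuMat rows are usually padded, so step >= cols.
HOST_DEVICE inline unsigned char* pixelPtr(unsigned char* base, std::size_t step,
                                           int x, int y) {
    return base + static_cast<std::size_t>(y) * step + x;
}

#ifdef __CUDACC__
// One thread per pixel; threads that fall outside the image do nothing.
__global__ void invertKernel(unsigned char* data, std::size_t step,
                             int cols, int rows) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < cols && y < rows) {
        unsigned char* p = pixelPtr(data, step, x, y);
        *p = static_cast<unsigned char>(255 - *p);
    }
}
#endif

// Host reference with identical indexing, useful for checking kernel output.
inline void invertHost(unsigned char* data, std::size_t step, int cols, int rows) {
    assert(step >= static_cast<std::size_t>(cols));
    for (int y = 0; y < rows; ++y) {
        for (int x = 0; x < cols; ++x) {
            unsigned char* p = pixelPtr(data, step, x, y);
            *p = static_cast<unsigned char>(255 - *p);
        }
    }
}
```

A launch from a .cu file might then look like `invertKernel<<<grid, block>>>(g.data, g.step, g.cols, g.rows);` where `g` is a `cv::cuda::GpuMat` of type `CV_8UC1`. For a `cv::Mat`, the pixels first have to reach device memory, either with `cudaMalloc`/`cudaMemcpy` as described above (or the pitched `cudaMallocPitch`/`cudaMemcpy2D`), or by calling `GpuMat::upload`.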


Hello tmanthey, I’d like to know more. I am also very disappointed that there are only a few examples of how to use CUDA and OpenCV together, especially in C++. It’s really hard to find good OpenCV tutorials that use C++ these days; most of the tutorials are in Python.


There are good reasons for C++. We had to use OpenCV and CUDA to port a Python tool to C++, and we improved the speed 450x. Please contact me by email.

Hi lightbringer, I may be wrong, but I think you are confusing CUDA with the OpenCV CUDA API.

  1. If you want to use CUDA and OpenCV together, then you need to know how to program in CUDA; the only extra piece of the puzzle is passing the pointer of your GpuMat to your kernel, as @tmanthey suggested and as shown in the code provided by @dandelion1124 (see here for the bit you need). If you are not familiar with CUDA, this may not be what you are looking for.

  2. If you want to use the OpenCV CUDA API, then there are many more examples in C++ than in Python. The Python API to cv::cuda was only released in August 2018 and hardly anyone uses it. For C++, every function in the cv::cuda namespace has a unit test showing how to call it, e.g. in the cudaarithm module.
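Whichever route you take for point 1, the kernel launch needs a grid that covers the whole image, including the partial blocks at the right and bottom edges. A minimal sketch of the usual round-up computation, assuming one thread per pixel; the 32x8 default block size and the names `divUp`, `LaunchConfig` and `makeLaunchConfig` are illustrative choices, not OpenCV API:

```cpp
#include <cassert>

// Round-up integer division: number of blocks needed to cover n items.
inline int divUp(int n, int blockSize) {
    assert(n >= 0 && blockSize > 0);
    return (n + blockSize - 1) / blockSize;
}

// Grid/block sizes for a 2D image, one thread per pixel.
struct LaunchConfig {
    int blockX, blockY;  // threads per block in x and y
    int gridX, gridY;    // blocks per grid in x and y
};

// The 32x8 default is an arbitrary starting point, not a recommendation.
inline LaunchConfig makeLaunchConfig(int cols, int rows,
                                     int blockX = 32, int blockY = 8) {
    return {blockX, blockY, divUp(cols, blockX), divUp(rows, blockY)};
}

// In a .cu file these values would become dim3 arguments, e.g.
//   dim3 block(cfg.blockX, cfg.blockY), grid(cfg.gridX, cfg.gridY);
//   myKernel<<<grid, block>>>(g.data, g.step, g.cols, g.rows);
// where myKernel is your own (hypothetical) kernel and g a cv::cuda::GpuMat.
```

Keeping this host-side arithmetic, plus a small launcher function declared in an ordinary header and defined in the .cu file, is one common way to let regular .cpp files call into CUDA code without being compiled by nvcc themselves.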

From your initial question I would infer that you need to learn CUDA (1); however, depending on the processing you need to do, you may find that there is an OpenCV cv::cuda function, or a combination of functions, that will achieve your goal. I hope this helps.


Hello cudawarped. Actually, I am not confusing CUDA with the OpenCV CUDA API (which is in OpenCV Contrib), but I might be confused about which one to use for my needs. It seems I actually need both: if there are cv::cuda functions that satisfy my needs I can use them, but sometimes you need to write custom code that you can’t find in the library or build from a combination of library functions. For those kinds of situations I should use CUDA and OpenCV together, not the OpenCV CUDA API. Am I right?

In MATLAB there are GPU arrays: if you use them in certain operations (the list of supported operations is limited, though), those operations automatically run on the GPU. You don’t need to call a .cu file or anything similar. You can simply write a MATLAB function, and if you pass a GPU array to it, that function automatically runs on the GPU. Is there a similar mechanism in OpenCV?

Many thanks for your explanations.

Yes, you are correct. When no cv::cuda function satisfies your needs, you will need to write your own CUDA kernels (or use someone else’s) and use the approach given by dandelion1124. However, just as with MATLAB, where you used to vectorize your code as much as possible to take advantage of BLAS (I think it now vectorizes loops under the hood for you), I would advise using OpenCV functions, or another existing library’s functions, instead of writing your own CUDA kernels wherever possible. That is because in general it is easy to write CUDA kernels but not as easy to write efficient ones (excluding something simple like a map).

Unfortunately, to my knowledge there is no equivalent of that in OpenCV.


Actually, one of the main reasons for writing my own CUDA functions was to set the number of threads, the grid size, the number of cores, etc. myself, observe the effect of those parameters, and calculate the speedup ratio and parallel efficiency. Is it possible to set those parameters, or calculate those metrics, without writing CUDA code, using only OpenCV CUDA?
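If you do write your own kernels, the two metrics mentioned are straightforward once you have timings: speedup S = T_serial / T_parallel, and parallel efficiency E = S / p. A minimal C++ sketch follows; `averageMs`, `speedup` and `efficiency` are hypothetical helpers, and what to count as p on a GPU (CUDA cores, SMs, threads launched) is an assumption you have to make and state yourself. The untimed warm-up runs keep one-off costs, such as CUDA context creation on the first runtime call, out of the average.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Average wall-clock time of `work` in milliseconds. Warm-up runs are not
// timed, so one-off start-up costs are excluded from the average.
// For GPU work, `work` must block until the device has finished
// (e.g. end with cudaDeviceSynchronize()), otherwise only the launch is timed.
inline double averageMs(const std::function<void()>& work, int warmups, int reps) {
    assert(warmups >= 0 && reps > 0);
    for (int i = 0; i < warmups; ++i) work();
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i) work();
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::milli> total = stop - start;
    return total.count() / reps;
}

// Speedup S = T_serial / T_parallel.
inline double speedup(double serialMs, double parallelMs) {
    assert(serialMs > 0 && parallelMs > 0);
    return serialMs / parallelMs;
}

// Parallel efficiency E = S / p, where p is the number of processing units.
// Which quantity p represents on a GPU is an assumption you must choose.
inline double efficiency(double s, int p) {
    assert(p > 0);
    return s / p;
}
```

For timing individual kernels rather than whole host-side calls, CUDA events (`cudaEventRecord`, `cudaEventElapsedTime`) are the usual alternative to a host clock.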

You could, but only by hard-coding them in the OpenCV source and recompiling, which is less than ideal.

Even if you did that, you would have to examine the kernel code first. It’s been a while, but off the top of my head you could easily run out of shared memory and/or violate an assumed size constraint (a lot of code assumes that a thread block’s width is a multiple of the warp size). Furthermore, a lot of the cv::cuda functions use NPP under the hood, which can have a substantial start-up cost on each invocation, so depending on how you profile your code you may not see the speedup.
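When experimenting with block sizes in your own kernels, it helps to reject configurations that break the constraints mentioned above before launching anything. A minimal sketch, assuming the limits are filled from `cudaGetDeviceProperties` (the `cudaDeviceProp` fields `warpSize`, `maxThreadsPerBlock` and `sharedMemPerBlock`); `DeviceLimits` and `blockConfigOk` are hypothetical names, and the warp-multiple rule only reflects what many kernels assume, not a hardware requirement:

```cpp
#include <cassert>
#include <cstddef>

// Device limits relevant to hand-tuned launches. On a real device, copy these
// from cudaDeviceProp: warpSize, maxThreadsPerBlock, sharedMemPerBlock.
struct DeviceLimits {
    int warpSize;
    int maxThreadsPerBlock;
    std::size_t sharedMemPerBlock;  // bytes
};

// True if a blockX x blockY block using `sharedBytes` of shared memory fits
// the limits and its width is a multiple of the warp size (an assumption
// many existing kernels make).
inline bool blockConfigOk(const DeviceLimits& lim, int blockX, int blockY,
                          std::size_t sharedBytes) {
    assert(lim.warpSize > 0);
    if (blockX <= 0 || blockY <= 0) return false;
    if (blockX % lim.warpSize != 0) return false;
    if (blockX * blockY > lim.maxThreadsPerBlock) return false;
    return sharedBytes <= lim.sharedMemPerBlock;
}
```

For example, with values typical of current GPUs (warp size 32, 1024 threads per block, 48 KB of shared memory per block), a 32x8 block passes, while 24x8 (width not a warp multiple) and 64x32 (2048 threads) do not.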

I think writing your own CUDA functions, as you suggested, is a great way to achieve what you want. You can then easily integrate them into your OpenCV workflow.
