If you want to know what the address returned by a non-empty GpuMat points to, and why you can’t use it directly: on the first point I would be guessing, but I assume it is an address in the CUDA address space, which the CUDA API functions can map to a physical address on the given device; the second point follows from that. I’m sure someone else can put it more succinctly than I have.
“An address in the CUDA address space” would mean an address in device memory, right?
My next question would be: to work with these values, do I need to use CUDA programming directly (not strictly OpenCV) and then somehow transfer the result back to host memory later?
I wonder if there is some sample code somewhere…
The only way I know of to work with the values directly is by using the CUDA API.
It depends what you want to do. If you want to perform an operation which is already implemented in OpenCV for GpuMat, then I would recommend working with the CUDA addresses indirectly through OpenCV, which is a wrapper on top of the CUDA API when working with GpuMat.
It might be easier if you explain what you want to do?
I found an example here that is similar to what I think I need to do (a completely different operation, though). I haven’t finished analyzing it yet, but it seems that a device operation is performed on a GpuMat.
Curiously, a GpuMat is passed to a cv::cuda::PtrStepSzf argument. I wonder what that is…
What I want to do is this: I have a function (written by someone else) that works directly with cv::Mat, and I would like to write a CUDA version of it using cuda::GpuMat.
cv::cuda::PtrStepSzf is a useful wrapper because you don’t need to pass the GpuMat step as a separate argument, and you can access elements on the device with (row, col) notation. That is, in the example, instead of having to pass step and then access elements of dOutput as
dOutput[iRow * step/sizeof(*dOutput) + iCol]
you can use
dOutput(iRow, iCol)
If you want to implement a function in CUDA I would start with that example, because the operation itself will be far more difficult to get right than accessing the memory.
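For illustration, here is a minimal sketch (my own made-up names: fillRaw, fillWrapped) of the same element access written both ways inside a kernel:

#include <opencv2/core/cuda.hpp>

// raw-pointer version: the step (in bytes) has to be passed as a separate argument
__global__ void fillRaw(float* dOutput, size_t step, int rows, int cols)
{
    int iCol = blockIdx.x * blockDim.x + threadIdx.x;
    int iRow = blockIdx.y * blockDim.y + threadIdx.y;
    if (iRow < rows && iCol < cols)
        dOutput[iRow * step / sizeof(*dOutput) + iCol] = 1.0f;
}

// cv::cuda::PtrStepSzf version: rows, cols and step travel with the argument
__global__ void fillWrapped(cv::cuda::PtrStepSzf dOutput)
{
    int iCol = blockIdx.x * blockDim.x + threadIdx.x;
    int iRow = blockIdx.y * blockDim.y + threadIdx.y;
    if (iRow < dOutput.rows && iCol < dOutput.cols)
        dOutput(iRow, iCol) = 1.0f;
}

A GpuMat holding CV_32FC1 data converts implicitly to cv::cuda::PtrStepSzf, so at the launch site you can pass the GpuMat itself to fillWrapped.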
GpuMat has upload and download methods. Use them to copy data from the host (main memory) to the device (GPU memory) and back. Do not mess around with that pointer. It’s an implementation detail that happens to be exposed, but you don’t need it unless you are going to use CUDA functions on the host/CPU side and need to get at the CUDA data that’s contained in the GpuMat (an OpenCV type).
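For example, a minimal round trip looks roughly like this (cv::cuda::add is just a stand-in for whatever operation you actually need):

#include <opencv2/core.hpp>
#include <opencv2/core/cuda.hpp>
#include <opencv2/cudaarithm.hpp>

int main()
{
    cv::Mat hostIn(480, 640, CV_8UC1, cv::Scalar(42)); // some host data
    cv::cuda::GpuMat dIn, dOut;

    dIn.upload(hostIn);              // copy host -> device
    cv::cuda::add(dIn, dIn, dOut);   // the operation runs on the GPU
    cv::Mat hostOut;
    dOut.download(hostOut);          // copy device -> host
    return 0;
}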
I am very sure that you would benefit from working through this “tutorial”. It contains the basics of OpenCV’s CUDA modules, even though it isn’t specifically made for that purpose.
And here’s how you’d write a kernel. A kernel is code that runs on the GPU. It’s called a “kernel” because it’s usually the code of an “inner loop”, and CUDA takes care of applying it to every element of an array that lives in GPU memory. Code running on the CPU can’t touch data on the GPU, and code on the GPU can’t (easily) touch host memory. A piece of code can’t “pick” where it runs, nobody can: it’s written either for the host or for the GPU, and it only runs where it’s written to run.
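A rough sketch of what that looks like (scaleKernel and scaleOnGpu are names I made up; this has to live in a .cu file compiled by nvcc):

#include <opencv2/core/cuda.hpp>
#include <cuda_runtime.h>

// the kernel: runs on the GPU, one thread per element
__global__ void scaleKernel(cv::cuda::PtrStepSzf img, float factor)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (y < img.rows && x < img.cols)
        img(y, x) *= factor;
}

// host-side wrapper: runs on the CPU and only launches the kernel
void scaleOnGpu(cv::cuda::GpuMat& d_img, float factor)   // d_img must be CV_32FC1
{
    dim3 block(32, 8);
    dim3 grid((d_img.cols + block.x - 1) / block.x,
              (d_img.rows + block.y - 1) / block.y);

    scaleKernel<<<grid, block>>>(d_img, factor);  // GpuMat converts to PtrStepSzf
    cudaDeviceSynchronize();                      // wait for the GPU to finish
}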
An introduction to CUDA or GPU programming in general might be a good idea if any of those statements surprised you. There’s a ton more to know (streams, at least) to make real use of a GPU; there’s a small stream sketch at the end of this post.
I’m not claiming to be correct in every aspect but the gist of it should be right.
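To give a flavour of streams, here is a minimal sketch (the names and the choice of cv::cuda::add are placeholders): a stream lets you queue the upload, the operation and the download, and wait for them once.

#include <opencv2/core.hpp>
#include <opencv2/core/cuda.hpp>
#include <opencv2/cudaarithm.hpp>

void doubleAsync(const cv::Mat& hostIn, cv::Mat& hostOut)
{
    cv::cuda::Stream stream;        // work queued on this stream runs asynchronously
    cv::cuda::GpuMat dIn, dOut;

    dIn.upload(hostIn, stream);                                // host -> device copy
    cv::cuda::add(dIn, dIn, dOut, cv::noArray(), -1, stream);  // queued on the same stream
    dOut.download(hostOut, stream);                            // device -> host copy

    stream.waitForCompletion();     // block until everything queued above has finished
}

(For the copies to be truly asynchronous, the host buffers should be pinned memory, e.g. cv::cuda::HostMem.)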
I have not been able to find much info on cv::cuda::PtrStepSzf except a couple of articles (one in Japanese) that I am reading. They basically say what you wrote.
My question is: in terms of speed (performance), is there some penalty for using the simpler dOutput(iRow, iCol) rather than passing a GpuMat’s raw ptr and step?
I am trying to get as much speed as I can, and though I would prefer to use PtrStepSzf, I cannot risk losing speed.
Edit: it turns out I cannot use ptr inside a kernel, and I have to pass data and step directly. Just found out.
I doubt it, as nvcc will heavily optimize your code and the resulting instructions may even be the same in both cases. If you really want, you can check by examining the resulting PTX.
I am not sure why you can’t use cv::cuda::PtrStepSzf because this is how device data is passed to “most” (at least from my quick search) of the CUDA kernels in OpenCV.