Hi, I’m testing unified memory; this is my first CUDA program with OpenCV. The problem is that after the memory initialization and some cv::cuda computation (resize and cvtColor), when I try to access the buffer from the CPU it hasn’t changed. Below are the code and the output for checking.
Code:
I have also uploaded the output.
The output: the addresses of the variables are different (I don’t know if they should be equal).
The sizes of the Mat and GpuMat before are correct, but afterwards those of imgProc_n and imgRes_n aren’t.
I may be wrong (your naming convention is hard to follow) but it looks like you are comparing the address of the Mat/GpuMat objects with the address of the memory that they point to?
What are you trying to check with this? The addresses of the memory should all be the same as you initialized the Mat/GpuMat objects with them. I would think the more important check would be that the managed memory works for you.
Just out of interest, why are you using managed memory instead of explicitly uploading/downloading? Personally I have never used it, as it seems you could easily incur a performance penalty by relying on the memory being where you want it, when you want it.
The output of addresses is just a test; I check using the dimensions of the Mat and GpuMat afterwards. For example, imgProc and imgProc_n(ormal) should both have 512 cols and 1 channel; the result is correct for imgProc but not for its corresponding CPU Mat (imgProc_n). I want to operate on the GPU and don’t need to download the Mat after the computations. As for performance, I noticed a great improvement over the previous upload/download approach. For instance, to use managed memory should I add some flags for CUDA compilation?
You haven’t included the code for the Mat’s, i.e. which function do you use to convert imgCuda_n to imgProc_n?
Additionally:
You’re using the wrong signature for cv::cuda::resize; it only works because fx and fy are ignored when you pass a size, see the docs.
You haven’t passed a stream, so you don’t need cudaDeviceSynchronize();
The reason you may have seen a speed-up when using managed memory is most likely that you pre-allocated imgRes and imgProc. You could have done this without managed memory, which would be much cleaner and less error prone.
This is true, but you are passing cv::INTER_TAB_SIZE2 and cv::INTER_CUBIC as fx and fy, which are ignored. If you want to pass the interpolation method cv::INTER_CUBIC you need to call resize with fx and fy set to 0 and the interpolation as the sixth argument.
The output shows that imgProc_n and imgProc have different dimensions, and they should not; the same for imgRes and imgRes_n.
Those 4 variables are allocated to use shared memory, but when I work with imgProc and imgRes (in the function opencvProcess posted above) it doesn’t make any difference in the associated variables when accessed from the CPU.
Now I understand. You are expecting the dimensions of the Mats to change when you process the GpuMats? This won’t happen and has nothing to do with unified memory at all.
The procedure is:
You have a Mat (imgCuda_n) and a GpuMat (imgCuda) pointing at the same memory region (note: try this with two Mats)
Internally, because imgRes is not 512x512, imgRes will allocate new memory of the correct size and point to that on the call to cv::cuda::resize
First check what happens with Mats. I think you have a fundamental misunderstanding of how objects work: you can’t alter one object and expect it to automatically change another just because they point at the same memory location.
To confirm: if your aim is to have a Mat and a GpuMat point at the same memory location, and to be able to process the Mat in host functions and the GpuMat in device functions, you will have a hard time. This can only work when the functions modify the memory in place and don’t modify any of the object properties (size, type, etc.).