Getting Errors after resetting CUDA device at runtime

Hi folks,

I’m developing an application that occasionally needs to jump from one GPU to another.

So what I do is, at the time of switching, I first run

cudaDeviceReset(); // CUDA runtime API, declared in cuda_runtime.h

as well as

cv::cuda::resetDevice();

to release all memory still held on the current GPU. Then I select the new GPU with

cudaSetDevice(deviceIndex); // CUDA runtime API, declared in cuda_runtime.h

and

cv::cuda::setDevice(deviceIndex);

But this does not work as expected. The application does migrate to the other GPU, but the memory on the old GPU is never released.

If I remove

cv::cuda::resetDevice();

then the memory is completely cleared, but as soon as GpuMat::convertTo is called it fails with this error:

  what():  OpenCV(4.8.1) /opt/opencv_contrib-4.8.1/modules/cudev/include/opencv2/cudev/grid/detail/transform.hpp:312: error: (-217:Gpu API call) invalid argument in function 'call'

I would appreciate any insight on how to resolve this problem or if there is a better way of doing this.

Thanks.

Unfortunately I only have a single GPU to test on, but I don’t have any issues calling

cv::cuda::resetDevice();

followed by

cv::cuda::setDevice(deviceIndex);

Are you sure you’re not trying to access, on the second GPU, a resource that you allocated on the first one?
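To illustrate the kind of ordering I have in mind, here is a minimal sketch, not a confirmed fix. It assumes every device-side object (GpuMat, and likewise any Stream or similar) is released before the reset and re-created afterwards from host data. The helper switchDevice, the variable names, the image size and the device indices 0 and 1 are all hypothetical.

```cpp
#include <opencv2/core.hpp>
#include <opencv2/core/cuda.hpp>
#include <iostream>

// Hypothetical helper: move work from the current GPU to `newDevice`.
// `frame` stands in for every device-side object the application owns.
void switchDevice(int newDevice, const cv::Mat& hostCopy, cv::cuda::GpuMat& frame)
{
    // Free the device memory while its context still exists.
    frame.release();

    // Destroy all resources associated with the current device.
    cv::cuda::resetDevice();

    // Select the new device; later allocations are made there.
    cv::cuda::setDevice(newDevice);

    // Re-create the device object on the new GPU from host data.
    frame.upload(hostCopy);
}

int main()
{
    if (cv::cuda::getCudaEnabledDeviceCount() < 2) {
        std::cout << "need at least two CUDA devices\n";
        return 0;
    }

    cv::Mat host(480, 640, CV_8UC3, cv::Scalar(10, 20, 30));

    cv::cuda::setDevice(0);
    cv::cuda::GpuMat frame;
    frame.upload(host);

    switchDevice(1, host, frame);

    // The call that failed in the original post.
    cv::cuda::GpuMat asFloat;
    frame.convertTo(asFloat, CV_32F);
    std::cout << "convertTo finished on device " << cv::cuda::getDevice() << "\n";
    return 0;
}
```

The point of the ordering is that no GpuMat created on the first GPU is ever touched after the reset. If your application keeps such objects in members or globals, they would need the same treatment.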

As far as I could see from nvidia-smi, there was no memory left on the previous GPU. Everything allocated after the reset is on the new GPU.

A few other weird symptoms are:

  • Other functions including cv::cuda::cvtColor, cv::cuda::resize and GpuMat::copyTo seem unaffected.
  • The only failing function, GpuMat::convertTo, worked exactly once after the reset.
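A small reproduction along these lines might help narrow down the second symptom. It is only a sketch under assumptions: the matrix size, the loop count and device index 0 are arbitrary choices, and it resets and re-selects the same device so that it can also be run on a single-GPU machine. Change the index passed to the second setDevice call to test an actual switch.

```cpp
#include <opencv2/core.hpp>
#include <opencv2/core/cuda.hpp>
#include <iostream>

int main()
{
    cv::Mat host(64, 64, CV_8UC1, cv::Scalar(1));

    cv::cuda::setDevice(0);
    {
        // Use convertTo once before the reset.
        cv::cuda::GpuMat warm, out;
        warm.upload(host);
        warm.convertTo(out, CV_32F);
    }   // both GpuMats are destroyed here, before the reset

    cv::cuda::resetDevice();
    cv::cuda::setDevice(0);   // assumption: same device; use another index to test a switch

    // Call convertTo several times on fresh GpuMats and report each result.
    for (int i = 0; i < 3; ++i) {
        try {
            cv::cuda::GpuMat src, dst;
            src.upload(host);
            src.convertTo(dst, CV_32F);
            std::cout << "call " << i << ": ok\n";
        } catch (const cv::Exception& e) {
            std::cout << "call " << i << ": " << e.what() << "\n";
        }
    }
    return 0;
}
```

If the first call succeeds and later ones fail even here, the problem is probably not a leftover GpuMat from the old device, since none survives the reset in this program.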

Have you ever found a solution to this?