Cuda safe call error

  • OpenCV =>4.5.3pre
  • Operating System / Platform =>jetson agx xavier
  • Compiler =>python 3.6.9 on linux
Detailed description

When I run the line “dispcu = sgm.compute(leftImgcu, rightImgcu)”, an exception occurs:
OpenCV(4.5.3-pre) /home/dev/data/opencv4.5.3pre/opencv/opencv_contrib/modules/cudastereo/src/cuda/ error: (-217:Gpu API call) no kernel image is available for execution on the device in function ‘median_filter’

I then found that the exception is raised at every CUDA error check in the code, such as “CV_CUDEV_SAFE_CALL(cudaGetLastError());” and “cudaSafeCall( cudaGetLastError() );”. If I comment out these lines and rebuild OpenCV from source, no error is raised, but the disparity map is all zeros.

The CUDA stereo code works fine on a GeForce RTX 3090; I only see the problem on the Jetson AGX Xavier platform. Can anyone give me some pointers?

Steps to reproduce
imgLr0, imgRr0, Q = stereo_rectify(imgL, imgR)
sgm=cv2.cuda.createStereoSGM(minDisparity = 0, numDisparities = 128, P1 = 10, P2 = 100, uniquenessRatio = 5, mode = 1)
dispcu = sgm.compute(leftImgcu, rightImgcu)

Can you run any CUDA functions on your Jetson other than initialization, upload, and download of GpuMat()? Did you compile on the Jetson for compute capability 7.2, or did you copy across the binaries that worked on the 3090 machine, which may have been compiled only for compute capability 8.6?

Thanks for your kind reply. I’ve tried some apply and compute operations for CUDA, and they all fail with “no kernel image is available for execution on the device in function…”. I compiled OpenCV on the Jetson and specified “-DCUDA_ARCH_BIN=7.2”. Maybe I should try a lower OpenCV version?

this is the command I used to compile opencv:

I am not sure what is causing the issue. From memory, this error is usually the result of not having binary-compatible device code; however, your CMake input should have generated binary device code for your card.

If it were me, I would first confirm from the CMake output that OpenCV has been compiled for the specified arch, e.g.

  NVIDIA CUDA:                   YES (ver 11.3, CUFFT CUBLAS NVCUVID FAST_MATH)
    NVIDIA GPU arch:             72
    NVIDIA PTX archs:            

or you could use DeviceInfo::isCompatible(), which should do the same thing, with the added benefit that it also checks the compute capability of your device.
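For anyone working from Python, the architecture check on the CMake summary can also be scripted. The sketch below is only an illustration: parse_cuda_archs is a hypothetical helper name, and it assumes the summary text keeps the "NVIDIA GPU arch:" and "NVIDIA PTX archs:" labels shown above.

```python
import re


def parse_cuda_archs(build_info):
    """Return (bin_archs, ptx_archs) found in an OpenCV build summary.

    Each element is a list of strings such as ['72']; a label that is
    missing or has nothing after the colon yields an empty list.
    """
    result = []
    for label in ("NVIDIA GPU arch:", "NVIDIA PTX archs:"):
        # Capture the rest of the line after the label (never past a newline).
        match = re.search(re.escape(label) + r"[ \t]*([^\n]*)", build_info)
        result.append(match.group(1).split() if match else [])
    return result[0], result[1]


# Sample text copied from the CMake output quoted in this thread.
sample = (
    "  NVIDIA CUDA:                   YES (ver 11.3, CUFFT CUBLAS NVCUVID FAST_MATH)\n"
    "    NVIDIA GPU arch:             72\n"
    "    NVIDIA PTX archs:            \n"
)
bin_archs, ptx_archs = parse_cuda_archs(sample)
print("binary archs:", bin_archs, "PTX archs:", ptx_archs)
```

The same summary should also be available at runtime from cv2.getBuildInformation(), so the helper can be pointed at the installed module instead of a saved CMake log.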

If that still fails I would check that your driver version is compatible with the version of the CUDA toolkit you are compiling with. This can be checked from C++ with

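#include <cuda_runtime.h>
#include <iostream>

int main() {
    int driverVersion = 0, runtimeVersion = 0;
    cudaDriverGetVersion(&driverVersion);    // latest CUDA version supported by the installed driver
    cudaRuntimeGetVersion(&runtimeVersion);  // CUDA runtime version the program is built against
    if (runtimeVersion > driverVersion)
        std::cout << "Update driver to a version which supports CUDA: " << runtimeVersion << std::endl;
}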

or alternatively by manually comparing your driver version to the one required by the version of CUDA you are compiling against.
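When comparing by hand, it helps to know that CUDA reports both versions as a single integer encoded as 1000 * major + 10 * minor, so 10020 means CUDA 10.2. A small Python sketch of the same comparison follows; the helper names are hypothetical, and it assumes the two integers have already been obtained (for example from the C++ calls above).

```python
def decode_cuda_version(version):
    """Split a CUDA version integer (1000 * major + 10 * minor) into (major, minor)."""
    major, rest = divmod(version, 1000)
    return major, rest // 10


def driver_supports_runtime(driver_version, runtime_version):
    """True when the driver supports a CUDA version at least as new as the runtime."""
    return driver_version >= runtime_version


# Illustrative values only, not taken from the poster's machine.
driver, runtime = 10020, 11030
print("driver supports CUDA %d.%d" % decode_cuda_version(driver))
print("runtime is CUDA %d.%d" % decode_cuda_version(runtime))
if not driver_supports_runtime(driver, runtime):
    print("Update the driver to one which supports the runtime version")
```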

I would not compile binary code for a lower compute capability; however, it may be worth generating PTX for, say, compute capability 6.0 (-DCUDA_ARCH_PTX=6.0, which will be JIT compiled to 7.2) to see if that solves your problem.
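The reasoning behind this suggestion can be sketched in a few lines of Python. This is a simplified model, not OpenCV's or CUDA's actual logic: has_kernel_image and parse_arch are hypothetical names, and the sketch assumes the strict rule that binary code must match the device's compute capability exactly, while PTX built for the same or a lower capability can be JIT compiled at load time. On desktop GPUs CUDA is more lenient and also accepts binary code built for a lower minor revision of the same major version.

```python
def parse_arch(arch):
    """Convert an arch string such as '7.2' or '72' into the tuple (7, 2)."""
    digits = arch.replace(".", "")
    return int(digits[:-1]), int(digits[-1])


def has_kernel_image(device_cc, bin_archs, ptx_archs):
    """Estimate whether a build contains code the device can execute.

    Assumption (strict rule): binary code needs an exact match with the
    device's compute capability; PTX only needs to target the same or a
    lower capability, because the driver can JIT compile it.
    """
    device = parse_arch(device_cc)
    if any(parse_arch(a) == device for a in bin_archs):
        return True
    return any(parse_arch(a) <= device for a in ptx_archs)


# Compute capability 7.2 device with different build configurations.
print(has_kernel_image("7.2", ["72"], []))     # binary built for 7.2
print(has_kernel_image("7.2", ["86"], []))     # binary built only for 8.6
print(has_kernel_image("7.2", [], ["6.0"]))    # PTX for 6.0, JIT compiled
```

Under this model, a build with only 8.6 binaries (as on the 3090 machine) has nothing a 7.2 device can run, while adding PTX for 6.0 gives the driver something to compile on the fly.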

My CMake output is:
– NVIDIA GPU arch: 70

The NVIDIA CUDA line is missing NVCUVID and FAST_MATH. Even though I turned these options on when running CMake, the output still lacks them. Can you tell me how to enable them in the build? Thank you.

I wouldn’t worry about that. The main thing is that you have compiled for 70 and you need to compile for 72. I am not sure why this happens, since CUDA 10.2 should support compute capability up to 7.5 and you are passing -DCUDA_ARCH_BIN=7.2. It may be a bug in OpenCV preventing compute capability 7.2 on CUDA 10.2, or some other misconfiguration.

First I would confirm that passing -DCUDA_ARCH_BIN=7.2 really results in

– NVIDIA GPU arch: 70

and not

– NVIDIA GPU arch: 72

as that should only take a minute. I would use a fresh build directory to ensure none of the previous options stick.

If that doesn’t work I would try passing -DCUDA_ARCH_PTX=7.0 if the

NVIDIA PTX archs:  

is blank in your CMake output. That should force the ptx code to be JIT compiled to compute capability 7.2 code on the fly before your program executes.

If that works it might be worth upgrading your version of CUDA or digging into the CMake rules to see why compute capability 7.2 is not being selected.