Orin assertion deviceSupports(SHARED_ATOMICS) in cudaimgproc

I have code that fails only on the Orin board (CUDA 11.4, gcc-11.5.0) with this message:

terminate called after throwing an instance of 'cv::Exception'
  what():  OpenCV(4.10.0-dev) ~/opencv_contrib/modules/cudaimgproc/src/canny.cpp:140: error: (-215:Assertion failed) deviceSupports(SHARED_ATOMICS) in function 'detect'

It works on my older Jetson TX2 (OpenCV version < 4.10.0, but I checked that the deviceSupports(…) function is there too).

I assume this 2023 device definitely supports these atomics, but deviceSupports(…) returns the wrong result for some reason.

Which settings should I check to make this function return ‘true’?

If you don’t have a debug version then I guess it’s time for some classic printf debugging. What’s the output of the following?

  #include <iostream>
  #include <opencv2/core/cuda.hpp>
  using namespace cv::cuda;
  int main() {
    const int devId = getDevice();  // currently selected CUDA device
    DeviceInfo dev(devId);
    std::cout << "Device Id: " << devId << std::endl;
    int version = dev.majorVersion() * 10 + dev.minorVersion();
    std::cout << "Major: " << dev.majorVersion() << ", Minor: " << dev.minorVersion() << ", Version: " << version << std::endl;
    std::cout << "Built with shared atomics: " << TargetArchs::builtWith(SHARED_ATOMICS) << std::endl;
  }

What’s your CMake configuration output, specifically the CUDA part which specifies the selected architectures?

Your code prints:

Device Id: 0
Major: 8, Minor: 7, Version: 87
Built with shared atomics: 0

My CMake:

set(CUDA_VERSION 11.4)
set(CMAKE_CUDA_ARCHITECTURES 87)
message(STATUS "CUDA architecture is set to " ${CMAKE_CUDA_ARCHITECTURES} )

Prints 87

jtop shows the same:
L4T: 35.3.1
Jetpack: 5.1.1
CUDA Arch BIN: 8.7
CUDA: 11.4.315

Do you mean I need to recompile OpenCV with the debug option?

No, I meant if you didn’t have a version we could debug, but the output above was sufficient.

The CMake configuration output for CUDA should look similar to

NVIDIA CUDA: YES (ver 12.5, CUFFT CUBLAS NVCUVID NVCUVENC)
NVIDIA GPU arch: 50 52 53 60 61 62 70 72 75 80 86 87 89 90
NVIDIA PTX archs: 90

cuDNN: YES (ver 9.2.0)

Can you post your values for the following variables in CMakeVars.txt in your build folder?

OPENCV_CUDA_ARCH_BIN= 86
OPENCV_CUDA_ARCH_FEATURES= 86 86
OPENCV_CUDA_ARCH_PTX= 86

How did you specify the architecture to use when building OpenCV: CUDA_ARCH_BIN, CUDA_ARCH_PTX, CMAKE_CUDA_ARCHITECTURES or CUDA_GENERATION?

Thank you for the variable names. I checked the OpenCV CMake output on my PC; it was:

NVIDIA CUDA:                   YES (ver 12.1, CUFFT CUBLAS)
    NVIDIA GPU arch:             50 52 60 61 70 75 80 86 89 90
    NVIDIA PTX archs:

So PTX was empty there, while on the Orin both PTX and ARCH were empty:

NVIDIA CUDA:                   YES (ver 11.4, CUFFT CUBLAS)
   NVIDIA GPU arch:
   NVIDIA PTX archs:

Probably something changed in the OpenCV distribution and now I have to set these values manually, which I never had to do before. Since the Orin is headless, I added
-DCUDA_ARCH_BIN=8.6 -DCUDA_ARCH_PTX=8.6
and rebuilt OpenCV (by the way, I switched to gcc-10 since gcc-11 gives an error). Now your code prints:

Built with shared atomics: 1

And my project cmake prints:

- CUDA_VERSION: 11.4
- CMAKE_CUDA_ARCHITECTURES: 87
- CUDA_ARCH_BIN: 86
- CUDA_ARCH_PTX: 86
- CUDA_GENERATION: 

And now everything works fine.
I really appreciate your quick response.

I think the default OpenCV architecture search may be disabled on Jetson boards but I’ll need to check.

I would compile for your compute arch only by using -DCUDA_ARCH_BIN=8.7. This will reduce the size of your binary and avoid PTX JIT compilation.

I recompiled OpenCV and my app using -DCUDA_ARCH_BIN=87 (PTX was auto-detected as 86). It works fine, but I see no change in my app size, which is still ~18 MB. By the way, on the TX2 it is just 1.8 MB. Could you tell me why it is ~10 times bigger on the Orin?
If I use gcc-11 instead of gcc-10 on the Orin board, the app size is ~13 MB.

I don’t use static linking, although I’d like to (so that the application works on any board without an OpenCV and CUDA installation). Is that possible?
I once tried -DBUILD_SHARED_LIBS=OFF, but it led to an app size of 130 MB (which is OK) and a new round of error messages at run time (which is not OK). Should I create another topic about static linking (if running the application without any CUDA and OpenCV installation is even possible), or is it OK to discuss it here? Perhaps you have a link to a detailed explanation with examples?

That’s strange. Did you clean your build directory?

I was referring to the OpenCV shared libraries (libopencv_cudaimgproc.so.4.10.0 etc.) which you link to in your application.

I don’t know, maybe the default compile options are for minimum size on one setup and for speed on another.

I’m not sure. If you compile with CUDA as a first class language this might be possible because it has access to static targets but I would have to check.

Do you mean -DBUILD_SHARED_LIBS=OFF? I would start a new thread if you have errors when building a static library.

did you clean your build directory?

I called make clean first. Is that enough?
I used -DCUDA_ARCH_BIN=87 and then the OpenCV output was:

-- CUDA detected: 11.4
-- CUDA: Using CUDA_ARCH_BIN=87
-- CUDA: NVCC target flags -gencode;arch=compute_87,code=sm_87;-D_FORCE_INLINES;-gencode;arch=compute_86,code=compute_86
...
NVIDIA CUDA:                   YES (ver 11.4, CUFFT CUBLAS)
--     NVIDIA GPU arch:             87
--     NVIDIA PTX archs:            86

The size of /usr/local/lib/libopencv_cudaimgproc.so.4.10.0 is then 9485872 bytes.

I would have thought so, but I would just change the build directory name to guarantee a fresh configuration. You can then delete it once you have examined the CMake configuration output.

just change the build directory name to guarantee a fresh configuration

Yes, you’re right:

cd ~/opencv/build2/ && cmake .. -DOPENCV_EXTRA_MODULES_PATH=~/opencv_contrib/modules/ -DCMAKE_PROGRAM_PATH=/usr/lib/ccache -DCUDA_ARCH_BIN=87 -DBUILD_SHARED_LIBS=ON ...
...
CUDA detected: 11.4
-- CUDA: Using CUDA_ARCH_BIN=87
-- CUDA: NVCC target flags -gencode;arch=compute_87,code=sm_87;-D_FORCE_INLINES
...
NVIDIA CUDA:                   YES (ver 11.4, CUFFT CUBLAS)
--     NVIDIA GPU arch:             87
--     NVIDIA PTX archs:

So it looks like make clean is not enough; I have to delete the build directory and rebuild OpenCV.

While you’re in the middle of this, can you quickly test what happens if you don’t specify CUDA_ARCH_BIN in a fresh build directory? Does it still have blank entries for GPU/PTX arch?

I would include -DOPENCV_CMAKE_CUDA_DEBUG=ON so you get the output of all the architectures it tried to build against and, if any failed, why.

-- CUDA detected: 11.4
CMake Warning at cmake/OpenCVDetectCUDAUtils.cmake:213 (message):
  COMMAND:
  /usr/local/cuda/bin/nvcc;-ccbin;/usr/bin/aarch64-linux-gnu-g++-10;/home/vit/opencv/cmake/checks/OpenCVDetectCudaArch.cu;--run
Call Stack (most recent call first):
  cmake/OpenCVDetectCUDAUtils.cmake:267 (ocv_detect_native_cuda_arch)
  cmake/OpenCVDetectCUDA.cmake:76 (ocv_set_cuda_arch_bin_and_ptx)
  cmake/OpenCVFindLibsPerf.cmake:46 (include)
  CMakeLists.txt:836 (include)

-- Result: 0
-- Out: 8.7
-- Err: 
-- CUDA: NVCC target flags -gencode;arch=compute_87,code=sm_87;-D_FORCE_INLINES
...
--   NVIDIA CUDA:                   YES (ver 11.4, CUFFT CUBLAS)
--     NVIDIA GPU arch:             87
--     NVIDIA PTX archs:

By the way:

CMake Warning (dev) at CMakeLists.txt:127 (enable_language):
  project() should be called prior to this enable_language() call.