Persistent CUDA/cuDNN Build Errors (4.12.0, CUDA 12.9, VS2022)

Hello,

I am attempting to build OpenCV 4.12.0 with CUDA and cuDNN support so that I can get GPU acceleration in my Python project, which uses EasyOCR.

I’ve encountered persistent build errors related to CUDA’s cub library, including syntax errors in inline assembly, __host__ functions attempting to call __device__ functions, and undefined members in the cuda::ptx namespace. These issues persist even after applying the --expt-relaxed-constexpr flag. I’m hoping for some guidance on how to resolve this.

My Environment:

  • Operating System: Windows 11

  • OpenCV Version: 4.12.0 (Source from GitHub)

  • OpenCV Contrib Version: 4.12.0 (Source from GitHub, matched to main OpenCV)

  • CUDA Toolkit Version: 12.9

  • cuDNN Version: 8.9.7 (for CUDA 12.x)

  • Visual Studio Version: 2022

  • Python Version: 3.13.5 (from AppData/Local/Programs/Python/Python313)

  • GPU: NVIDIA GeForce RTX 3060 Ti (Compute Capability 8.6, Ampere arch)

CMake Configuration (Relevant Variables):

I have consistently performed a clean build by deleting the build directory contents before each CMake configuration. My CMake generator is Visual Studio 17 2022 with x64 platform.

Here are the key CMake variables I have set:

  • PYTHON3_EXECUTABLE: C:/Users/.../AppData/Local/Programs/Python/Python313/python.exe

  • PYTHON3_INCLUDE_DIR: C:/Users/.../AppData/Local/Programs/Python/Python313/include

  • PYTHON3_LIBRARY: C:/Users/.../AppData/Local/Programs/Python/Python313/libs/python313.lib

  • OPENCV_EXTRA_MODULES_PATH: D:/Downloads/_cuda integration/opencv_contrib-4.12.0/modules

  • CUDA_ARCH_BIN: 8.6

  • CUDA_NVCC_FLAGS: --expt-relaxed-constexpr

  • WITH_CUDA: ON

  • BUILD_opencv_world: ON

  • OPENCV_ENABLE_NONFREE: ON

  • CMAKE_BUILD_TYPE: Release

  • BUILD_PERF_TESTS: OFF

  • BUILD_TESTS: OFF

  • WITH_OPENEXR: OFF

  • BUILD_OPENEXR: OFF
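For reproducibility, the same configuration can be scripted instead of being set in the CMake GUI. The sketch below (standard library only) builds the cmake command line from the variables listed above. The source and build directories are placeholders, and the PYTHON3_* and Video Codec SDK entries are left out for brevity. Note that the Visual Studio generator is multi-config, so CMAKE_BUILD_TYPE is ignored; the configuration is chosen at build time instead (cmake --build <build_dir> --config Release).

```python
import subprocess
import sys


def cmake_configure_cmd(source_dir, build_dir, cache_vars):
    """Build the argument list for a Visual Studio 2022 / x64 configure step."""
    cmd = ["cmake", "-S", source_dir, "-B", build_dir,
           "-G", "Visual Studio 17 2022", "-A", "x64"]
    # Each cache variable becomes -DNAME=VALUE; subprocess quotes values with spaces
    cmd += [f"-D{name}={value}" for name, value in cache_vars.items()]
    return cmd


# Values copied from the list above (PYTHON3_* and Video Codec SDK entries omitted)
CACHE_VARS = {
    "OPENCV_EXTRA_MODULES_PATH": "D:/Downloads/_cuda integration/opencv_contrib-4.12.0/modules",
    "WITH_CUDA": "ON",
    "CUDA_ARCH_BIN": "8.6",
    "CUDA_NVCC_FLAGS": "--expt-relaxed-constexpr",
    "BUILD_opencv_world": "ON",
    "OPENCV_ENABLE_NONFREE": "ON",
    "BUILD_PERF_TESTS": "OFF",
    "BUILD_TESTS": "OFF",
    "WITH_OPENEXR": "OFF",
    "BUILD_OPENEXR": "OFF",
}

if __name__ == "__main__":
    # Placeholder directories: point these at your own source and build folders
    cmd = cmake_configure_cmd("opencv-4.12.0", "build", CACHE_VARS)
    print(cmd)
    if "--run" in sys.argv:  # only invoke cmake when explicitly asked
        subprocess.run(cmd, check=True)
```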

I have also tried to integrate the NVIDIA Video Codec SDK headers and libraries (NVCUVID, NVCUVENC) by setting:

  • CUDA_NVCUVID_INCLUDE_DIR: C:/Program Files/NVIDIA Video Codec SDK/Interface

  • CUDA_nvcuvid_LIBRARY: C:/Program Files/NVIDIA Video Codec SDK/Lib/x64/nvcuvid.lib

  • CUDA_NVENCAPI_INCLUDE_DIR: C:/Program Files/NVIDIA Video Codec SDK/Interface

  • CUDA_nvencodeapi_LIBRARY: C:/Program Files/NVIDIA Video Codec SDK/Lib/x64/nvEncodeAPI.lib

  • WITH_NVCUVID: ON

  • WITH_NVCUVENC: ON

The Problem:

After configuring CMake successfully and generating the Visual Studio solution, when I build the ALL_BUILD project in Visual Studio 2022 (in Release configuration), the compilation fails. The errors are now predominantly from the cub library (part of the CUDA Toolkit), specifically in cub/util_ptx.cuh, cub/block/block_adjacent_difference.cuh, cub/block/block_discontinuity.cuh, cub/warp/specializations/warp_exchange_shfl.cuh, cub/block/block_exchange.cuh, and cub/thread/thread_load.cuh.

The errors include:

  • error : expected a ")" (syntax errors in inline PTX assembly)

  • error : calling a __device__ function("...") from a __host__ function("...") is not allowed (e.g., for __syncthreads, __syncwarp, __shfl_sync)

  • error : a static "__shared__" variable declaration is not allowed inside a host function body

  • error : namespace "cuda::ptx" has no member "get_sreg_laneid"

These issues suggest a significant incompatibility or API change between OpenCV 4.12.0’s CUDA module implementations (especially in cudafilters) and CUDA Toolkit 12.9, which nvcc is unable to resolve even with --expt-relaxed-constexpr.

What I’ve Already Tried:

  1. Clean Builds: Always deleted the entire build directory before reconfiguring CMake.

  2. CUDA_NVCC_FLAGS=--expt-relaxed-constexpr: Added this flag for nvcc in the CMake GUI and verified its presence in CMakeCache.txt.
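To double-check what actually ended up in the cache (rather than what the GUI shows), a small reader for CMakeCache.txt can help. This is a sketch assuming the usual NAME:TYPE=VALUE line format with // and # comment lines; the build path in it is a placeholder.

```python
from pathlib import Path


def read_cmake_cache(cache_file):
    """Parse CMakeCache.txt into {name: (type, value)}."""
    entries = {}
    text = Path(cache_file).read_text(encoding="utf-8", errors="replace")
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and the // (doc) and # (comment) lines
        if not line or line.startswith(("//", "#")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue
        name, _, var_type = key.partition(":")
        entries[name] = (var_type, value)
    return entries


if __name__ == "__main__":
    cache_path = Path("build") / "CMakeCache.txt"  # placeholder: your build directory
    if cache_path.is_file():
        cache = read_cmake_cache(cache_path)
        for name in ("WITH_CUDA", "CUDA_ARCH_BIN", "CUDA_NVCC_FLAGS", "OPENCV_EXTRA_MODULES_PATH"):
            print(f"{name} = {cache.get(name, ('', '<not set>'))[1]}")
    else:
        print(f"{cache_path} not found")
```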

I’ve attached the VS2022 build output (errors only) on JustPaste.it in case it helps.

Could this be a known incompatibility for OpenCV 4.12.0 with CUDA 12.9? Are there specific CMake flags, lib versions, or workarounds (beyond disabling modules) that can address these cub-related compilation errors?

Thank you in advance

Have you tried building with the latest commit from the 4.x branches, specifically one that includes this fix?

Yes, I’m pretty sure I downloaded the most recent versions of every library/tool involved. I started experimenting with these tools a few days ago.

Now I’m trying to use a precompiled wheel as an alternative, but after installing the .whl file in a Python environment I still get the error ImportError: DLL load failed while importing cv2: The specified module could not be found.

import cv2
import torch  # EasyOCR runs on PyTorch, so torch's CUDA status is what matters for it

# Number of CUDA devices visible to this OpenCV build (0 if OpenCV was built without CUDA)
print("OpenCV has CUDA support:", cv2.cuda.getCudaEnabledDeviceCount() > 0)
print("EasyOCR will use CUDA:", torch.cuda.is_available())

if torch.cuda.is_available():
    print(f"CUDA device count: {torch.cuda.device_count()}")
    for i in range(torch.cuda.device_count()):
        print(f"  Device {i}: {torch.cuda.get_device_name(i)}")

try:
    # An empty GpuMat does not allocate device memory; uploading data would be a stronger check
    gpu_mat = cv2.cuda_GpuMat()
    print("Successfully created a GpuMat object. CUDA is working.")
except Exception as e:
    print(f"Failed to create GpuMat object. Error: {e}")
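On Windows, the ImportError above usually means that a DLL cv2 depends on (for a CUDA build, typically the CUDA runtime or cuDNN) cannot be found. Since Python 3.8, PATH is no longer searched for the load-time dependencies of extension modules, so extra directories have to be registered with os.add_dll_directory before the import. A minimal sketch, assuming the CUDA bin folder is the one that is missing; the fallback path is an assumption, adjust it to your install.

```python
import os

_DLL_HANDLES = []  # keep the returned handles so the directories stay registered


def register_dll_dirs(candidates):
    """Register each existing directory for DLL lookup and return the list of them."""
    existing = [d for d in candidates if d and os.path.isdir(d)]
    if hasattr(os, "add_dll_directory"):  # only exists on Windows, Python 3.8+
        for d in existing:
            _DLL_HANDLES.append(os.add_dll_directory(d))
    return existing


if __name__ == "__main__":
    cuda_root = os.environ.get("CUDA_PATH", r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.9")
    found = register_dll_dirs([os.path.join(cuda_root, "bin")])
    print("Registered DLL directories:", found)
    try:
        import cv2
        print("cv2", cv2.__version__, "imported from", cv2.__file__)
    except ImportError as exc:
        print("cv2 still fails to import:", exc)
```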

It seems I managed to get the precompiled wheel working. I had to copy the cuDNN files into the CUDA directory again (initially I thought the cuDNN version I had used for the CMake build was OK).

After integrating cuDNN, I ran:

  • pip uninstall opencv-contrib-python
  • pip install 'D:\Downloads\_cuda integration\opencv_contrib_python-4.12.0.88-cp37-abi3-win_amd64.whl'
  • python .\test_cuda.py

and i got

OpenCV has CUDA support: True
EasyOCR will use CUDA: True
CUDA device count: 1
  Device 0: NVIDIA GeForce RTX 3060 Ti
Successfully created a GpuMat object. CUDA is working.

Now I can move forward and use this for my original Python EasyOCR project, though I still wonder what I was missing in the CMake/VS2022 build (my initial plan).
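One way to compare the working wheel with a self-built module is cv2.getBuildInformation(), which returns the CMake summary the module was built with, while cv2.__file__ shows which installation was actually imported. The helper below only pulls out the CUDA-related lines; the labels it looks for (NVIDIA CUDA:, NVIDIA GPU arch:, cuDNN:) are what I would expect in a CUDA build's summary, so treat them as an assumption and adjust them if your output differs.

```python
def cuda_build_summary(build_info):
    """Return {label: value} for the CUDA-related lines of a build summary."""
    labels = ("NVIDIA CUDA:", "NVIDIA GPU arch:", "cuDNN:")
    summary = {}
    for line in build_info.splitlines():
        text = line.strip()
        for label in labels:
            if text.startswith(label):
                # Key without the trailing colon, value with the alignment padding removed
                summary[label.rstrip(":")] = text[len(label):].strip()
    return summary


if __name__ == "__main__":
    try:
        import cv2
    except ImportError as exc:
        print("cv2 could not be imported:", exc)
    else:
        print("cv2", cv2.__version__, "from", cv2.__file__)
        for label, value in cuda_build_summary(cv2.getBuildInformation()).items():
            print(f"{label}: {value}")
```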

I’m glad the wheel worked but I have included instructions to avoid this error on the download page.

Nvidia GPU Computing Toolkit v12.9 is required for import cv2 to work and cuDNN 9.10.2 for accelerated inference when using the dnn module.
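A quick way to check both requirements for the tarball-style setup (cuDNN inside the toolkit directory) is to look for the runtime DLLs in the toolkit's bin folder. Sketch only: the DLL names (cudart64_12.dll for CUDA 12.x, cudnn64_9.dll for cuDNN 9.x) and the fallback CUDA path are assumptions, so compare them with what your installation actually contains.

```python
import os
from pathlib import Path

# Assumed DLL names for CUDA 12.x and cuDNN 9.x; check your own bin folder
REQUIRED_DLLS = ("cudart64_12.dll", "cudnn64_9.dll")


def missing_dlls(cuda_root, required=REQUIRED_DLLS):
    """Return the names from `required` that are not present in <cuda_root>/bin."""
    bin_dir = Path(cuda_root) / "bin"
    present = set()
    if bin_dir.is_dir():
        # Compare case-insensitively, as Windows file names are
        present = {p.name.lower() for p in bin_dir.iterdir() if p.suffix.lower() == ".dll"}
    return [name for name in required if name.lower() not in present]


if __name__ == "__main__":
    root = os.environ.get("CUDA_PATH", r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.9")
    print("CUDA root:", root)
    print("Missing:", missing_dlls(root) or "none")
```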

Note for Windows: This wheel relies on cuDNN being installed in the CUDA Toolkit directory. You can therefore either download

  1. the cuDNN Tarball (Version->Tarball) and extract its contents to your CUDA directory, or

  2. the installer (Version->exe (local)) and then add the path to the bin folder inside the cuDNN installation directory to your PATH_TO_PYTHON_DIST/Lib/site-packages/cv2/config.py file, e.g.

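    import os

    # Directories cv2 should search for DLLs; the paths are examples, adjust them to your install
    BINARIES_PATHS = [
        os.path.join('D:/build/opencv/install', 'x64/vc17/bin'),
        os.path.join(os.getenv('CUDA_PATH', 'C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.9'), 'bin'),
        'C:/Program Files/NVIDIA/CUDNN/v9.10.2/bin/12.9',  # cuDNN bin folder from the exe installer
    ] + BINARIES_PATHS  # keep the paths the cv2 loader has already collected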
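Option 1 amounts to merging the archive's folders into the toolkit. A sketch, assuming the extracted cuDNN archive has bin, include and lib folders at its top level (check yours); writing under C:/Program Files normally needs an elevated prompt.

```python
import shutil
from pathlib import Path


def merge_cudnn_into_cuda(cudnn_dir, cuda_root, subdirs=("bin", "include", "lib")):
    """Copy bin/include/lib from an extracted cuDNN archive into the CUDA Toolkit folder."""
    cudnn_dir, cuda_root = Path(cudnn_dir), Path(cuda_root)
    copied = []
    for sub in subdirs:
        src = cudnn_dir / sub
        if src.is_dir():
            # dirs_exist_ok (Python 3.8+) merges into the existing CUDA folders
            shutil.copytree(src, cuda_root / sub, dirs_exist_ok=True)
            copied.append(sub)
    return copied
```

For example, merge_cudnn_into_cuda(r'<extracted cuDNN folder>', os.environ['CUDA_PATH']), where the first argument is a placeholder for wherever you unpacked the archive.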

I’m unable to re-create this issue with the latest version of OpenCV and CUDA 12.9 Update 1.

If you have time, it would be really useful if you could confirm that your version of OpenCV includes the PR I mentioned above. Go to OPENCV_SOURCE_ROOT/cmake/OpenCVDetectCUDAUtils.cmake and check that line 396 is the same as shown below.
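To print that line with a little context, a small helper is enough. In this sketch OPENCV_SOURCE_ROOT is a placeholder for wherever the sources were extracted.

```python
from pathlib import Path


def show_lines(path, line_no, context=2):
    """Return lines line_no-context .. line_no+context (1-based), prefixed with their numbers."""
    lines = Path(path).read_text(encoding="utf-8", errors="replace").splitlines()
    start = max(1, line_no - context)
    end = min(len(lines), line_no + context)
    return [f"{n:4d}: {lines[n - 1]}" for n in range(start, end + 1)]


if __name__ == "__main__":
    # Placeholder path: replace OPENCV_SOURCE_ROOT with your extracted source folder
    target = Path("OPENCV_SOURCE_ROOT") / "cmake" / "OpenCVDetectCUDAUtils.cmake"
    if target.is_file():
        print("\n".join(show_lines(target, 396)))
    else:
        print(f"{target} not found; adjust the path")
```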

I checked the file and, yes, I’m missing that change.

My current code:

macro(ocv_nvcc_flags)
  if(BUILD_SHARED_LIBS)
    set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -Xcompiler=-DCVAPI_EXPORTS)
  endif()

  if(UNIX OR APPLE)
    set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -Xcompiler=-fPIC)
  endif()
  if(APPLE)
    set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -Xcompiler=-fno-finite-math-only)
  endif()

  if(WIN32 AND NOT (CUDA_VERSION VERSION_LESS "11.2"))
    set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -Xcudafe --display_error_number --diag-suppress 1394,1388)
  endif()

  if(CMAKE_CROSSCOMPILING AND (ARM OR AARCH64))
    set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -Xlinker --unresolved-symbols=ignore-in-shared-libs)
  endif()

  # disabled because of multiple warnings during building nvcc auto generated files
  if(CV_GCC AND CMAKE_CXX_COMPILER_VERSION VERSION_GREATER "4.6.0")
    ocv_warnings_disable(CMAKE_CXX_FLAGS -Wunused-but-set-variable)
  endif()
endmacro()

Sorry, I haven’t used GitHub much yet. How can I make sure I download the latest version of OpenCV (4.12.0) with that PR included?

Initially I downloaded the source code zip of the latest release from the opencv/opencv Releases page on GitHub.

No worries. In future, to get the latest commits, go to the repositories on GitHub, click the Code button and choose Download ZIP, for example:

https://github.com/opencv/opencv/archive/refs/heads/4.x.zip

https://github.com/opencv/opencv_contrib/archive/refs/heads/4.x.zip

or use git to check out the 4.x branches, making sure you’ve pulled the latest changes first.
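The archive URLs follow a fixed pattern, so fetching both branch snapshots (keeping opencv and opencv_contrib on the same branch) can be scripted. A sketch using only the standard library; with git, the equivalent is cloning each repository and running git checkout 4.x followed by git pull.

```python
import urllib.request
from pathlib import Path


def branch_archive_url(owner, repo, branch="4.x"):
    """URL of the zip GitHub serves for the tip of a branch."""
    return f"https://github.com/{owner}/{repo}/archive/refs/heads/{branch}.zip"


def download_branch(owner, repo, dest_dir, branch="4.x"):
    """Download the branch snapshot into dest_dir and return the saved file path."""
    dest = Path(dest_dir) / f"{repo}-{branch}.zip"
    urllib.request.urlretrieve(branch_archive_url(owner, repo, branch), dest)
    return dest


if __name__ == "__main__":
    # Only prints the URLs; call download_branch(...) to actually fetch them
    for repo in ("opencv", "opencv_contrib"):
        print(branch_archive_url("opencv", repo))
```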