Persistent CUDA/cuDNN Build Errors (4.12.0, CUDA 12.9, VS2022)

mercylotus · July 31, 2025, 1:27pm

Hello,

I am attempting to build OpenCV 4.12.0 with CUDA and cuDNN support for use in a Python project that includes EasyOCR. I want GPU acceleration inside that project.

I’ve encountered persistent build errors related to CUDA’s cub library, including syntax errors in inline assembly, __host__ functions attempting to call __device__ functions, and undefined members in the cuda::ptx namespace. These issues persist even after applying the --expt-relaxed-constexpr flag. I’m hoping for some guidance on how to resolve this.

My Environment:

Operating System: Windows 11
OpenCV Version: 4.12.0 (Source from GitHub)
OpenCV Contrib Version: 4.12.0 (Source from GitHub, matched to main OpenCV)
CUDA Toolkit Version: 12.9
cuDNN Version: 8.9.7 (for CUDA 12.x)
Visual Studio Version: 2022
Python Version: 3.13.5 (from AppData/Local/Programs/Python/Python313)
GPU: NVIDIA GeForce RTX 3060 Ti (Compute Capability 8.6, Ampere arch)

CMake Configuration (Relevant Variables):

I have consistently performed a clean build by deleting the build directory contents before each CMake configuration. My CMake generator is Visual Studio 17 2022 with x64 platform.

Here are the key CMake variables I have set:

PYTHON3_EXECUTABLE: C:/Users/.../AppData/Local/Programs/Python/Python313/python.exe
PYTHON3_INCLUDE_DIR: C:/Users/.../AppData/Local/Programs/Python/Python313/include
PYTHON3_LIBRARY: C:/Users/.../AppData/Local/Programs/Python/Python313/libs/python313.lib
OPENCV_EXTRA_MODULES_PATH: D:/Downloads/_cuda integration/opencv_contrib-4.12.0/modules
CUDA_ARCH_BIN: 8.6
CUDA_NVCC_FLAGS: --expt-relaxed-constexpr
WITH_CUDA: ON
BUILD_opencv_world: ON
OPENCV_ENABLE_NONFREE: ON
CMAKE_BUILD_TYPE: Release
BUILD_PERF_TESTS: OFF
BUILD_TESTS: OFF
WITH_OPENEXR: OFF
BUILD_OPENEXR: OFF
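For readers who prefer to configure from a script instead of the CMake GUI, the variables above can be passed as -D options. The following is only a sketch: the source directory name "opencv-4.12.0" and the build directory name "build" are hypothetical placeholders, and only a subset of the variables is shown (the Python paths are left out because they are elided in the post).

```python
# Sketch: assemble a CMake configure command from a dict of cache variables.
# The source and build directory names used below are hypothetical placeholders.

def cmake_configure_command(source_dir, build_dir, variables,
                            generator="Visual Studio 17 2022", platform="x64"):
    """Return the argument list for a `cmake` configure call."""
    command = ["cmake", "-S", source_dir, "-B", build_dir,
               "-G", generator, "-A", platform]
    for name, value in variables.items():
        # Booleans become ON/OFF; everything else is passed through as text.
        if isinstance(value, bool):
            value = "ON" if value else "OFF"
        command.append(f"-D{name}={value}")
    return command


# A subset of the cache variables listed in the post.
variables = {
    "WITH_CUDA": True,
    "CUDA_ARCH_BIN": "8.6",
    "BUILD_opencv_world": True,
    "OPENCV_ENABLE_NONFREE": True,
    "BUILD_TESTS": False,
    "BUILD_PERF_TESTS": False,
    "OPENCV_EXTRA_MODULES_PATH":
        "D:/Downloads/_cuda integration/opencv_contrib-4.12.0/modules",
}

command = cmake_configure_command("opencv-4.12.0", "build", variables)
print(command)
```

The returned list can be handed to subprocess.run(command, check=True); keeping it as a list avoids quoting problems with the space in the extra modules path.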

I have also tried to integrate the NVIDIA Video Codec SDK headers and libraries (NVCUVID, NVCUVENC) by setting:

CUDA_NVCUVID_INCLUDE_DIR: C:/Program Files/NVIDIA Video Codec SDK/Interface
CUDA_nvcuvid_LIBRARY: C:/Program Files/NVIDIA Video Codec SDK/Lib/x64/nvcuvid.lib
CUDA_NVENCAPI_INCLUDE_DIR: C:/Program Files/NVIDIA Video Codec SDK/Interface
CUDA_nvencodeapi_LIBRARY: C:/Program Files/NVIDIA Video Codec SDK/Lib/x64/nvEncodeAPI.lib
WITH_NVCUVID: ON
WITH_NVCUVENC: ON

The Problem:

After configuring CMake successfully and generating the Visual Studio solution, when I build the ALL_BUILD project in Visual Studio 2022 (in Release configuration), the compilation fails. The errors are now predominantly from the cub library (part of the CUDA Toolkit), specifically in cub/util_ptx.cuh, cub/block/block_adjacent_difference.cuh, cub/block/block_discontinuity.cuh, cub/warp/specializations/warp_exchange_shfl.cuh, cub/block/block_exchange.cuh, and cub/thread/thread_load.cuh.

The errors include:

error : expected a ")" (syntax errors in inline PTX assembly)
error : calling a __device__ function("...") from a __host__ function("...") is not allowed (e.g., for __syncthreads, __syncwarp, __shfl_sync)
error : a static "__shared__" variable declaration is not allowed inside a host function body
error : namespace "cuda::ptx" has no member "get_sreg_laneid"

These issues suggest a significant incompatibility or API change between OpenCV 4.12.0’s CUDA module implementations (especially in cudafilters) and CUDA Toolkit 12.9, which nvcc is unable to resolve even with --expt-relaxed-constexpr.

What I’ve Already Tried:

Clean Builds: Always deleting the entire build directory before re-configuring CMake.
CUDA_NVCC_FLAGS=--expt-relaxed-constexpr: Adding this flag in CMake GUI for nvcc. Verified its presence in CMakeCache.txt.

I have attached a text file with the VS2022 build output (errors only) in case it helps: JustPaste.it - Share Text & Images the Easy Way

Could this be a known incompatibility for OpenCV 4.12.0 with CUDA 12.9? Are there specific CMake flags, lib versions, or workarounds (beyond disabling modules) that can address these cub-related compilation errors?

Thank you in advance

cudawarped · July 31, 2025, 6:44pm

Have you tried building with the latest commit from the 4.x branches? Specifically, a commit after this fix:

github.com/opencv/opencv

fix compilation problems with MSVC+Cuda 12.9

4.x ← chacha21:fix_cuda129_msvc

opened 10:29AM - 08 Jul 25 UTC

chacha21

+7 -2

fix for #27521

Actually, when ENABLE_CUDA_FIRST_CLASS_LANGUAGE is enabled, the fix is not necessary. However, even when ENABLE_CUDA_FIRST_CLASS_LANGUAGE is enabled, I have checked that the fix is harmless. So I propose to keep it simple for now and enable the fix whatever the state of ENABLE_CUDA_FIRST_CLASS_LANGUAGE.

mercylotus · July 31, 2025, 6:54pm

Yes, I'm pretty sure I downloaded the most recent versions of every library and tool involved. I started experimenting with these tools a few days ago.

Now, as an alternative, I'm trying to use a precompiled wheel, but after installing the .whl file in a Python environment I still get the error ImportError: DLL load failed while importing cv2: The specified module could not be found.

import cv2
import torch

print("OpenCV has CUDA support:", cv2.cuda.getCudaEnabledDeviceCount() > 0)
print("EasyOCR will use CUDA:", torch.cuda.is_available())

if torch.cuda.is_available():
    print(f"CUDA device count: {torch.cuda.device_count()}")
    for i in range(torch.cuda.device_count()):
        print(f"  Device {i}: {torch.cuda.get_device_name(i)}")

try:
    gpu_mat = cv2.cuda_GpuMat()
    print("Successfully created a GpuMat object. CUDA is working.")
except Exception as e:
    print(f"Failed to create GpuMat object. Error: {e}")

mercylotus · July 31, 2025, 7:24pm

It seems I managed to make the precompiled wheel work. I had to copy the cuDNN files into the CUDA directory once again (initially I thought the cuDNN version I had used for the CMake build was OK).

After the cuDNN integration, I ran

pip uninstall opencv-contrib-python
pip install 'D:\Downloads\_cuda integration\opencv_contrib_python-4.12.0.88-cp37-abi3-win_amd64.whl'
python .\test_cuda.py

and I got

OpenCV has CUDA support: True
EasyOCR will use CUDA: True
CUDA device count: 1
  Device 0: NVIDIA GeForce RTX 3060 Ti
Successfully created a GpuMat object. CUDA is working.

Now I can go forward and use this for my initial Python EasyOCR project, though I still wonder what I was missing in the CMake/VS2022 build (my initial plan).
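Beyond the GpuMat check, it can help to see which CUDA and cuDNN versions an installed cv2 build was compiled against. The sketch below parses the text returned by cv2.getBuildInformation(); it assumes the report contains lines labelled "NVIDIA CUDA:" and "cuDNN:", which may differ between builds. The sample text is illustrative only and is not output from this thread.

```python
import re


def cuda_build_summary(build_info):
    """Extract the CUDA and cuDNN lines from OpenCV build information text.

    Assumes the report labels those lines "NVIDIA CUDA:" and "cuDNN:".
    Returns None for a label that is not found.
    """
    summary = {}
    for key in ("NVIDIA CUDA", "cuDNN"):
        match = re.search(rf"^\s*{re.escape(key)}:\s*(.+?)\s*$",
                          build_info, re.MULTILINE)
        summary[key] = match.group(1) if match else None
    return summary


# Illustrative text in the assumed layout; not real output from this thread.
sample = """
  NVIDIA CUDA:                   YES (ver 12.9, CUFFT CUBLAS)
    NVIDIA GPU arch:             86
  cuDNN:                         YES (ver 9.10.2)
"""
print(cuda_build_summary(sample))
```

With a working install, the same function can be called as cuda_build_summary(cv2.getBuildInformation()) after import cv2.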

cudawarped · August 1, 2025, 4:11am

I’m glad the wheel worked but I have included instructions to avoid this error on the download page.

Nvidia GPU Computing Toolkit v12.9 is required for import cv2 to work and cuDNN 9.10.2 for accelerated inference when using the dnn module.

Note Windows OS: This wheel relies on cuDNN being installed in the CUDA Toolkit directory. Therefore you can either download

the cuDNN Tarball (Version->Tarball) and extract its contents to your CUDA directory, or

the installer (Version->exe (local)) and then add the path to the bin folder inside the cuDNN installation directory to your PATH_TO_PYTHON_DIST/Lib/site-packages/cv2/config.py file, e.g.

import os

BINARIES_PATHS = [
    os.path.join('D:/build/opencv/install', 'x64/vc17/bin'),
    # CUDA Toolkit bin directory (falls back to the default v12.9 location)
    os.path.join(os.getenv('CUDA_PATH', 'C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.9'), 'bin'),
    # bin directory created by the cuDNN installer
    os.path.join('C:/Program Files/NVIDIA/CUDNN/v9.10.2/bin/12.9'),
] + BINARIES_PATHS
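When import cv2 fails with "DLL load failed", one quick check is whether any of the directories listed in config.py actually contains the cuDNN DLLs. The helper below is a minimal sketch and not part of OpenCV: the function name is made up for illustration, and the pattern "cudnn*.dll" is an assumption about how the cuDNN files are named on Windows.

```python
import glob
import os


def dirs_with_dll(directories, pattern="cudnn*.dll"):
    """Return the entries of `directories` that contain a file matching `pattern`.

    Hypothetical helper for illustration; the default pattern is an assumption
    about cuDNN file names on Windows.
    """
    found = []
    for directory in directories:
        if glob.glob(os.path.join(directory, pattern)):
            found.append(directory)
    return found


# Candidate directories taken from the config.py example in the post.
candidates = [
    os.path.join(os.getenv("CUDA_PATH",
                           "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.9"),
                 "bin"),
    "C:/Program Files/NVIDIA/CUDNN/v9.10.2/bin/12.9",
]
print(dirs_with_dll(candidates))
```

An empty list would suggest that cuDNN is in neither location, which matches the situation where the files had to be copied into the CUDA directory again.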

cudawarped · August 1, 2025, 7:56am

I’m unable to re-create this issue with the latest version of OpenCV and CUDA 12.9 Update 1.

If you have time, it would be really useful if you could confirm that your version of OpenCV has the PR I mentioned above by going to OPENCV_SOURCE_ROOT/cmake/OpenCVDetectCUDAUtils.cmake and checking that line 396 is the same as shown below.

github.com/opencv/opencv

cmake/OpenCVDetectCUDAUtils.cmake

8dc4ad3ff


      
          endif()
          if(APPLE)
            set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -Xcompiler=-fno-finite-math-only)
          endif()
          
          if(WIN32)
            if (NOT (CUDA_VERSION VERSION_LESS "11.2"))
                set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -Xcudafe --display_error_number --diag-suppress 1394,1388)
            endif()
            if(CUDA_VERSION GREATER "12.8")
                set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -Xcompiler=/Zc:preprocessor)
            endif()
          endif()
          
          if(CMAKE_CROSSCOMPILING AND (ARM OR AARCH64))
            set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -Xlinker --unresolved-symbols=ignore-in-shared-libs)
          endif()
          
          # disabled because of multiple warnings during building nvcc auto generated files
          if(CV_GCC AND CMAKE_CXX_COMPILER_VERSION VERSION_GREATER "4.6.0")
            ocv_warnings_disable(CMAKE_CXX_FLAGS -Wunused-but-set-variable)

mercylotus · August 1, 2025, 11:04am

I checked the file and yes, I'm missing that change.

My current code:

macro(ocv_nvcc_flags)
  if(BUILD_SHARED_LIBS)
    set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -Xcompiler=-DCVAPI_EXPORTS)
  endif()

  if(UNIX OR APPLE)
    set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -Xcompiler=-fPIC)
  endif()
  if(APPLE)
    set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -Xcompiler=-fno-finite-math-only)
  endif()

  if(WIN32 AND NOT (CUDA_VERSION VERSION_LESS "11.2"))
    set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -Xcudafe --display_error_number --diag-suppress 1394,1388)
  endif()

  if(CMAKE_CROSSCOMPILING AND (ARM OR AARCH64))
    set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -Xlinker --unresolved-symbols=ignore-in-shared-libs)
  endif()

  # disabled because of multiple warnings during building nvcc auto generated files
  if(CV_GCC AND CMAKE_CXX_COMPILER_VERSION VERSION_GREATER "4.6.0")
    ocv_warnings_disable(CMAKE_CXX_FLAGS -Wunused-but-set-variable)
  endif()
endmacro()
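The same check can be scripted. The sketch below looks for the -Xcompiler=/Zc:preprocessor flag that the fixed version of the macro adds for CUDA versions greater than 12.8; the function names are made up for illustration, and a plain substring search is an assumption that only holds while the upstream file keeps that exact spelling.

```python
from pathlib import Path

# Flag added by the MSVC + CUDA 12.9 fix, as shown in the upstream snippet.
FIX_FLAG = "-Xcompiler=/Zc:preprocessor"


def has_msvc_cuda129_fix(cmake_text):
    """Return True if the CMake text contains the /Zc:preprocessor nvcc flag."""
    return FIX_FLAG in cmake_text


def check_source_tree(opencv_source_root):
    """Read cmake/OpenCVDetectCUDAUtils.cmake under the given root and check it."""
    path = Path(opencv_source_root) / "cmake" / "OpenCVDetectCUDAUtils.cmake"
    return has_msvc_cuda129_fix(path.read_text(encoding="utf-8"))


# The Windows branch from the 4.12.0 release zip, which lacks the fix.
old_branch = """
  if(WIN32 AND NOT (CUDA_VERSION VERSION_LESS "11.2"))
    set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -Xcudafe --display_error_number --diag-suppress 1394,1388)
  endif()
"""
print(has_msvc_cuda129_fix(old_branch))
```

Calling check_source_tree with the path to an unpacked OpenCV source directory should return True for a 4.x checkout that includes the fix.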

Sorry, I haven’t used GitHub much yet. How can I make sure I download the latest version of OpenCV (4.12.0) with that PR included?

Initially I downloaded the source code zip of the latest release from here: Releases · opencv/opencv · GitHub

cudawarped · August 1, 2025, 11:19am

No worries. In future, to get the latest commits, go to the repositories on GitHub, click on the Code button and download the zip,

e.g.

https://github.com/opencv/opencv/archive/refs/heads/4.x.zip

https://github.com/opencv/opencv_contrib/archive/refs/heads/4.x.zip

or use git to check out the 4.x branches, making sure you’ve pulled the latest changes first.
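For a scripted download, the two archive URLs above follow the same pattern, so both can be built from the repository name. This is a sketch using only the Python standard library; the function names and the destination file names are hypothetical, and the download function is defined but not called here.

```python
import urllib.request


def branch_zip_url(repository, branch="4.x"):
    """Build the GitHub archive URL for a branch of an opencv repository."""
    return f"https://github.com/opencv/{repository}/archive/refs/heads/{branch}.zip"


def download_branch_zip(repository, destination, branch="4.x"):
    """Download the branch archive to `destination` (needs network access)."""
    urllib.request.urlretrieve(branch_zip_url(repository, branch), destination)
    return destination


# Print the URLs for the main and contrib repositories; both should be
# fetched at the same time so the two source trees stay in step.
for repository in ("opencv", "opencv_contrib"):
    print(branch_zip_url(repository))
```

For example, download_branch_zip("opencv", "opencv-4.x.zip") would fetch the main repository, and the same call with "opencv_contrib" fetches the matching contrib modules.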
