Hello,
I am attempting to build OpenCV 4.12.0 with CUDA and cuDNN support for use in a Python project that includes EasyOCR. I want GPU acceleration inside that project.
I’ve encountered persistent build errors related to CUDA’s cub library, including syntax errors in inline assembly, __host__ functions attempting to call __device__ functions, and undefined members in the cuda::ptx namespace. These issues persist even after applying the --expt-relaxed-constexpr flag. I’m hoping for some guidance on how to resolve this.
My Environment:
- Operating System: Windows 11
- OpenCV Version: 4.12.0 (Source from GitHub)
- OpenCV Contrib Version: 4.12.0 (Source from GitHub, matched to main OpenCV)
- CUDA Toolkit Version: 12.9
- cuDNN Version: 8.9.7 (for CUDA 12.x)
- Visual Studio Version: 2022
- Python Version: 3.13.5 (from AppData/Local/Programs/Python/Python313)
- GPU: NVIDIA GeForce RTX 3060 Ti (Compute Capability 8.6, Ampere arch)
CMake Configuration (Relevant Variables):
I have consistently performed a clean build by deleting the build directory contents before each CMake configuration. My CMake generator is Visual Studio 17 2022 with x64 platform.
Here are the key CMake variables I have set:
- PYTHON3_EXECUTABLE:C:/Users/.../AppData/Local/Programs/Python/Python313/python.exe
- PYTHON3_INCLUDE_DIR:C:/Users/.../AppData/Local/Programs/Python/Python313/include
- PYTHON3_LIBRARY:C:/Users/.../AppData/Local/Programs/Python/Python313/libs/python313.lib
- OPENCV_EXTRA_MODULES_PATH:D:/Downloads/_cuda integration/opencv_contrib-4.12.0/modules
- CUDA_ARCH_BIN:8.6
- CUDA_NVCC_FLAGS:--expt-relaxed-constexpr
- WITH_CUDA:ON
- BUILD_opencv_world:ON
- OPENCV_ENABLE_NONFREE:ON
- CMAKE_BUILD_TYPE:Release
- BUILD_PERF_TESTS:OFF
- BUILD_TESTS:OFF
- WITH_OPENEXR:OFF
- BUILD_OPENEXR:OFF
I have also tried to integrate the NVIDIA Video Codec SDK headers and libraries (NVCUVID, NVCUVENC) by setting:
- CUDA_NVCUVID_INCLUDE_DIR:C:/Program Files/NVIDIA Video Codec SDK/Interface
- CUDA_nvcuvid_LIBRARY:C:/Program Files/NVIDIA Video Codec SDK/Lib/x64/nvcuvid.lib
- CUDA_NVENCAPI_INCLUDE_DIR:C:/Program Files/NVIDIA Video Codec SDK/Interface
- CUDA_nvencodeapi_LIBRARY:C:/Program Files/NVIDIA Video Codec SDK/Lib/x64/nvEncodeAPI.lib
- WITH_NVCUVID:ON
- WITH_NVCUVENC:ON
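In case it helps anyone reproduce this outside the CMake GUI, here is a minimal sketch of how the configuration above could be turned into a cmake command line from Python. The names SOURCE_DIR, BUILD_DIR, and cmake_configure_args are hypothetical placeholders of mine, not part of OpenCV or CMake. The three PYTHON3_* variables and the Video Codec SDK path variables are left out for brevity (my Python paths are elided above anyway), and they would be added to the dictionary the same way.

```python
# Sketch only: builds the argument list for a cmake configure call.
# SOURCE_DIR and BUILD_DIR are hypothetical placeholders; adjust to your layout.
SOURCE_DIR = "opencv-4.12.0"
BUILD_DIR = "build"

# Values copied from the variable lists in this post.
CMAKE_VARS = {
    "OPENCV_EXTRA_MODULES_PATH": "D:/Downloads/_cuda integration/opencv_contrib-4.12.0/modules",
    "CUDA_ARCH_BIN": "8.6",
    "CUDA_NVCC_FLAGS": "--expt-relaxed-constexpr",
    "WITH_CUDA": "ON",
    "BUILD_opencv_world": "ON",
    "OPENCV_ENABLE_NONFREE": "ON",
    "CMAKE_BUILD_TYPE": "Release",
    "BUILD_PERF_TESTS": "OFF",
    "BUILD_TESTS": "OFF",
    "WITH_OPENEXR": "OFF",
    "BUILD_OPENEXR": "OFF",
    "WITH_NVCUVID": "ON",
    "WITH_NVCUVENC": "ON",
}


def cmake_configure_args(source_dir, build_dir, variables):
    """Return the cmake configure command as a list of arguments."""
    args = [
        "cmake",
        "-S", source_dir,
        "-B", build_dir,
        "-G", "Visual Studio 17 2022",  # generator used in this post
        "-A", "x64",                    # platform used in this post
    ]
    for name, value in variables.items():
        args.append(f"-D{name}={value}")
    return args


if __name__ == "__main__":
    # Print one argument per line; pass the list to subprocess.run to execute it.
    for arg in cmake_configure_args(SOURCE_DIR, BUILD_DIR, CMAKE_VARS):
        print(arg)
```

Passing the list to subprocess.run (rather than joining it into one string) avoids quoting problems with the space in the contrib path.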
The Problem:
After configuring CMake successfully and generating the Visual Studio solution, when I build the ALL_BUILD project in Visual Studio 2022 (in Release configuration), the compilation fails. The errors are now predominantly from the cub library (part of the CUDA Toolkit), specifically in cub/util_ptx.cuh, cub/block/block_adjacent_difference.cuh, cub/block/block_discontinuity.cuh, cub/warp/specializations/warp_exchange_shfl.cuh, cub/block/block_exchange.cuh, and cub/thread/thread_load.cuh.
The errors include:
- error : expected a ")" (syntax errors in inline PTX assembly)
- error : calling a __device__ function("...") from a __host__ function("...") is not allowed (e.g., for __syncthreads, __syncwarp, __shfl_sync)
- error : a static "__shared__" variable declaration is not allowed inside a host function body
- error : namespace "cuda::ptx" has no member "get_sreg_laneid"
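To see which headers produce most of these errors, the build output can be grouped by file. The sketch below assumes each error line has the shape `<path>(<line>): error : <message>`, which is how the lines look in my output but may differ on other setups. The function name count_errors_by_file and the sample log lines are hypothetical and only for illustration.

```python
import re
from collections import Counter

# Assumed line shape: <path>(<line number>): error : <message>
ERROR_RE = re.compile(
    r"^(?P<path>.+?)\((?P<line>\d+)\)\s*:\s*error\s*:\s*(?P<msg>.*)$"
)


def count_errors_by_file(log_text):
    """Count error lines per source file, shortening paths to start at 'cub/'."""
    counts = Counter()
    for raw in log_text.splitlines():
        match = ERROR_RE.match(raw.strip())
        if not match:
            continue  # skip anything that is not an error line
        path = match.group("path").replace("\\", "/")
        idx = path.find("/cub/")
        key = path[idx + 1:] if idx != -1 else path
        counts[key] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines, not copied from the real log.
    sample = "\n".join([
        r'C:\cuda\include\cub\util_ptx.cuh(10): error : expected a ")"',
        r'C:\cuda\include\cub\util_ptx.cuh(20): error : expected a ")"',
        r'C:\cuda\include\cub\block\block_exchange.cuh(5): error : example message',
    ])
    for name, n in count_errors_by_file(sample).most_common():
        print(n, name)
```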
These issues suggest a significant incompatibility or API change between OpenCV 4.12.0’s CUDA module implementations (especially in cudafilters) and CUDA Toolkit 12.9, which nvcc is unable to resolve even with --expt-relaxed-constexpr.
What I’ve Already Tried:
- Clean Builds: Always deleting the entire build directory before re-configuring CMake.
- CUDA_NVCC_FLAGS=--expt-relaxed-constexpr: Adding this flag in CMake GUI for nvcc. Verified its presence in CMakeCache.txt.
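The CMakeCache.txt check can also be scripted instead of done by eye. This is a minimal sketch that relies on the cache storing entries as `NAME:TYPE=VALUE` with `//` and `#` comment lines. The function name parse_cmake_cache and the sample text are hypothetical.

```python
def parse_cmake_cache(text):
    """Parse CMakeCache.txt content into a {name: value} dictionary."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the two comment styles used in the cache file.
        if not line or line.startswith(("#", "//")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a NAME:TYPE=VALUE entry
        name, _, _entry_type = key.partition(":")
        entries[name] = value
    return entries


if __name__ == "__main__":
    # Made-up excerpt for illustration; for a real check, read the file from
    # your own build directory and pass its text to parse_cmake_cache.
    sample = """
// Flags for nvcc
CUDA_NVCC_FLAGS:STRING=--expt-relaxed-constexpr
CUDA_ARCH_BIN:STRING=8.6
WITH_CUDA:BOOL=ON
"""
    cache = parse_cmake_cache(sample)
    for name in ("CUDA_NVCC_FLAGS", "CUDA_ARCH_BIN", "WITH_CUDA"):
        print(name, "=", cache.get(name))
```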
I have attached a text file with the VS2022 build output (errors only) in case it helps: JustPaste.it
Could this be a known incompatibility between OpenCV 4.12.0 and CUDA 12.9? Are there specific CMake flags, library versions, or workarounds (beyond disabling modules) that can address these cub-related compilation errors?
Thank you in advance