Hello,
I am attempting to build OpenCV 4.12.0 with CUDA and cuDNN support for use in my Python project, which includes EasyOCR. I want GPU acceleration inside that project.
I’ve encountered persistent build errors related to CUDA’s cub library, including syntax errors in inline assembly, __host__ functions attempting to call __device__ functions, and undefined members in the cuda::ptx namespace. These issues persist even after applying the --expt-relaxed-constexpr flag. I’m hoping for some guidance on how to resolve this.
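For context, the end goal is a cv2 build that reports at least one CUDA device from Python. Below is a minimal sketch of that check. It assumes the finished build exposes cv2.cuda.getCudaEnabledDeviceCount(). The helper name cuda_device_count is illustrative only and is not part of OpenCV.

```python
def cuda_device_count(cv2_module):
    """Return how many CUDA devices the given cv2 build reports.

    Falls back to 0 when the module has no usable `cuda` submodule,
    which is what a build without CUDA bindings would look like.
    """
    cuda = getattr(cv2_module, "cuda", None)
    if cuda is None or not hasattr(cuda, "getCudaEnabledDeviceCount"):
        return 0
    return cuda.getCudaEnabledDeviceCount()


if __name__ == "__main__":
    try:
        import cv2  # the locally built OpenCV, if it is on the Python path
    except ImportError:
        print("cv2 is not importable in this environment")
    else:
        print("OpenCV", cv2.__version__,
              "CUDA devices:", cuda_device_count(cv2))
```

A result of 0 would mean the imported cv2 was built without CUDA or cannot see the GPU, for example because a CPU-only wheel shadows the custom build.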
My Environment:
- Operating System: Windows 11
- OpenCV Version: 4.12.0 (Source from GitHub)
- OpenCV Contrib Version: 4.12.0 (Source from GitHub, matched to main OpenCV)
- CUDA Toolkit Version: 12.9
- cuDNN Version: 8.9.7 (for CUDA 12.x)
- Visual Studio Version: 2022
- Python Version: 3.13.5 (from AppData/Local/Programs/Python/Python313)
- GPU: NVIDIA GeForce RTX 3060 Ti (Compute Capability 8.6, Ampere arch)
CMake Configuration (Relevant Variables):
I have consistently performed a clean build by deleting the build directory contents before each CMake configuration. My CMake generator is Visual Studio 17 2022 with the x64 platform.
Here are the key CMake variables I have set:
- PYTHON3_EXECUTABLE: C:/Users/.../AppData/Local/Programs/Python/Python313/python.exe
- PYTHON3_INCLUDE_DIR: C:/Users/.../AppData/Local/Programs/Python/Python313/include
- PYTHON3_LIBRARY: C:/Users/.../AppData/Local/Programs/Python/Python313/libs/python313.lib
- OPENCV_EXTRA_MODULES_PATH: D:/Downloads/_cuda integration/opencv_contrib-4.12.0/modules
- CUDA_ARCH_BIN: 8.6
- CUDA_NVCC_FLAGS: --expt-relaxed-constexpr
- WITH_CUDA: ON
- BUILD_opencv_world: ON
- OPENCV_ENABLE_NONFREE: ON
- CMAKE_BUILD_TYPE: Release
- BUILD_PERF_TESTS: OFF
- BUILD_TESTS: OFF
- WITH_OPENEXR: OFF
- BUILD_OPENEXR: OFF
I have also tried to integrate the NVIDIA Video Codec SDK headers and libraries (NVCUVID, NVCUVENC) by setting:
- CUDA_NVCUVID_INCLUDE_DIR: C:/Program Files/NVIDIA Video Codec SDK/Interface
- CUDA_nvcuvid_LIBRARY: C:/Program Files/NVIDIA Video Codec SDK/Lib/x64/nvcuvid.lib
- CUDA_NVENCAPI_INCLUDE_DIR: C:/Program Files/NVIDIA Video Codec SDK/Interface
- CUDA_nvencodeapi_LIBRARY: C:/Program Files/NVIDIA Video Codec SDK/Lib/x64/nvEncodeAPI.lib
- WITH_NVCUVID: ON
- WITH_NVCUVENC: ON
The Problem:
CMake configures successfully and generates the Visual Studio solution, but when I build the ALL_BUILD project in Visual Studio 2022 (in Release configuration), the compilation fails. The errors are now predominantly from the cub library (part of the CUDA Toolkit), specifically in cub/util_ptx.cuh, cub/block/block_adjacent_difference.cuh, cub/block/block_discontinuity.cuh, cub/warp/specializations/warp_exchange_shfl.cuh, cub/block/block_exchange.cuh, and cub/thread/thread_load.cuh.
The errors include:
- error : expected a ")" (syntax errors in inline PTX assembly)
- error : calling a __device__ function("...") from a __host__ function("...") is not allowed (e.g., for __syncthreads, __syncwarp, __shfl_sync)
- error : a static "__shared__" variable declaration is not allowed inside a host function body
- error : namespace "cuda::ptx" has no member "get_sreg_laneid"
These issues suggest a significant incompatibility or API change between OpenCV 4.12.0’s CUDA module implementations (especially in cudafilters) and CUDA Toolkit 12.9, which nvcc is unable to resolve even with --expt-relaxed-constexpr.
What I’ve Already Tried:
- Clean Builds: Always deleting the entire build directory before re-configuring CMake.
- CUDA_NVCC_FLAGS=--expt-relaxed-constexpr: Adding this flag in CMake GUI for nvcc. Verified its presence in CMakeCache.txt.
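The CMakeCache.txt check can also be scripted rather than done by eye. Below is a minimal Python sketch, assuming the usual NAME:TYPE=VALUE cache line format. The helper read_cmake_cache and the SAMPLE text are illustrative; the entry types (STRING, BOOL) in the sample are assumptions, and the values are the ones listed in this post.

```python
def read_cmake_cache(text):
    """Parse CMakeCache.txt content into a {name: value} dict."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the two comment styles found in the cache.
        if not line or line.startswith(("#", "//")):
            continue
        key_and_type, sep, value = line.partition("=")
        if not sep:
            continue  # not a NAME:TYPE=VALUE entry
        name, _, _entry_type = key_and_type.partition(":")
        entries[name] = value
    return entries


# Illustrative excerpt only; a real cache has many more entries.
SAMPLE = """\
// Additional NVCC command line flags
CUDA_NVCC_FLAGS:STRING=--expt-relaxed-constexpr
# another comment style
CUDA_ARCH_BIN:STRING=8.6
WITH_CUDA:BOOL=ON
"""

if __name__ == "__main__":
    cache = read_cmake_cache(SAMPLE)
    for name in ("CUDA_NVCC_FLAGS", "CUDA_ARCH_BIN", "WITH_CUDA"):
        print(name, "=", cache.get(name, "<not set>"))
```

To run it against a real build directory, pass the contents of CMakeCache.txt (for example via pathlib.Path(...).read_text()) instead of SAMPLE.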
I have attached a text file with the VS2022 build output (errors only), in case it helps: JustPaste.it
Could this be a known incompatibility for OpenCV 4.12.0 with CUDA 12.9? Are there specific CMake flags, library versions, or workarounds (beyond disabling modules) that can address these cub-related compilation errors?
Thank you in advance