Hello,
I’m trying to cross-compile OpenCV 4.6 with CUDA support for the Jetson Xavier NX.
My compilation environment is WSL2, where I have CUDA compilation tools, release 12.3, V12.3.52.
On my Jetson Xavier I have JetPack 5.1.3/L4T 35.5.0 with CUDA support (CUDA 11.4).
The compilation appears to complete correctly, without any errors.
I configure the build with:
cmake \
-D CUDA_VERBOSE_BUILD=ON \
-D CUDA_ARCH_BIN=7.0,7.2 \
-D CUDA_ARCH_PTX="" \
-D CUDA_NVCC_FLAGS="-D_FORCE_INLINES" \
-D CUDA_BUILD_EMULATION=OFF \
-D WITH_CUDA=ON \
-D BUILD_EXAMPLES=OFF \
-D BUILD_opencv_apps=OFF \
-D INSTALL_C_EXAMPLES=OFF \
-D INSTALL_TESTS=OFF \
-D CMAKE_BUILD_TYPE=Release \
-D CMAKE_INSTALL_PREFIX_INITIALIZED_TO_DEFAULT=true \
-D CMAKE_TOOLCHAIN_FILE=../.../jetson.toolchain.cmake \
-D OPENCV_EXTRA_MODULES_PATH=../opencv_contrib/modules \
-D CMAKE_CROSS=../../jetson.toolchain.cmake \
-D CMAKE_CROSSCOMPILING=true \
-D CUDA_TOOLKIT_TARGET_DIR=../.../Linux_for_Tegra/rootfs/usr/local/cuda-11.4/ \
-D CUDA_HOST_COMPILER=../../../../aarch64--glibc--stable-final/bin/aarch64-buildroot-linux-gnu-g++ \
../opencv
-- General configuration for OpenCV 4.6.0 =====================================
-- Version control: 4.6.0-dirty
--
-- Extra modules:
-- Location (extra): /home/afr/opencv/opencv_contrib/modules
-- Version control (extra): 4.6.0
--
-- Platform:
-- Timestamp: 2024-06-18T15:04:50Z
-- Host: Linux 5.15.74.2-microsoft-standard-WSL2+ x86_64
-- Target: Linux aarch64
-- CMake: 3.16.3
-- CMake generator: Unix Makefiles
-- CMake build tool: /usr/bin/make
-- Configuration: Release
--
-- CPU/HW features:
-- Baseline: NEON FP16
-- required: NEON
-- disabled: VFPV3
--
-- C/C++:
-- Built as dynamic libs?: YES
-- C++ standard: 11
-- C++ Compiler: /home/afr/aarch64--glibc--stable-final/bin/aarch64-buildroot-linux-gnu-g++ (ver 9.3.0)
-- C++ flags (Release): -fsigned-char -W -Wall -Wreturn-type -Wnon-virtual-dtor -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -fvisibility=hidden -fvisibility-inlines-hidden -O3 -DNDEBUG -DNDEBUG
-- C++ flags (Debug): -fsigned-char -W -Wall -Wreturn-type -Wnon-virtual-dtor -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -fvisibility=hidden -fvisibility-inlines-hidden -g -O0 -DDEBUG -D_DEBUG
-- C Compiler: /home/afr/aarch64--glibc--stable-final/bin/aarch64-buildroot-linux-gnu-gcc
-- C flags (Release): -fsigned-char -W -Wall -Wreturn-type -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -fvisibility=hidden -O3 -DNDEBUG -DNDEBUG
-- C flags (Debug): -fsigned-char -W -Wall -Wreturn-type -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -fvisibility=hidden -g -O0 -DDEBUG -D_DEBUG
-- Linker flags (Release): -Wl,--gc-sections -Wl,--as-needed -Wl,--no-undefined
-- Linker flags (Debug): -Wl,--gc-sections -Wl,--as-needed -Wl,--no-undefined
-- ccache: NO
-- Precompiled headers: NO
-- Extra dependencies: dl m pthread rt cudart nppc nppial nppicc nppidei nppif nppig nppim nppist nppisu nppitc npps cublas cufft -L/home/afr/Linux_for_Tegra/rootfs/usr/local/cuda-11.4/lib64
-- 3rdparty dependencies:
--
-- OpenCV modules:
-- To be built: aruco barcode bgsegm bioinspired calib3d ccalib core cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev datasets dnn dnn_objdetect dnn_superres dpm face features2d flann fuzzy gapi hfs highgui img_hash imgcodecs imgproc intensity_transform line_descriptor mcc ml objdetect optflow phase_unwrapping photo plot quality rapid reg rgbd saliency shape stereo stitching structured_light superres surface_matching text tracking ts video videoio videostab wechat_qrcode xfeatures2d ximgproc xobjdetect xphoto
-- Disabled: world
-- Disabled by dependency: -
-- Unavailable: alphamat cvv freetype hdf java julia matlab ovis python2 python3 sfm viz
-- Applications: tests perf_tests
-- Documentation: NO
-- Non-free algorithms: NO
--
-- GUI: NONE
-- GTK+: NO
--
-- Media I/O:
-- ZLib: zlib (ver 1.2.12)
-- JPEG: libjpeg-turbo (ver 2.1.2-62)
-- WEBP: build (ver encoder: 0x020f)
-- PNG: build (ver 1.6.37)
-- TIFF: build (ver 42 - 4.2.0)
-- JPEG 2000: build (ver 2.4.0)
-- HDR: YES
-- SUNRASTER: YES
-- PXM: YES
-- PFM: YES
--
-- Video I/O:
-- DC1394: NO
-- FFMPEG: NO
-- avcodec: NO
-- avformat: NO
-- avutil: NO
-- swscale: NO
-- avresample: NO
-- GStreamer: NO
-- v4l/v4l2: YES (linux/videodev2.h)
--
-- Parallel framework: pthreads
--
-- Trace: YES (with Intel ITT)
--
-- Other third-party libraries:
-- Lapack: NO
-- Custom HAL: YES (carotene (ver 0.0.1))
-- Protobuf: build (3.19.1)
--
-- NVIDIA CUDA: YES (ver 12.3, CUFFT CUBLAS)
-- NVIDIA GPU arch: 70
-- NVIDIA PTX archs:
--
-- cuDNN: NO
--
-- OpenCL: YES (no extra features)
-- Include path: /home/afr/opencv/opencv/3rdparty/include/opencl/1.2
-- Link libraries: Dynamic load
--
-- Python (for build): /usr/bin/python3
--
-- Install to: /home/afr/opencv/build/install
-- -----------------------------------------------------------------
--
-- Configuring done
-- Generating done
-- Build files have been written to: /home/afr/opencv/build
The plain C++ (CPU-only) part of OpenCV seems to work: a simple example that uses only a cv::Mat runs correctly.
A standalone example that uses basic C++ and the CUDA libraries directly, without the OpenCV interface, also works fine.
The problem appears as soon as I use any OpenCV feature that relies on CUDA, for example cv::cuda::GpuMat. In that case I immediately get the error “…device kernel image is invalid in function…”
The architecture of the Xavier NX is Volta, which supports 7.0 and 7.2, so I assumed that the error is not caused by the architecture version.
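In case it helps anyone reading the long summary, the CUDA-related lines can be filtered out of a saved configure log with a small shell helper. This is only a sketch: the function name cuda_summary is made up for illustration, and the sample input is copied from the summary above. In my log these lines report CUDA 12.3 and a single GPU arch of 70, although I passed CUDA_ARCH_BIN=7.0,7.2 and pointed CUDA_TOOLKIT_TARGET_DIR at the CUDA 11.4 rootfs.

```shell
#!/bin/sh
# Hypothetical helper: keep only the CUDA-related lines of a CMake
# configure summary read from standard input.
cuda_summary() {
    grep -E 'NVIDIA (CUDA|GPU arch|PTX archs)'
}

# Sample input: three lines copied from the configure summary above.
printf '%s\n' \
    '-- NVIDIA CUDA: YES (ver 12.3, CUFFT CUBLAS)' \
    '-- NVIDIA GPU arch: 70' \
    '-- cuDNN: NO' | cuda_summary
```

With a real log the same function can be fed from a file, for example cuda_summary < configure.log, where configure.log is whatever name the cmake output was saved under.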
Could someone point me to the likely cause of this error?