Hey CudaMates,
Context:
I have just started working with OpenCV's GPU/CUDA decoding capabilities, hoping to use their advantages over the CPU counterpart to build an efficient application. The problem is that I get the following warning/error while decoding an .mp4 video in YUV 4:2:2 color format, even though, as far as I understand, that format is supported by the NVIDIA Video Codec SDK on Blackwell GPUs.
Problem:
I get an unsupported-codec error from cudacodec::VideoDecoder on what I believe is an H264 video in an MP4 container with YUV 4:2:2 color. The warning specifically reads: YUV 4:2:2 is not currently supported, falling back to YUV 4:2:0
Error
CUDA is supported with 1 device(s).
[NOTE] Resizing Vid from (1920,1080) to (320,180) - | skip_frames = 25 |
CPU Decoding: Processed 271 | 6768 frames. Average time per frame: 1.87 ms
[ WARN:1@14.258] global video_decoder.cpp:129 cv::cudacodec::detail::VideoDecoder::create YUV 4:2:2 is not currently supported, falling back to YUV 4:2:0.
[ERROR:1@14.267] global video_parser.cpp:86 cv::cudacodec::detail::VideoParser::parseVideoData OpenCV(4.12.0-dev) E:\Haider\OpenCV_builds\OpenCV4_RTX50GPU\opencv_contrib-4.x\modules\cudacodec\src\video_decoder.cpp:194: error: (-210:Unsupported format or combination of formats) No supported output format found in function 'cv::cudacodec::detail::VideoDecoder::create'
Traceback (most recent call last):
File "e:\Haider\Advanced video analytics using Computer Vision\OpenCV-Video-Analyzer\src\tests\test_opencv_cuda_decoding.py", line 97, in <module>
gpu_reader = cv2.cudacodec.createVideoReader(video_path,params=params)
cv2.error: OpenCV(4.12.0-dev) E:\Haider\OpenCV_builds\OpenCV4_RTX50GPU\opencv_contrib-4.x\modules\cudacodec\src\video_reader.cpp:130: error: (-2:Unspecified error) Parsing/Decoding video source failed, check GPU memory is available and GPU supports requested functionality. in function '`anonymous-namespace'::VideoReaderImpl::waitForDecoderInit'
Video Metadata
- I believe this (AVC50_1920_1080_H422P@L42) means we are dealing with H264 with YUV 4:2:2 color encoding.
Video Format Video Rec Port Port: DIRECT
Video Format Video Frame Video Codec: AVC50_1920_1080_H422P@L42
Video Format Video Frame Capture Fps: 50.00p
Video Format Video Frame Format Fps: 50p
Video Format Video Layout Pixel : 1920
Video Format Video Layout Num Of Vertical Line: 1080
Video Format Video Layout Aspect Ratio: 16:9
Audio Format Num Of Channel : 2
Audio Format Audio Rec Port Port: DIRECT
Audio Format Audio Rec Port Audio Codec: LPCM16
Audio Format Audio Rec Port Track Dst: CH1
Device Manufacturer : Sony
Device Model Name : ILCE-7M4
Device Serial No : 4294967295
Recording Mode Type : normal
Recording Mode Cache Rec : false
Acquisition Record Group Name : CameraUnitMetadataSet
Acquisition Record Group Item Name: CaptureGammaEquation
Acquisition Record Group Item Value: ex-cine2
Acquisition Record Change Table Name: ImagerControlInformation
Acquisition Record Change Table Event Frame Count: 0
Acquisition Record Change Table Event Status: start
Image Size : 1920x1080
Megapixels : 2.1
Avg Bitrate : 53.5 Mbps
Rotation : 0
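The metadata above comes from the camera's own tags. One way to cross-check what the container actually holds is to ask ffprobe for the codec, profile and pixel format of the first video stream. The following is a minimal sketch, assuming ffprobe is installed and on PATH; the helper names (probe_video_stream, parse_probe_output, chroma_subsampling) are made up for illustration, and the pixel-format matching is a simple name-based heuristic that does not cover formats such as nv12.

```python
import json
import subprocess


def parse_probe_output(json_text):
    """Extract codec, profile and pixel format from ffprobe's JSON output."""
    streams = json.loads(json_text).get("streams", [])
    if not streams:
        raise ValueError("no video stream found in ffprobe output")
    stream = streams[0]
    return {
        "codec": stream.get("codec_name"),
        "profile": stream.get("profile"),
        "pix_fmt": stream.get("pix_fmt"),
    }


def chroma_subsampling(pix_fmt):
    """Guess the chroma subsampling from an FFmpeg pixel format name (heuristic)."""
    for tag, label in (("420", "4:2:0"), ("422", "4:2:2"), ("444", "4:4:4")):
        if tag in (pix_fmt or ""):
            return label
    return "unknown"


def probe_video_stream(path):
    """Run ffprobe on the first video stream of `path` (assumes ffprobe is on PATH)."""
    cmd = [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=codec_name,profile,pix_fmt",
        "-of", "json",
        path,
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_probe_output(result.stdout)


# Hypothetical usage with the placeholder file name from this report:
# info = probe_video_stream("xyz.mp4")
# print(info, chroma_subsampling(info["pix_fmt"]))
```

If the reported pixel format contains 422, that would confirm that the stream is 4:2:2 and that the warning in the log refers to the chroma format rather than to the codec itself.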
Setup
Hardware:
- CPU: Intel(R) Core™ i9-12900KF
- GPU: Nvidia RTX 5070TI (Blackwell)
Software:
- Windows 10
- Cuda 12.8
- Game Ready Driver 572.61
- Python 3.9.13
- OpenCV(4.12.0-dev)
OpenCV Information
a) CudaDeviceInfo
(va_tftorch_gpu_venv_128) E:\Video-Analyzer>python
Python 3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022, 16:36:42) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import cv2
>>> from cv2 import cuda
>>> cuda.printCudaDeviceInfo(0)
*** CUDA Device Query (Runtime API) version (CUDART static linking) ***
Device count: 1
Device 0: "NVIDIA GeForce RTX 5070 Ti"
CUDA Driver Version / Runtime Version 12.80 / 12.80
CUDA Capability Major/Minor version number: 12.0
Total amount of global memory: 16303 MBytes (17094475776 bytes)
GPU Clock Speed: 2.50 GHz
Max Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072,65536), 3D=(16384,16384,16384)
Max Layered Texture Size (dim) x layers 1D=(32768) x 2048, 2D=(32768,32768) x 2048
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and execution: Yes with 5 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Concurrent kernel execution: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support enabled: No
Device is using TCC driver mode: No
Device supports Unified Addressing (UVA): Yes
Device PCI Bus ID / PCI location ID: 1 / 0
Compute Mode:
Default (multiple host threads can use ::cudaSetDevice() with device simultaneously)
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.80, CUDA Runtime Version = 12.80, NumDevs = 1
b) GetBuildInformation
>>> print(cv2.getBuildInformation())
General configuration for OpenCV 4.12.0-dev =====================================
Version control: unknown
Extra modules:
Location (extra): E:/Haider/OpenCV_builds/OpenCV4_RTX50GPU/opencv_contrib-4.x/modules
Version control (extra): unknown
Platform:
Timestamp: 2025-03-30T10:22:59Z
Host: Windows 10.0.19045 AMD64
CMake: 3.31.6
CMake generator: Visual Studio 17 2022
CMake build tool: C:/Program Files/Microsoft Visual Studio/2022/Community/MSBuild/Current/Bin/amd64/MSBuild.exe
MSVC: 1939
Configuration: Release
Algorithm Hint: ALGO_HINT_ACCURATE
CPU/HW features:
Baseline: SSE SSE2 SSE3
requested: SSE3
Dispatched code generation: SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
SSE4_1 (19 files): + SSSE3 SSE4_1
SSE4_2 (2 files): + SSSE3 SSE4_1 POPCNT SSE4_2
AVX (10 files): + SSSE3 SSE4_1 POPCNT SSE4_2 AVX
FP16 (1 files): + SSSE3 SSE4_1 POPCNT SSE4_2 AVX FP16
AVX2 (39 files): + SSSE3 SSE4_1 POPCNT SSE4_2 AVX FP16 AVX2 FMA3
AVX512_SKX (8 files): + SSSE3 SSE4_1 POPCNT SSE4_2 AVX FP16 AVX2 FMA3 AVX_512F AVX512_COMMON AVX512_SKX
C/C++:
Built as dynamic libs?: YES
C++ standard: 11
C++ Compiler: C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.39.33519/bin/Hostx64/x64/cl.exe (ver 19.39.33523.0)
C++ flags (Release): /DWIN32 /D_WINDOWS /W4 /GR /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:fast /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /wd4819 /MP /O2 /Ob2 /DNDEBUG
C++ flags (Debug): /DWIN32 /D_WINDOWS /W4 /GR /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:fast /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /wd4819 /MP /Zi /Ob0 /Od /RTC1
C Compiler: C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.39.33519/bin/Hostx64/x64/cl.exe
C flags (Release): /DWIN32 /D_WINDOWS /W3 /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:fast /MP /O2 /Ob2 /DNDEBUG
C flags (Debug): /DWIN32 /D_WINDOWS /W3 /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /fp:fast /MP /Zi /Ob0 /Od /RTC1
Linker flags (Release): /machine:x64 /INCREMENTAL:NO
Linker flags (Debug): /machine:x64 /debug /INCREMENTAL
ccache: NO
Precompiled headers: NO
Extra dependencies: cudart_static.lib nppc.lib nppial.lib nppicc.lib nppidei.lib nppif.lib nppig.lib nppim.lib nppist.lib nppisu.lib nppitc.lib npps.lib cublas.lib cudnn.lib cufft.lib -LIBPATH:C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.8/lib/x64
3rdparty dependencies:
OpenCV modules:
To be built: aruco bgsegm bioinspired calib3d ccalib core cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev datasets dnn dnn_objdetect dnn_superres dpm face features2d flann fuzzy gapi hfs highgui img_hash imgcodecs imgproc intensity_transform line_descriptor mcc ml objdetect optflow phase_unwrapping photo plot python3 quality rapid reg rgbd saliency shape signal stereo stitching structured_light superres surface_matching text tracking ts video videoio videostab wechat_qrcode world xfeatures2d ximgproc xobjdetect xphoto
Disabled: -
Disabled by dependency: -
Unavailable: alphamat cannops cvv fastcv freetype hdf java julia matlab ovis python2 python2 sfm viz
Applications: tests perf_tests apps
Documentation: NO
Non-free algorithms: YES
Windows RT support: NO
GUI:
Win32 UI: YES
VTK support: NO
Media I/O:
ZLib: build (ver 1.3.1)
JPEG: build-libjpeg-turbo (ver 3.1.0-70)
SIMD Support Request: YES
SIMD Support: NO
WEBP: build (ver decoder: 0x0209, encoder: 0x020f, demux: 0x0107)
AVIF: NO
PNG: build (ver 1.6.43)
SIMD Support Request: YES
SIMD Support: YES (Intel SSE)
TIFF: build (ver 42 - 4.6.0)
JPEG 2000: build (ver 2.5.3)
OpenEXR: build (ver 2.3.0)
GIF: NO
HDR: YES
SUNRASTER: YES
PXM: YES
PFM: YES
Video I/O:
FFMPEG: YES (prebuilt binaries)
avcodec: YES (58.134.100)
avformat: YES (58.76.100)
avutil: YES (56.70.100)
swscale: YES (5.9.100)
avresample: YES (4.0.0)
GStreamer: NO
DirectShow: YES
Media Foundation: YES
DXVA: YES
Parallel framework: Concurrency
Trace: YES (with Intel ITT(3.25.4))
Other third-party libraries:
Intel IPP: 2022.0.0 [2022.0.0]
at: E:/Haider/OpenCV_builds/OpenCV4_RTX50GPU/build_cudec/3rdparty/ippicv/ippicv_win/icv
Intel IPP IW: sources (2022.0.0)
at: E:/Haider/OpenCV_builds/OpenCV4_RTX50GPU/build_cudec/3rdparty/ippicv/ippicv_win/iw
Lapack: NO
Eigen: NO
Custom HAL: NO
Protobuf: build (3.19.1)
Flatbuffers: builtin/3rdparty (23.5.9)
NVIDIA CUDA: YES (ver 12.8, CUFFT CUBLAS NVCUVID NVCUVENC FAST_MATH)
NVIDIA GPU arch: 120
NVIDIA PTX archs: 120
cuDNN: YES (ver 9.8.0)
OpenCL: YES (NVD3D11)
Include path: E:/Haider/OpenCV_builds/OpenCV4_RTX50GPU/opencv-4.x/3rdparty/include/opencl/1.2
Link libraries: Dynamic load
Python 3:
Interpreter: E:/Haider/Advanced video analytics using Computer Vision/OpenCV-Video-Analyzer/va_tftorch_gpu_venv_128/Scripts/python.exe (ver 3.9.13)
Libraries: C:/Users/user/AppData/Local/Programs/Python/Python39/libs/python39.lib (ver 3.9.13)
Limited API: NO
numpy: E:/Haider/Advanced video analytics using Computer Vision/OpenCV-Video-Analyzer/va_tftorch_gpu_venv_128/lib/site-packages/numpy/core/include (ver 1.23.5)
install path: E:/Haider/Advanced video analytics using Computer Vision/OpenCV-Video-Analyzer/va_tftorch_gpu_venv_128/lib/site-packages/cv2/python-3.9
Python (for build): E:/Haider/Advanced video analytics using Computer Vision/OpenCV-Video-Analyzer/va_tftorch_gpu_venv_128/Scripts/python.exe
Java:
ant: NO
Java: NO
JNI: NO
Java wrappers: NO
Java tests: NO
Install to: E:/Haider/OpenCV_builds/OpenCV4_RTX50GPU/build_cudec/install
-----------------------------------------------------------------
>>>
My Questions:
Looking at the NVIDIA Decode Support Matrix,
we see that the RTX 5070 Ti supports H264 (YUV 4:2:0 and YUV 4:2:2), both for 8-bit and 10-bit. So:
a) Why am I getting this unsupported-codec error on the Blackwell card?
b) If the video uses a different codec than I presumed, which one is it? And is there a workaround to use GPU decode acceleration for such a codec?
@cudawarped Can you look into this?
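Regarding question b), one possible stopgap (not a fix for cudacodec itself) is to re-encode the clip to YUV 4:2:0 with ffmpeg before handing it to cv2.cudacodec, since the warning in the log indicates that this build only handles 4:2:0. The sketch below assumes ffmpeg with libx264 is on PATH; the function names and the CRF value are illustrative choices, audio is dropped with -an because only the video is analyzed, and the conversion costs a CPU transcode and discards half of the horizontal chroma resolution.

```python
import subprocess


def build_transcode_cmd(src, dst, crf=18):
    """Build an ffmpeg command that re-encodes `src` to H.264 with 4:2:0 chroma."""
    return [
        "ffmpeg", "-y",          # overwrite the output file if it exists
        "-i", src,
        "-an",                   # drop audio; only the video is needed here
        "-c:v", "libx264",
        "-crf", str(crf),        # quality target (illustrative value)
        "-pix_fmt", "yuv420p",   # convert 4:2:2 to 8-bit 4:2:0
        dst,
    ]


def transcode_to_420(src, dst, crf=18):
    """Run the transcode and return the output path (assumes ffmpeg is on PATH)."""
    subprocess.run(build_transcode_cmd(src, dst, crf), check=True)
    return dst


# Hypothetical usage with the placeholder file name from this report:
# video_path = transcode_to_420("xyz.mp4", "xyz_420.mp4")
```

The resulting file can then be passed to createVideoReader in place of the original.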
Full Code:
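import cv2
import time
video_path = "xyz.mp4"
# Resize target and frame-skip interval, as reported in the log output above
t_width, t_height = 320, 180
skip_frames = 25
# Helpers for the GPU memory bookkeeping below (omitted from the original excerpt; assumed definitions)
b_to_mb = 1024 * 1024
device_info = cv2.cuda.DeviceInfo()
mb_free_start = device_info.freeMemory()/b_to_mb
params = cv2.cudacodec.VideoReaderInitParams()
params.targetSz = (t_width,t_height)
#gpu_reader = cv2.cudacodec.createVideoReader(video_path)
gpu_reader = cv2.cudacodec.createVideoReader(video_path,params=params)
mb_free_after_creation = device_info.freeMemory()/b_to_mb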
# Get the number of decode surfaces; currently a frame needs to be processed before the format info is valid
gpu_reader.grab()
format_info = gpu_reader.format()
mb_used = mb_free_start - mb_free_after_creation
print(f'Total Memory: {device_info.totalMemory()/b_to_mb:.2f}MB')
print(f'Free Memory after context creation: {mb_free_start:.2f}MB')
print(f'Free Memory after creating video reader: {mb_free_after_creation:.2f}MB')
print(f'{mb_used:.2f}MB of internal memory when using {format_info.ulNumDecodeSurfaces} ({format_info.ulWidth}x{format_info.ulHeight}) decode surfaces')
params = cv2.cudacodec.VideoReaderInitParams()
params.targetSz = (t_width,t_height)
params.minNumDecodeSurfaces = format_info.ulNumDecodeSurfaces * 2
gpu_reader_v2 = cv2.cudacodec.createVideoReader(video_path,params=params)
# If a color format other than BGRA is required, do the color conversion using set()
#gpu_reader_v2.set(colorFormat=cv2.cudacodec.GRAY)
mb_used_double_sufaces = mb_free_start - device_info.freeMemory()/b_to_mb
gpu_reader_v2.grab()
format_info = gpu_reader_v2.format()
assert format_info.ulNumDecodeSurfaces == params.minNumDecodeSurfaces
print(f'Memory increase from doubling the number of decode surfaces: {100*(mb_used_double_sufaces - mb_used)/mb_used:.2f}%')
# [PREALLOCATION] --- CUDA decoding test using cv2.cudacodec (if available) ---
if hasattr(cv2, 'cudacodec'):
    try:
        gpu_frame_times = []
        gpu_frame_count = 0
        gpu_proc_frame_count = 0
        frame_gpu = cv2.cuda.GpuMat(t_height,t_width,cv2.CV_8UC4)
        while True:
            start_time = time.time()
            ret, _ = gpu_reader_v2.nextFrame(frame_gpu) # Unpack the tuple
            if not ret or frame_gpu is None:
                break
            if gpu_frame_count % skip_frames == 0:
                # [ADDED] Retrieve the actual frame we know we need it
                gpu_proc_frame_count += 1
            end_time = time.time()
            gpu_frame_times.append(end_time - start_time)
            gpu_frame_count += 1
        if gpu_frame_count > 0:
            gpu_avg_time = sum(gpu_frame_times) / gpu_frame_count
            print(f"CUDA Decoding: Processed {gpu_proc_frame_count} | {gpu_frame_count} frames. Average time per frame: {gpu_avg_time*1000:.2f} ms")
        else:
            print("No frames decoded using CUDA decoding.")
    except Exception as e:
        print("Error during CUDA decoding:", e)
else:
    print("CUDA decoding is not available in this OpenCV build (cv2.cudacodec missing).")
```python