OpenCV cudacodec (NVDEC) reports unsupported format on Blackwell GPU

Hey CudaMates,

Context:

I have just started working with OpenCV's GPU/CUDA decoding capabilities, hoping to use their advantages over the CPU counterpart to build an efficient application. The problem is that I get the following warning/error while decoding an .mp4 video in YUV 4:2:2 color format, even though, as far as I understand, this format is supported by the NVIDIA Video Codec SDK on Blackwell GPUs.

Problem:

I get an unsupported-format error from cudacodec's VideoDecoder on what I believe is an H.264 video (MP4 container, YUV 4:2:2 color). The warning specifically reads: YUV 4:2:2 is not currently supported, falling back to YUV 4:2:0

Error

CUDA is supported with 1 device(s).

[NOTE] Resizing Vid from (1920,1080) to (320,180) - | skip_frames = 25 |

CPU Decoding: Processed 271 | 6768 frames. Average time per frame: 1.87 ms
[ WARN:1@14.258] global video_decoder.cpp:129 cv::cudacodec::detail::VideoDecoder::create YUV 4:2:2 is not currently supported, falling back to YUV 4:2:0.
[ERROR:1@14.267] global video_parser.cpp:86 cv::cudacodec::detail::VideoParser::parseVideoData OpenCV(4.12.0-dev) E:\Haider\OpenCV_builds\OpenCV4_RTX50GPU\opencv_contrib-4.x\modules\cudacodec\src\video_decoder.cpp:194: error: (-210:Unsupported format or combination of formats) No supported output format found in function 'cv::cudacodec::detail::VideoDecoder::create'

Traceback (most recent call last):
  File "e:\Haider\Advanced video analytics using Computer Vision\OpenCV-Video-Analyzer\src\tests\test_opencv_cuda_decoding.py", line 97, in <module>
    gpu_reader = cv2.cudacodec.createVideoReader(video_path,params=params)
cv2.error: OpenCV(4.12.0-dev) E:\Haider\OpenCV_builds\OpenCV4_RTX50GPU\opencv_contrib-4.x\modules\cudacodec\src\video_reader.cpp:130: error: (-2:Unspecified error) Parsing/Decoding video source failed, check GPU memory is available and GPU supports requested functionality. in function '`anonymous-namespace'::VideoReaderImpl::waitForDecoderInit'

Video Metadata

  • I believe this (AVC50_1920_1080_H422P@L42) means we are dealing with H.264 with YUV 4:2:2 color encoding.
Video Format Video Rec Port Port: DIRECT
Video Format Video Frame Video Codec: AVC50_1920_1080_H422P@L42
Video Format Video Frame Capture Fps: 50.00p
Video Format Video Frame Format Fps: 50p
Video Format Video Layout Pixel : 1920
Video Format Video Layout Num Of Vertical Line: 1080
Video Format Video Layout Aspect Ratio: 16:9
Audio Format Num Of Channel     : 2
Audio Format Audio Rec Port Port: DIRECT
Audio Format Audio Rec Port Audio Codec: LPCM16
Audio Format Audio Rec Port Track Dst: CH1
Device Manufacturer             : Sony
Device Model Name               : ILCE-7M4
Device Serial No                : 4294967295
Recording Mode Type             : normal
Recording Mode Cache Rec        : false
Acquisition Record Group Name   : CameraUnitMetadataSet
Acquisition Record Group Item Name: CaptureGammaEquation
Acquisition Record Group Item Value: ex-cine2
Acquisition Record Change Table Name: ImagerControlInformation
Acquisition Record Change Table Event Frame Count: 0
Acquisition Record Change Table Event Status: start
Image Size                      : 1920x1080
Megapixels                      : 2.1
Avg Bitrate                     : 53.5 Mbps
Rotation                        : 0
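
One way to confirm the codec and chroma format instead of inferring them from the camera metadata is to ask ffprobe directly. The sketch below is only an illustration and assumes ffprobe is installed and on PATH; the helper names (probe_video_stream, parse_probe_output, is_422) are hypothetical and not part of OpenCV.

```python
import json
import subprocess


def parse_probe_output(raw_json):
    """Extract codec, profile and pixel format from ffprobe's JSON output."""
    streams = json.loads(raw_json).get("streams", [])
    if not streams:
        raise ValueError("no video stream found")
    stream = streams[0]
    return {
        "codec": stream.get("codec_name"),
        "profile": stream.get("profile"),
        "pix_fmt": stream.get("pix_fmt"),
    }


def probe_video_stream(path):
    """Run ffprobe on the first video stream of `path` (assumes ffprobe is on PATH)."""
    cmd = [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=codec_name,profile,pix_fmt",
        "-of", "json",
        path,
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_probe_output(result.stdout)


def is_422(pix_fmt):
    """True for 4:2:2 pixel formats such as yuv422p, yuv422p10le or yuvj422p."""
    return pix_fmt is not None and "422" in pix_fmt
```

Calling probe_video_stream("xyz.mp4") and passing the returned "pix_fmt" to is_422 would show whether the file really is 4:2:2 before it is handed to cudacodec.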

Setup

Hardware:

  • CPU: Intel(R) Core™ i9-12900KF
  • GPU: Nvidia RTX 5070TI (Blackwell)

Software:

  • Windows 10
  • Cuda 12.8
  • Game Ready Driver 572.61
  • Python 3.9.13
  • OpenCV(4.12.0-dev)

OpenCV Information

a) CudaDeviceInfo

(va_tftorch_gpu_venv_128) E:\Video-Analyzer>python
Python 3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022, 16:36:42) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import cv2
>>> from cv2 import cuda
>>> cuda.printCudaDeviceInfo(0)
*** CUDA Device Query (Runtime API) version (CUDART static linking) *** 

Device count: 1

Device 0: "NVIDIA GeForce RTX 5070 Ti"
  CUDA Driver Version / Runtime Version          12.80 / 12.80
  CUDA Capability Major/Minor version number:    12.0
  Total amount of global memory:                 16303 MBytes (17094475776 bytes)
  GPU Clock Speed:                               2.50 GHz
  Max Texture Dimension Size (x,y,z)             1D=(131072), 2D=(131072,65536), 3D=(16384,16384,16384)
  Max Layered Texture Size (dim) x layers        1D=(32768) x 2048, 2D=(32768,32768) x 2048
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per block:           1024
  Maximum sizes of each dimension of a block:    1024 x 1024 x 64
  Maximum sizes of each dimension of a grid:     2147483647 x 65535 x 65535
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and execution:                 Yes with 5 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Concurrent kernel execution:                   Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support enabled:                No
  Device is using TCC driver mode:               No
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           1 / 0
  Compute Mode:
      Default (multiple host threads can use ::cudaSetDevice() with device simultaneously)

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version  = 12.80, CUDA Runtime Version = 12.80, NumDevs = 1

b) GetBuildInformation


>>> print(cv2.getBuildInformation()) 

General configuration for OpenCV 4.12.0-dev =====================================
  Version control:               unknown

  Extra modules:
    Location (extra):            E:/Haider/OpenCV_builds/OpenCV4_RTX50GPU/opencv_contrib-4.x/modules
    Version control (extra):     unknown

  Platform:
    Timestamp:                   2025-03-30T10:22:59Z
    Host:                        Windows 10.0.19045 AMD64
    CMake:                       3.31.6
    CMake generator:             Visual Studio 17 2022
    CMake build tool:            C:/Program Files/Microsoft Visual Studio/2022/Community/MSBuild/Current/Bin/amd64/MSBuild.exe
    MSVC:                        1939
    Configuration:               Release
    Algorithm Hint:              ALGO_HINT_ACCURATE

  CPU/HW features:
    Baseline:                    SSE SSE2 SSE3
      requested:                 SSE3
    Dispatched code generation:  SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
      SSE4_1 (19 files):         + SSSE3 SSE4_1
      SSE4_2 (2 files):          + SSSE3 SSE4_1 POPCNT SSE4_2
      AVX (10 files):            + SSSE3 SSE4_1 POPCNT SSE4_2 AVX
      FP16 (1 files):            + SSSE3 SSE4_1 POPCNT SSE4_2 AVX FP16
      AVX2 (39 files):           + SSSE3 SSE4_1 POPCNT SSE4_2 AVX FP16 AVX2 FMA3
      AVX512_SKX (8 files):      + SSSE3 SSE4_1 POPCNT SSE4_2 AVX FP16 AVX2 FMA3 AVX_512F AVX512_COMMON AVX512_SKX

  C/C++:
    Built as dynamic libs?:      YES
    C++ standard:                11
    C++ Compiler:                C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.39.33519/bin/Hostx64/x64/cl.exe  (ver 19.39.33523.0)     
    C++ flags (Release):         /DWIN32 /D_WINDOWS /W4 /GR  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:fast    /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /wd4819 /MP  /O2 /Ob2 /DNDEBUG
    C++ flags (Debug):           /DWIN32 /D_WINDOWS /W4 /GR  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:fast    /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /wd4819 /MP  /Zi /Ob0 /Od /RTC1
    C Compiler:                  C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.39.33519/bin/Hostx64/x64/cl.exe
    C flags (Release):           /DWIN32 /D_WINDOWS /W3  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:fast    /MP   /O2 /Ob2 /DNDEBUG
    C flags (Debug):             /DWIN32 /D_WINDOWS /W3  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:fast    /MP /Zi /Ob0 /Od /RTC1
    Linker flags (Release):      /machine:x64  /INCREMENTAL:NO
    Linker flags (Debug):        /machine:x64  /debug /INCREMENTAL
    ccache:                      NO
    Precompiled headers:         NO
    Extra dependencies:          cudart_static.lib nppc.lib nppial.lib nppicc.lib nppidei.lib nppif.lib nppig.lib nppim.lib nppist.lib nppisu.lib nppitc.lib npps.lib cublas.lib cudnn.lib cufft.lib -LIBPATH:C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.8/lib/x64
    3rdparty dependencies:

  OpenCV modules:
    To be built:                 aruco bgsegm bioinspired calib3d ccalib core cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev datasets dnn dnn_objdetect dnn_superres dpm face features2d flann fuzzy gapi hfs highgui img_hash imgcodecs imgproc intensity_transform line_descriptor mcc ml objdetect optflow phase_unwrapping photo plot python3 quality rapid reg rgbd saliency shape signal stereo stitching structured_light superres surface_matching text tracking ts video videoio videostab wechat_qrcode world xfeatures2d ximgproc xobjdetect xphoto
    Disabled:                    -
    Disabled by dependency:      -
    Unavailable:                 alphamat cannops cvv fastcv freetype hdf java julia matlab ovis python2 python2 sfm viz
    Applications:                tests perf_tests apps
    Documentation:               NO
    Non-free algorithms:         YES

  Windows RT support:            NO

  GUI:
    Win32 UI:                    YES
    VTK support:                 NO

  Media I/O:
    ZLib:                        build (ver 1.3.1)
    JPEG:                        build-libjpeg-turbo (ver 3.1.0-70)
      SIMD Support Request:      YES
      SIMD Support:              NO
    WEBP:                        build (ver decoder: 0x0209, encoder: 0x020f, demux: 0x0107)
    AVIF:                        NO
    PNG:                         build (ver 1.6.43)
      SIMD Support Request:      YES
      SIMD Support:              YES (Intel SSE)
    TIFF:                        build (ver 42 - 4.6.0)
    JPEG 2000:                   build (ver 2.5.3)
    OpenEXR:                     build (ver 2.3.0)
    GIF:                         NO
    HDR:                         YES
    SUNRASTER:                   YES
    PXM:                         YES
    PFM:                         YES

  Video I/O:
    FFMPEG:                      YES (prebuilt binaries)
      avcodec:                   YES (58.134.100)
      avformat:                  YES (58.76.100)
      avutil:                    YES (56.70.100)
      swscale:                   YES (5.9.100)
      avresample:                YES (4.0.0)
    GStreamer:                   NO
    DirectShow:                  YES
    Media Foundation:            YES
      DXVA:                      YES

  Parallel framework:            Concurrency

  Trace:                         YES (with Intel ITT(3.25.4))

  Other third-party libraries:
    Intel IPP:                   2022.0.0 [2022.0.0]
           at:                   E:/Haider/OpenCV_builds/OpenCV4_RTX50GPU/build_cudec/3rdparty/ippicv/ippicv_win/icv
    Intel IPP IW:                sources (2022.0.0)
              at:                E:/Haider/OpenCV_builds/OpenCV4_RTX50GPU/build_cudec/3rdparty/ippicv/ippicv_win/iw
    Lapack:                      NO
    Eigen:                       NO
    Custom HAL:                  NO
    Protobuf:                    build (3.19.1)
    Flatbuffers:                 builtin/3rdparty (23.5.9)

  NVIDIA CUDA:                   YES (ver 12.8, CUFFT CUBLAS NVCUVID NVCUVENC FAST_MATH)
    NVIDIA GPU arch:             120
    NVIDIA PTX archs:            120

  cuDNN:                         YES (ver 9.8.0)

  OpenCL:                        YES (NVD3D11)
    Include path:                E:/Haider/OpenCV_builds/OpenCV4_RTX50GPU/opencv-4.x/3rdparty/include/opencl/1.2
    Link libraries:              Dynamic load

  Python 3:
    Interpreter:                 E:/Haider/Advanced video analytics using Computer Vision/OpenCV-Video-Analyzer/va_tftorch_gpu_venv_128/Scripts/python.exe (ver 3.9.13)
    Libraries:                   C:/Users/user/AppData/Local/Programs/Python/Python39/libs/python39.lib (ver 3.9.13)
    Limited API:                 NO
    numpy:                       E:/Haider/Advanced video analytics using Computer Vision/OpenCV-Video-Analyzer/va_tftorch_gpu_venv_128/lib/site-packages/numpy/core/include (ver 1.23.5)
    install path:                E:/Haider/Advanced video analytics using Computer Vision/OpenCV-Video-Analyzer/va_tftorch_gpu_venv_128/lib/site-packages/cv2/python-3.9

  Python (for build):            E:/Haider/Advanced video analytics using Computer Vision/OpenCV-Video-Analyzer/va_tftorch_gpu_venv_128/Scripts/python.exe

  Java:
    ant:                         NO
    Java:                        NO
    JNI:                         NO
    Java wrappers:               NO
    Java tests:                  NO

  Install to:                    E:/Haider/OpenCV_builds/OpenCV4_RTX50GPU/build_cudec/install
-----------------------------------------------------------------


>>>

My Questions:

Looking at the NVIDIA Decode Support Matrix,

we see that the RTX 5070 Ti supports H.264 in both YUV 4:2:0 and YUV 4:2:2, at 8 and 10 bit. So:

a) Why am I getting this unsupported-codec error on the Blackwell card?

b) If the video uses a different codec than I presumed, which is it? And is there a workaround to use GPU decode acceleration for such a codec?

@cudawarped Can you look into this?

Full Code:

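import cv2
import time

video_path = "xyz.mp4"  # placeholder path to the input video
# Values from the log above: resize to 320x180, process every 25th frame
t_width, t_height, skip_frames = 320, 180, 25
# Assumed definitions for names the snippet uses but never defines
b_to_mb = 1024 * 1024
device_info = cv2.cuda.DeviceInfo()
mb_free_start = device_info.freeMemory() / b_to_mb

params = cv2.cudacodec.VideoReaderInitParams()
params.targetSz = (t_width, t_height)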

#gpu_reader = cv2.cudacodec.createVideoReader(video_path)
gpu_reader = cv2.cudacodec.createVideoReader(video_path,params=params)
mb_free_after_creation = device_info.freeMemory()/b_to_mb

# Get the number of decode surfaces; a frame must be grabbed before the format info is valid
gpu_reader.grab()
format_info = gpu_reader.format()
mb_used = mb_free_start - mb_free_after_creation
print(f'Total Memory:                            {device_info.totalMemory()/b_to_mb:.2f}MB')
print(f'Free Memory after context creation:      {mb_free_start:.2f}MB')
print(f'Free Memory after creating video reader: {mb_free_after_creation:.2f}MB')
print(f'{mb_used:.2f}MB of internal memory when using {format_info.ulNumDecodeSurfaces} ({format_info.ulWidth}x{format_info.ulHeight}) decode surfaces')

params = cv2.cudacodec.VideoReaderInitParams()
params.targetSz = (t_width,t_height)
params.minNumDecodeSurfaces = format_info.ulNumDecodeSurfaces * 2

gpu_reader_v2 = cv2.cudacodec.createVideoReader(video_path,params=params)
# If a color format other than BGRA is required, do the color conversion using set()
#gpu_reader_v2.set(colorFormat=cv2.cudacodec.GRAY)

mb_used_double_sufaces = mb_free_start - device_info.freeMemory()/b_to_mb
gpu_reader_v2.grab()
format_info = gpu_reader_v2.format()
assert format_info.ulNumDecodeSurfaces == params.minNumDecodeSurfaces 

print(f'Memory increase from doubling the number of decode surfaces: {100*(mb_used_double_sufaces - mb_used)/mb_used:.2f}%')


# [PREALLOCATION] --- CUDA decoding test using cv2.cudacodec (if available) ---
if hasattr(cv2, 'cudacodec'):
    try:
        gpu_frame_times = []
        gpu_frame_count = 0
        gpu_proc_frame_count = 0

        frame_gpu = cv2.cuda.GpuMat(t_height,t_width,cv2.CV_8UC4)
        while True:
            start_time = time.time()
            ret, _ = gpu_reader_v2.nextFrame(frame_gpu)  # Unpack the tuple
            if not ret or frame_gpu is None:
                break

            if gpu_frame_count % skip_frames == 0:
                # [ADDED] Count only the frames we actually need to process
                gpu_proc_frame_count += 1

            end_time = time.time()
            gpu_frame_times.append(end_time - start_time)
            gpu_frame_count += 1

        if gpu_frame_count > 0:
            gpu_avg_time = sum(gpu_frame_times) / gpu_frame_count
            print(f"CUDA Decoding: Processed {gpu_proc_frame_count} | {gpu_frame_count} frames. Average time per frame: {gpu_avg_time*1000:.2f} ms")
        else:
            print("No frames decoded using CUDA decoding.")
    except Exception as e:
        print("Error during CUDA decoding:", e)
else:
    print("CUDA decoding is not available in this OpenCV build (cv2.cudacodec missing).")

@haider_abbasi YUV 4:2:2 is not supported by cudacodec. Hardware decoding for it was introduced with Blackwell, but it has not been added to cudacodec yet.

Thanks, I suspected this might be the case.

  • Is there any plan to add it in the near future?

Seeing that Blackwell cards support such a wide range of codecs, it would be quite useful if OpenCV added this support.

As far as I am aware, there are no plans to add this functionality in the near future. cudacodec is part of the contrib repository, meaning that enhancements are generally not planned by the core development team. I will add AV1 and YUV 4:2:2 support if/when I get access to a local Blackwell card.
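
Until cudacodec gains 4:2:2 support, one possible stopgap is to try the GPU reader first and fall back to CPU decoding (cv2.VideoCapture) when creation fails, uploading frames to the GPU afterwards. This is only a sketch under that assumption, not something suggested in the thread; open_reader and read_frame are hypothetical helper names, and the factories are injectable so the fallback logic can be exercised without a GPU.

```python
def _default_gpu_factory(path):
    import cv2  # imported lazily; raises if this build has no cudacodec
    return cv2.cudacodec.createVideoReader(path)


def _default_cpu_factory(path):
    import cv2
    return cv2.VideoCapture(path)


def open_reader(path, gpu_factory=_default_gpu_factory, cpu_factory=_default_cpu_factory):
    """Return ("gpu", reader) if cudacodec opens the file, else ("cpu", capture)."""
    try:
        return "gpu", gpu_factory(path)
    except Exception as err:  # cv2.error derives from Exception
        print(f"cudacodec could not open {path!r}, falling back to CPU decoding: {err}")
    capture = cpu_factory(path)
    if not capture.isOpened():
        raise RuntimeError(f"could not open {path!r} with the CPU decoder either")
    return "cpu", capture


def read_frame(kind, reader):
    """Return (ok, GpuMat) from either reader type."""
    if kind == "gpu":
        return reader.nextFrame()
    ok, frame = reader.read()
    if not ok:
        return False, None
    import cv2
    gpu_frame = cv2.cuda.GpuMat()
    gpu_frame.upload(frame)  # CPU-decoded frame (BGR by default) copied to the GPU
    return True, gpu_frame
```

Note that the two paths may deliver different channel layouts (the question's code preallocates a 4-channel BGRA GpuMat for cudacodec, while VideoCapture returns BGR by default), so downstream code would need to handle both. Decoding on the CPU gives up the NVDEC speed-up, but later CUDA processing stages can stay on the GPU.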

Thank you!