Can only initialize 5 cv2.cudacodec.createVideoWriter objects in parallel

Hello all, I’m new to OpenCV. I apologize in advance for my English; I’m not a native speaker, but I’ll describe the problem as clearly as I can.

I built OpenCV with CUDA to accelerate video encoding and decoding. Here is my build information:

General configuration for OpenCV 4.7.0 =====================================
  Version control:               unknown

  Extra modules:
    Location (extra):            D:/opencv_build/opencv_contrib-4.7.0/modules
    Version control (extra):     unknown

  Platform:
    Timestamp:                   2023-11-09T02:50:34Z
    Host:                        Windows 10.0.17763 AMD64
    CMake:                       3.26.4
    CMake generator:             Visual Studio 16 2019
    CMake build tool:            C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/MSBuild/Current/Bin/MSBuild.exe
    MSVC:                        1929
    Configuration:               Release

  CPU/HW features:
    Baseline:                    SSE SSE2 SSE3
      requested:                 SSE3
    Dispatched code generation:  SSE4_1 SSE4_2 FP16 AVX AVX2 AVX512_SKX
      requested:                 SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
      SSE4_1 (18 files):         + SSSE3 SSE4_1
      SSE4_2 (2 files):          + SSSE3 SSE4_1 POPCNT SSE4_2
      FP16 (1 files):            + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 AVX
      AVX (5 files):             + SSSE3 SSE4_1 POPCNT SSE4_2 AVX
      AVX2 (34 files):           + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2
      AVX512_SKX (8 files):      + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2 AVX_512F AVX512_COMMON AVX512_SKX

  C/C++:
    Built as dynamic libs?:      YES
    C++ standard:                11
    C++ Compiler:                C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe  (ver 19.29.30152.0)
    C++ flags (Release):         /DWIN32 /D_WINDOWS /W4 /GR  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:fast     /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /wd4819 /MP  /MD /O2 /Ob2 /DNDEBUG 
    C++ flags (Debug):           /DWIN32 /D_WINDOWS /W4 /GR  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:fast     /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /wd4819 /MP  /MDd /Zi /Ob0 /Od /RTC1 
    C Compiler:                  C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe
    C flags (Release):           /DWIN32 /D_WINDOWS /W3  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:fast     /MP   /MD /O2 /Ob2 /DNDEBUG 
    C flags (Debug):             /DWIN32 /D_WINDOWS /W3  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:fast     /MP /MDd /Zi /Ob0 /Od /RTC1 
    Linker flags (Release):      /machine:x64  /INCREMENTAL:NO 
    Linker flags (Debug):        /machine:x64  /debug /INCREMENTAL 
    ccache:                      NO
    Precompiled headers:         NO
    Extra dependencies:          cudart_static.lib nppc.lib nppial.lib nppicc.lib nppidei.lib nppif.lib nppig.lib nppim.lib nppist.lib nppisu.lib nppitc.lib npps.lib cublas.lib cudnn.lib cufft.lib -LIBPATH:C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2/lib/x64
    3rdparty dependencies:

  OpenCV modules:
    To be built:                 alphamat aruco barcode bgsegm bioinspired calib3d ccalib core cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev datasets dnn dnn_objdetect dnn_superres dpm face features2d flann fuzzy gapi hfs highgui img_hash imgcodecs imgproc intensity_transform line_descriptor mcc ml objdetect optflow phase_unwrapping photo plot python3 quality rapid reg rgbd saliency shape stereo stitching structured_light superres surface_matching text tracking ts video videoio videostab wechat_qrcode world xfeatures2d ximgproc xobjdetect xphoto
    Disabled:                    -
    Disabled by dependency:      -
    Unavailable:                 cvv freetype hdf java julia matlab ovis python2 python2 sfm viz
    Applications:                tests perf_tests apps
    Documentation:               NO
    Non-free algorithms:         NO

  Windows RT support:            NO

  GUI: 
    Win32 UI:                    YES
    VTK support:                 NO

  Media I/O: 
    ZLib:                        build (ver 1.2.13)
    JPEG:                        build-libjpeg-turbo (ver 2.1.3-62)
      SIMD Support Request:      YES
      SIMD Support:              NO
    WEBP:                        build (ver encoder: 0x020f)
    PNG:                         build (ver 1.6.37)
    TIFF:                        build (ver 42 - 4.2.0)
    JPEG 2000:                   build (ver 2.4.0)
    OpenEXR:                     build (ver 2.3.0)
    HDR:                         YES
    SUNRASTER:                   YES
    PXM:                         YES
    PFM:                         YES

  Video I/O:
    DC1394:                      NO
    FFMPEG:                      YES (prebuilt binaries)
      avcodec:                   YES (58.134.100)
      avformat:                  YES (58.76.100)
      avutil:                    YES (56.70.100)
      swscale:                   YES (5.9.100)
      avresample:                YES (4.0.0)
    GStreamer:                   NO
    DirectShow:                  YES
    Media Foundation:            YES
      DXVA:                      YES

  Parallel framework:            Concurrency

  Trace:                         YES (with Intel ITT)

  Other third-party libraries:
    Intel IPP:                   2020.0.0 Gold [2020.0.0]
           at:                   D:/opencv_build/build/3rdparty/ippicv/ippicv_win/icv
    Intel IPP IW:                sources (2020.0.0)
              at:                D:/opencv_build/build/3rdparty/ippicv/ippicv_win/iw
    Lapack:                      YES (C:/openblas/lib/openblas.lib)
    Eigen:                       YES (ver 3.4.0)
    Custom HAL:                  NO
    Protobuf:                    build (3.19.1)

  NVIDIA CUDA:                   YES (ver 11.2, CUFFT CUBLAS NVCUVID NVCUVENC FAST_MATH)
    NVIDIA GPU arch:             86
    NVIDIA PTX archs:

  cuDNN:                         YES (ver 8.6.0)

  OpenCL:                        YES (NVD3D11)
    Include path:                D:/opencv_build/opencv-4.7.0/3rdparty/include/opencl/1.2
    Link libraries:              Dynamic load

  Python 3:
    Interpreter:                 D:/Anaconda3/envs/acc/python.exe (ver 3.8.8)
    Libraries:                   D:/Anaconda3/libs/python38.lib (ver 3.8.8)
    numpy:                       D:/Anaconda3/envs/acc/Lib/site-packages/numpy/core/include (ver 1.24.4)
    install path:                D:/Anaconda3/envs/acc/Lib/site-packages/cv2/python-3.8

  Python (for build):            D:/Anaconda3/envs/acc/python.exe

  Java:                          
    ant:                         NO
    JNI:                         D:/JDK8/jdk1.8.0_101/include D:/JDK8/jdk1.8.0_101/include/win32 D:/JDK8/jdk1.8.0_101/include
    Java wrappers:               NO
    Java tests:                  NO

  Install to:                    D:/opencv_build/install
-----------------------------------------------------------------

I’m reading and writing video with multiple threads and GPUs. Here’s a shortened version of my code:

import threading

import cv2


def gpu_process(i, gpu_num):
    cv2.cuda.setDevice(gpu_num)

    video_gpu = cv2.cudacodec.createVideoReader(r'video.mp4')
    video_gpu.set(cv2.cudacodec.COLOR_FORMAT_BGR)
    format_gpu = video_gpu.format()
    fps = int(format_gpu.fps)
    height = format_gpu.height
    width = format_gpu.width

    encoder_params_in = cv2.cudacodec.EncoderParams()
    stream = cv2.cuda.Stream()

    result_video_path = '{}.mp4'.format(i)

    out = cv2.cudacodec.createVideoWriter(result_video_path, (width, height), cv2.cudacodec.H264, fps=fps,
                                          colorFormat=cv2.cudacodec.COLOR_FORMAT_BGR,
                                          params=encoder_params_in,
                                          stream=stream)

    while True:
        ret, frame = video_gpu.nextFrame()
       
        if not ret:
            break

        out.write(frame)

    out.release()


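for i in range(7):
    # Alternate between the two GPUs: even workers on GPU 0, odd workers on GPU 1.
    gpu_num = i % 2
    # Thread.start() returns None, so there is nothing useful to assign here.
    threading.Thread(target=gpu_process, args=(i, gpu_num)).start()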

After 5 threads start successfully, the next one fails with this error:

out = cv2.cudacodec.createVideoWriter(result_video_path, (width, height), cv2.cudacodec.H264, fps=fps,
cv2.error: OpenCV(4.7.0) D:\opencv_build\opencv_contrib-4.7.0\modules\cudacodec\src\video_writer.cpp:220: error: (-217:Gpu API call) in function 'cv::cudacodec::VideoWriterImpl::Init'
> Error initializing Nvidia Encoder. Refer to Nvidia's GPU Support Matrix to confirm your GPU supports hardware encoding, codec and surface format and check the encoder documentation to verify your choice of encoding paramaters are supported.OpenCV(4.7.0) D:\opencv_build\opencv_contrib-4.7.0\modules\cudacodec\src\NvEncoder.cpp:47: error: (-217:Gpu API call) NVENC returned error [Code = 10] in function 'cv::cudacodec::NvEncoder::NvEncoder'
> 

It seems I can only create 5 cv2.cudacodec.createVideoWriter objects successfully. I don’t know whether this is intended behavior, whether I missed something when building OpenCV with CUDA, or whether my code is wrong.

FYI, I’m using OpenCV 4.7.0, Python 3.8.8, CUDA 11.2, and Video_Codec_SDK_11.1.5, with two NVIDIA GeForce RTX 3080 Ti cards.

Can anyone help?

Nvidia restricts the number of concurrent encoding sessions on their consumer-grade hardware. If you refer to Nvidia's GPU Support Matrix (the one mentioned in the error message), you will see that for the RTX 3080 Ti the number of concurrent sessions is 5.

Note:

  1. Nvidia does not restrict the number of concurrent decoding sessions (I guess consumers mostly decode and businesses mostly encode). So if you only want to read and write the video without transcoding, as in your code above, you could enable rawMode on VideoReader and write the encoded source directly to a file.
  2. The latest commits to the 4.x branch now let you write to container formats. In 4.7.0 you can only write to raw .h264 files.
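
If you do need to re-encode more than 5 videos, one way to stay under the cap is to let the extra jobs wait for a free encoder instead of failing. The following is only a minimal sketch under some assumptions: `run_with_encode_slot` and `start_workers` are hypothetical helper names (not part of OpenCV), the cap of 5 is taken from the limit discussed above, and each job is assumed to release its writer before it returns. It also only coordinates threads inside one Python process; encoders opened by other processes on the same machine would still count against the limit.

```python
import threading

# Assumption: 5 is the concurrent NVENC session cap discussed above.
MAX_ENCODE_SESSIONS = 5
encode_slots = threading.BoundedSemaphore(MAX_ENCODE_SESSIONS)


def run_with_encode_slot(job, *args):
    """Run job(*args) while holding one of the limited encoder slots."""
    with encode_slots:
        return job(*args)


def start_workers(job, arg_list):
    """Start one thread per argument tuple and wait for all of them to finish."""
    threads = [
        threading.Thread(target=run_with_encode_slot, args=(job, *args))
        for args in arg_list
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

With the `gpu_process` function from the question, this could be called as `start_workers(gpu_process, [(i, i % 2) for i in range(7)])`. The sixth and seventh jobs would then block until an earlier job finishes and gives its slot back.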

Thanks for your reply, but I have another question. If the number of concurrent encoding sessions on the RTX 3080 Ti is 5 and I have two GPUs, why can I only init 5 encoding sessions across both? I just tested my code: whether I use one GPU or two, I can only init 5 encoding sessions. Shouldn’t I be able to create 10 encoding sessions on 2 GPUs?

Unfortunately not, because of the way Nvidia has restricted them. From the docs:

NVENC Licensing Policy

As far as NVENC hardware encoding is concerned, NVIDIA GPUs are classified into two categories: “qualified” and “non-qualified”. On qualified GPUs, the number of concurrent encode sessions is limited by available system resources (encoder capacity, system memory, video memory etc.). On non-qualified GPUs, the number of concurrent encode sessions is limited to 5 per system. This limit of 5 concurrent sessions per system applies to the combined number of encoding sessions executed on all non-qualified cards present in the system.

For a complete list of qualified and non-qualified GPUs, refer to https://developer.nvidia.com/nvidia-video-codec-sdk

For example, on a system with one Quadro RTX4000 card (which is a qualified GPU) and three GeForce cards (which are non-qualified GPUs), the application can run N simultaneous encode sessions on Quadro RTX4000 card (where N is defined by the encoder/memory/hardware limitations) and 5 sessions on all the GeForce cards combined. Thus, the limit on the number of simultaneous encode sessions for such a system is N + 5.
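
The arithmetic in that policy can be written out as a short sketch. This only models the quoted text and does not query any hardware: `max_encode_sessions` is a hypothetical helper, and the capacity of 8 used for the qualified card is a made-up stand-in for N, which the docs leave hardware-dependent.

```python
# Per-system cap for all non-qualified GPUs combined, from the quoted policy.
NON_QUALIFIED_SYSTEM_LIMIT = 5


def max_encode_sessions(gpus):
    """Return the session limit for a system.

    gpus is a list of (qualified, capacity) tuples. capacity is the
    hypothetical N for a qualified GPU and is ignored for non-qualified ones.
    """
    qualified_total = sum(capacity for qualified, capacity in gpus if qualified)
    has_non_qualified = any(not qualified for qualified, _ in gpus)
    shared_pool = NON_QUALIFIED_SYSTEM_LIMIT if has_non_qualified else 0
    return qualified_total + shared_pool


# Two GeForce RTX 3080 Ti cards: both non-qualified, so they share one pool of 5.
print(max_encode_sessions([(False, 0), (False, 0)]))  # 5

# One qualified card (assumed N = 8) plus three GeForce cards: N + 5.
print(max_encode_sessions([(True, 8), (False, 0), (False, 0), (False, 0)]))  # 13
```

The first call matches the setup in this thread: adding a second GeForce card does not add a second pool of 5 sessions.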

now I see, thanks a lot for your help!