Hi,
I am interested in efficiently encoding and writing video files from images stored on the GPU from python. It seems like the cv::cudacodec::VideoWriter class has recently been revived but there isn’t much documentation about it.
import numpy as np
import matplotlib.pyplot as plt
import imageio.v3 as iio
import cv2
# Fetch a test image
im = iio.imread('https://upload.wikimedia.org/wikipedia/commons/b/b6/PM5644-1920x1080.gif')
img = im.squeeze()
plt.imshow(img)
# Convert test image to GpuMat
img_gpu = cv2.cuda_GpuMat(img)
print("img_gpu:", img_gpu, img_gpu.size())
# Create video writer
cf = cv2.cudacodec.COLOR_FORMAT_RGB
videowriter = cv2.cudacodec.createVideoWriter(
    'output.mp4',
    frameSize=img_gpu.size(),
    codec=cv2.cudacodec.HEVC,
    fps=30,
    colorFormat=cf)
# Encode the same image many times
for i in range(100):
    videowriter.write(img_gpu)
# Clean up
videowriter.release()
If relevant, I am using the wheel from github:cudawarped/opencv-python-cuda-wheels/releases/tag/4.9.0.80
You’re not doing anything wrong; your output.mp4 is not an mp4 file, it’s an incorrectly named H.265 (HEVC) elementary stream. If you rename it to output.h265, VLC will play it.
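If it helps, one quick way to check this from Python is to look at the file header: an mp4 container carries an `ftyp` box near the start, while a raw Annex B elementary stream begins with a start code. A minimal sketch (the `sniff_container` helper is mine, not part of OpenCV):

```python
def sniff_container(path):
    """Guess whether a file is an MP4 container or a raw Annex B stream."""
    with open(path, 'rb') as f:
        head = f.read(12)
    if head[4:8] == b'ftyp':  # ISO BMFF / MP4 box header
        return 'mp4'
    # Annex B streams start with a 3- or 4-byte start code
    if head.startswith(b'\x00\x00\x00\x01') or head.startswith(b'\x00\x00\x01'):
        return 'raw Annex B elementary stream'
    return 'unknown'
```

Running it on the file produced above should report a raw elementary stream rather than mp4.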
The update on Windows to allow writing to container formats (e.g. mp4) was not included until after the wheel you are using was built.
If I get a chance later I will build an updated wheel which includes this feature.
Ok, thanks. I am looking forward to the new wheel to be able to easily test writing to an mp4 container.
On a different but related note (sorry for hijacking the thread), are there any plans to support 10bit encoding through cv::cudacodec? I am ultimately interested in this feature (still with GPU data) but am yet to find an option. For example, this is also lacking with torchaudio:
Currently no, because OpenCV doesn’t natively support this format. That said, it may be fairly straightforward to implement. How would you be passing the data (10 bit RGB), and do you have a sample?
Nice to hear it could be fairly straightforward to implement! The data would be stored as 16 bit on the OpenCV side, with the expectation that the actual maximum value in the GpuMat be at most 1023 (i.e. only the low 10 bits are used). Alternatively, the 10 bit encoding could use only the most significant bits of the 16 bit input, but I find this a bit more counter-intuitive.
Should I file a feature request?
Here is a simple function I often use to generate a 10 bit grayscale image:
def make_test_im():
    # Create a simple image with a gradient from
    # 0 to (2^bitdepth - 1)
    bitdepth = 10
    unusedbitdepth = 16 - bitdepth
    hbd = bitdepth // 2
    im = np.zeros((1 << hbd, 1 << hbd), dtype=np.uint16)
    im[:] = np.arange(0, 1 << bitdepth).reshape(im.shape)
    # Tile it to be at least 64 pixels per side, as the ffmpeg
    # encoder may only work with images of size 64 and up
    numreps = 5
    im = np.tile(im, (numreps, numreps))
    print('im', np.min(im), np.max(im), im.shape, im.dtype)
    return im
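To make the two storage conventions I mentioned concrete (values kept in the low 10 bits versus shifted into the most significant bits), here is a small numpy sketch:

```python
import numpy as np

# 10 bit ramp stored LSB-aligned in uint16: actual maximum value is 1023
lsb = np.arange(0, 1 << 10, dtype=np.uint16)

# The same data MSB-aligned: shifted left so the low 6 bits are zero
msb = lsb << 6

assert lsb.max() == 1023
assert msb.max() == 1023 << 6
# Converting between the two is a plain shift either way
assert np.array_equal(msb >> 6, lsb)
```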
NVENC uses a packed 10 bit input format. This is specified in nvEncodeAPI.h, excerpt below:
NV_ENC_BUFFER_FORMAT_ARGB10 = 0x02000000,  /**< 10 bit Packed A2R10G10B10. This is a word-ordered format
                                                where a pixel is represented by a 32-bit word with B
                                                in the lowest 10 bits, G in the next 10 bits, R in the
                                                10 bits after that and A in the highest 2 bits. */
This would require an additional CUDA kernel to convert the 16 bit input to the correct format, making it less straightforward. If the input was already in the format specified above, then I “think” the required modifications to accommodate it would be small.
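For illustration, the packing described in the excerpt can be sketched in numpy on the CPU side (this is only a model of what such a CUDA kernel would do; the helper name is mine):

```python
import numpy as np

def pack_a2r10g10b10(r, g, b, a=3):
    """Pack 10 bit R, G, B (and 2 bit A) channels into 32-bit words,
    following the layout described for NV_ENC_BUFFER_FORMAT_ARGB10:
    B in the lowest 10 bits, then G, then R, then A in the top 2 bits."""
    r = r.astype(np.uint32) & 0x3FF
    g = g.astype(np.uint32) & 0x3FF
    b = b.astype(np.uint32) & 0x3FF
    return (np.uint32(a & 0x3) << 30) | (r << 20) | (g << 10) | b
```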
You have nothing to lose; a community member may implement this feature, but I wouldn’t hold your breath. If on the other hand the video input was in the required format (NV_ENC_BUFFER_FORMAT_ARGB10), I may take a look when I have time.
Without giving it a lot of thought, I would think float32 from numpy should be the most convenient, because a numpy array of this type can be uploaded directly to a GpuMat.
Will you be processing the data on the GPU beforehand, or uploading everything from the CPU? If it’s the latter, are you sure cudacodec::VideoWriter is faster than cv::VideoWriter for your workflow?
If you had a single 10 bit packed frame representing an image then I could take a look.
On second thought, I think I should have mentioned that 10 bit YUV is probably better than 10 bit RGB in my case. While it seemed somewhat insignificant initially, it could actually make things much simpler, as NVENC expects 10 bit YUV to be provided as 3 x 16 bit data and uses only the most significant bits:
NV_ENC_BUFFER_FORMAT_YUV420_10BIT = 0x00010000, /**< 10 bit Semi-Planar YUV [Y plane followed by interleaved UV plane]. Each pixel of size 2 bytes. Most Significant 10 bits contain pixel data. */
NV_ENC_BUFFER_FORMAT_YUV444_10BIT = 0x00100000, /**< 10 bit Planar YUV444 [Y plane followed by U and V planes]. Each pixel of size 2 bytes. Most Significant 10 bits contain pixel data. */
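To illustrate what NV_ENC_BUFFER_FORMAT_YUV420_10BIT (essentially the P010 layout) would look like on the OpenCV side, here is a numpy sketch building such a buffer from a 10 bit grayscale image; the helper name and the neutral-chroma fill are my own choices:

```python
import numpy as np

def to_p010_like(y10):
    """Arrange a 10 bit luma image (values <= 1023) as a single-channel
    uint16 buffer of 1.5x the height: Y plane followed by an interleaved
    UV plane (set to the neutral grey value here), with the 10 bits
    shifted into the most significant bits as NVENC expects."""
    h, w = y10.shape
    assert h % 2 == 0 and w % 2 == 0
    frame = np.empty((h * 3 // 2, w), dtype=np.uint16)
    frame[:h] = y10.astype(np.uint16) << 6   # luma, 10 bits in the MSBs
    frame[h:] = np.uint16(512 << 6)          # neutral chroma
    return frame
```

Note the single uint16 channel, the 1.5x frame height and the 6 bit shift into the most significant bits.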
Would supporting NV_ENC_BUFFER_FORMAT_YUV420_10BIT indeed be easier?
Many thanks! I don’t have access to a CUDA environment right now, but I will test as soon as I can.
Yes indeed, the data will be on the GPU already, most likely in pytorch. I was planning to make use of the feature you implemented here to help with interoperability:
Again, I haven’t given this a lot of thought, but both should be equivalent from the perspective of the GpuMat and cudacodec::VideoWriter classes. NV_ENC_BUFFER_FORMAT_ARGB10 would map to CV_32FC1 and NV_ENC_BUFFER_FORMAT_YUV420_10BIT to CV_16SC3.
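On the CV_32FC1 mapping: since numpy’s `.view` reinterprets the underlying bits without copying or converting, a packed A2R10G10B10 frame held as uint32 could be handed over as float32. A sketch of the idea (the packed values here are arbitrary placeholders):

```python
import numpy as np

# Hypothetical packed A2R10G10B10 frame held as 32-bit words
packed = np.arange(16, dtype=np.uint32).reshape(4, 4)

# Reinterpret the bits as float32 without copying, so the array could be
# uploaded to a CV_32FC1 GpuMat (e.g. cv2.cuda_GpuMat(packed_f32))
packed_f32 = packed.view(np.float32)

# The bit pattern is unchanged: viewing back recovers the original words
assert np.array_equal(packed_f32.view(np.uint32), packed)
```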
If on the other hand the comparison is between NV_ENC_BUFFER_FORMAT_YUV420_10BIT and 16 bit RGB, then the former should be much easier to deal with, because there would be no internal conversion. Additionally, if 16 bit RGB is not a standard format, it doesn’t make sense to write a specific conversion routine for it inside OpenCV.
You’re correct, I neglected to install FFmpeg first. I guess you were running it in a notebook and didn’t get the following error:
terminating with uncaught exception of type cv::Exception: OpenCV(4.9.0) /home/b/repos/opencv/opencv-python/opencv_contrib/modules/cudacodec/src/video_writer.cpp:83: error: (-213:The function/feature is not implemented) FFmpeg backend not found in function ‘FFmpegVideoWriter’
I will upload some new Ubuntu wheels when I have a chance.
The corresponding reading of 10 bit YUV420 with cudacodec doesn’t seem to work. If I specify reader.set(cv.cudacodec.ColorFormat_NV_YUV410_10BIT) instead of reader.set(cv.cudacodec.ColorFormat_NV_NV12), the frames from the reader have 4 uint8 channels and the size of the video, rather than what you currently expect for the writer (1 channel, a frame height of 1.5x the video height, uint16 element type).
For testing purposes, here is how to generate a test sample:
I haven’t implemented that change yet. I am still not sure if outputting that format from the video reader is relevant to OpenCV, as no routines (that I know of) can process it. What is your use case in OpenCV? I assumed you were processing footage from a sensor and wanted to archive it at a higher precision, not that you needed to read it back into OpenCV.