Hi,
I am interested in efficiently encoding and writing video files from images stored on the GPU from python. It seems like the cv::cudacodec::VideoWriter class has recently been revived but there isn’t much documentation about it.
import numpy as np
import matplotlib.pyplot as plt
import imageio.v3 as iio
import cv2
# Fetch a test image
im = iio.imread('https://upload.wikimedia.org/wikipedia/commons/b/b6/PM5644-1920x1080.gif')
img = im.squeeze()
plt.imshow(img)
# Convert test image to GpuMat
img_gpu = cv2.cuda_GpuMat(img)
print("img_gpu:", img_gpu, img_gpu.size())
# Create video writer
cf = cv2.cudacodec.COLOR_FORMAT_RGB
videowriter = cv2.cudacodec.createVideoWriter(
    'output.mp4',
    frameSize=img_gpu.size(),
    codec=cv2.cudacodec.HEVC,
    fps=30,
    colorFormat=cf)
# Encode the same image many times
for i in range(100):
    videowriter.write(img_gpu)
# Clean up
videowriter.release()
If relevant, I am using the wheel from github:cudawarped/opencv-python-cuda-wheels/releases/tag/4.9.0.80
You’re not doing anything wrong; your output.mp4 is not an mp4 file, it’s an incorrectly named H.265 (HEVC) elementary stream. If you rename it to output.h265, VLC will play it.
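If it helps, one quick way to check this from Python is to look at the file header: an mp4 container carries an `ftyp` box near the start, while a raw Annex B elementary stream begins with a start code. A minimal sketch (the `sniff_container` helper is mine, not part of OpenCV):

```python
def sniff_container(path):
    """Guess whether a file is an MP4 container or a raw Annex B stream."""
    with open(path, 'rb') as f:
        head = f.read(12)
    if head[4:8] == b'ftyp':  # ISO BMFF / MP4 box header
        return 'mp4'
    # Annex B streams start with a 3- or 4-byte start code
    if head.startswith(b'\x00\x00\x00\x01') or head.startswith(b'\x00\x00\x01'):
        return 'raw Annex B elementary stream'
    return 'unknown'
```

Running it on the file produced above should report a raw elementary stream rather than mp4.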
The update on Windows to allow writing to container formats (e.g. mp4) was not included until after the wheel you are using was built.
If I get a chance later I will build an updated wheel which includes this feature.
Ok, thanks. I am looking forward to the new wheel to be able to easily test writing to an mp4 container.
On a different but related note (sorry for hijacking the thread), are there any plans to support 10bit encoding through cv::cudacodec? I am ultimately interested in this feature (still with GPU data) but am yet to find an option. For example, this is also lacking with torchaudio:
Currently no, because OpenCV doesn’t natively support this format. That said, it may be fairly straightforward to implement. How would you be passing the data (10 bit RGB), and do you have a sample?
Nice to hear it could be fairly straightforward to implement! The data would be stored as 16 bit on the OpenCV side, with the expectation that the actual maximum value in the GpuMat be at most 1023 (i.e. only the low 10 bits are used). Alternatively, the 10 bit encoding could use only the most significant bits of the 16 bit input, but I find this a bit more counter-intuitive.
Should I file a feature request?
Here is a simple function I often use to generate a 10 bit grayscale image:
def make_test_im():
    # Create a simple image with a gradient from
    # 0 to (2^bitdepth - 1)
    bitdepth = 10
    unusedbitdepth = 16 - bitdepth
    hbd = bitdepth // 2
    im = np.zeros((1 << hbd, 1 << hbd), dtype=np.uint16)
    im[:] = np.arange(0, 1 << bitdepth).reshape(im.shape)
    # Tile it to be at least 64 pixels per side, as the ffmpeg
    # encoder may only work with images of size 64 and up
    numreps = 5
    im = np.tile(im, (numreps, numreps))
    print('im', np.min(im), np.max(im), im.shape, im.dtype)
    return im
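To make the two storage conventions I mentioned concrete (values kept in the low 10 bits versus shifted into the most significant bits), here is a small numpy sketch:

```python
import numpy as np

# 10 bit ramp stored LSB-aligned in uint16: actual maximum value is 1023
lsb = np.arange(0, 1 << 10, dtype=np.uint16)

# The same data MSB-aligned: shifted left so the low 6 bits are zero
msb = lsb << 6

assert lsb.max() == 1023
assert msb.max() == 1023 << 6
# Converting between the two is a plain shift either way
assert np.array_equal(msb >> 6, lsb)
```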
NVENC uses a packed 10 bit input format. This is specified in nvEncodeAPI.h, excerpt below:
NV_ENC_BUFFER_FORMAT_ARGB10 = 0x02000000,  /**< 10 bit Packed A2R10G10B10. This is a word-ordered format
                                                where a pixel is represented by a 32-bit word with B
                                                in the lowest 10 bits, G in the next 10 bits, R in the
                                                10 bits after that and A in the highest 2 bits. */
This would require an additional CUDA kernel to convert the 16 bit input to the correct format, making it less straightforward. If the input was already in the format specified above, then I “think” the required modifications to accommodate it would be small.
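For illustration, the packing described in the excerpt can be sketched in numpy on the CPU side (this is only a model of what such a CUDA kernel would do; the helper name is mine):

```python
import numpy as np

def pack_a2r10g10b10(r, g, b, a=3):
    """Pack 10 bit R, G, B (and 2 bit A) channels into 32-bit words,
    following the layout described for NV_ENC_BUFFER_FORMAT_ARGB10:
    B in the lowest 10 bits, then G, then R, then A in the top 2 bits."""
    r = r.astype(np.uint32) & 0x3FF
    g = g.astype(np.uint32) & 0x3FF
    b = b.astype(np.uint32) & 0x3FF
    return (np.uint32(a & 0x3) << 30) | (r << 20) | (g << 10) | b
```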
You have nothing to lose; a community member may implement this feature, but I wouldn’t hold your breath. If on the other hand the video input was in the required format (NV_ENC_BUFFER_FORMAT_ARGB10), I may take a look when I have time.
Without giving it a lot of thought, I would think float32 from numpy should be the most convenient, because a numpy array of this type can be uploaded directly to a GpuMat.
Will you be processing the data on the GPU beforehand, or uploading everything from the CPU? If it’s the latter, are you sure cudacodec::VideoWriter is faster than cv::VideoWriter for your workflow?
If you had a single 10 bit packed frame representing an image then I could take a look.
On second thought, I think I should have mentioned that 10 bit YUV is probably better than 10 bit RGB in my case. While it seemed somewhat insignificant initially, it could actually make things much simpler, as NVENC expects 10 bit YUV to be provided as 3 x 16 bit data and uses only the most significant bits:
NV_ENC_BUFFER_FORMAT_YUV420_10BIT = 0x00010000, /**< 10 bit Semi-Planar YUV [Y plane followed by interleaved UV plane]. Each pixel of size 2 bytes. Most Significant 10 bits contain pixel data. */
NV_ENC_BUFFER_FORMAT_YUV444_10BIT = 0x00100000, /**< 10 bit Planar YUV444 [Y plane followed by U and V planes]. Each pixel of size 2 bytes. Most Significant 10 bits contain pixel data. */
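To illustrate what NV_ENC_BUFFER_FORMAT_YUV420_10BIT (essentially the P010 layout) would look like on the OpenCV side, here is a numpy sketch building such a buffer from a 10 bit grayscale image; the helper name and the neutral-chroma fill are my own choices:

```python
import numpy as np

def to_p010_like(y10):
    """Arrange a 10 bit luma image (values <= 1023) as a single-channel
    uint16 buffer of 1.5x the height: Y plane followed by an interleaved
    UV plane (set to the neutral grey value here), with the 10 bits
    shifted into the most significant bits as NVENC expects."""
    h, w = y10.shape
    assert h % 2 == 0 and w % 2 == 0
    frame = np.empty((h * 3 // 2, w), dtype=np.uint16)
    frame[:h] = y10.astype(np.uint16) << 6   # luma, 10 bits in the MSBs
    frame[h:] = np.uint16(512 << 6)          # neutral chroma
    return frame
```

Note the single uint16 channel, the 1.5x frame height and the 6 bit shift into the most significant bits.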
Would supporting NV_ENC_BUFFER_FORMAT_YUV420_10BIT indeed be easier?
Many thanks! I don’t have access to a CUDA environment right now, but I will test as soon as I can.
Yes indeed, the data will be on the GPU already, most likely in pytorch. I was planning to make use of the feature you implemented here to help with interoperability:
Again, I haven’t given this a lot of thought, but both should be equivalent from the perspective of the GpuMat and cudacodec::VideoWriter classes. NV_ENC_BUFFER_FORMAT_ARGB10 would map to CV_32FC1 and NV_ENC_BUFFER_FORMAT_YUV420_10BIT to CV_16SC3.
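On the CV_32FC1 mapping: since numpy’s `.view` reinterprets the underlying bits without copying or converting, a packed A2R10G10B10 frame held as uint32 could be handed over as float32. A sketch of the idea (the packed values here are arbitrary placeholders):

```python
import numpy as np

# Hypothetical packed A2R10G10B10 frame held as 32-bit words
packed = np.arange(16, dtype=np.uint32).reshape(4, 4)

# Reinterpret the bits as float32 without copying, so the array could be
# uploaded to a CV_32FC1 GpuMat (e.g. cv2.cuda_GpuMat(packed_f32))
packed_f32 = packed.view(np.float32)

# The bit pattern is unchanged: viewing back recovers the original words
assert np.array_equal(packed_f32.view(np.uint32), packed)
```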
If on the other hand the comparison is between NV_ENC_BUFFER_FORMAT_YUV420_10BIT and 16 bit RGB, then the former should be much easier to deal with, because there would be no internal conversion. Additionally, if 16 bit RGB is not a standard format, it doesn’t make sense to write a specific conversion routine for it inside OpenCV.
You’re correct, I neglected to install FFmpeg first. I guess you were running it in a notebook and didn’t get the following error:
terminating with uncaught exception of type cv::Exception: OpenCV(4.9.0) /home/b/repos/opencv/opencv-python/opencv_contrib/modules/cudacodec/src/video_writer.cpp:83: error: (-213:The function/feature is not implemented) FFmpeg backend not found in function ‘FFmpegVideoWriter’
I will upload some new Ubuntu wheels when I have a chance.
The corresponding reading of 10 bit YUV420 with cudacodec doesn’t seem to work. If I specify reader.set(cv.cudacodec.ColorFormat_NV_YUV410_10BIT) instead of reader.set(cv.cudacodec.ColorFormat_NV_NV12), the frames from the reader have 4 uint8 channels and the size of the video, rather than what you currently expect for the writer (1 channel, a frame height of 1.5x the video height, uint16 element type).
For testing purposes, here is how to generate a test sample:
I haven’t implemented that change yet. I am still not sure if outputting that format from the video reader is relevant to OpenCV, as no routines (that I know of) can process it. What is your use case in OpenCV? I assumed you were processing footage from a sensor and wanted to archive it at a higher precision, not that you needed to read it back into OpenCV.