Indeed, I want to store and work with data from a 10-bit machine vision camera (monochrome). The 10-bit data will most likely be represented as 16-bit for all processing purposes (although some downgrading to 8-bit may happen at some point, say for display). The processing pipeline can be developed in OpenCV but could also be in PyTorch or something else (as this is for academic research purposes).
Enabling both recording and reading of 10-bit data in OpenCV would allow us to pre-record some videos and work on them offline with the same characteristics as if they were coming from the camera. It also helps with testing that writing works.
For the specific case of NV_ENC_BUFFER_FORMAT_YUV420_10BIT for monochrome data, this would work rather seamlessly with OpenCV since we would get the Y plane already in 16-bit (with the most significant 10 bits written to) and can simply discard the U and V planes (unused for monochrome) by cropping the images. Thus all OpenCV routines that already support 16-bit grayscale data would be usable.
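As a minimal sketch of that cropping step (dimensions and data are made up for illustration; the exact memory layout returned by the decoder may differ):

```python
import numpy as np

# A decoded 10-bit YUV420 frame can be viewed as one 16-bit single-channel
# image with height * 3/2 rows: the Y plane on top, chroma packed below.
h, w = 8, 12  # hypothetical frame dimensions
frame = np.arange(h * 3 // 2 * w, dtype=np.uint16).reshape(h * 3 // 2, w)

# Monochrome use: keep only the Y plane by cropping the top h rows.
y = frame[:h, :]
```

The crop is just a view, so no copy is needed before handing `y` to 16-bit-capable OpenCV routines.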
If YUV is actually used for colour as well, converting the YUV420 frame (a single “packed” 16-bit channel) to a “full” 3-channel 16-bit YUV or RGB matrix can be done easily with a few OpenCV routines (e.g. cv2.resize, cv2.cuda.merge, etc.).
It may look more intuitive to work with a grayscale colour format rather than YUV420 for grayscale image capture. However, 10-bit grayscale hevc is not widely supported (I don’t think nvenc does it, for example) and there is little downside in just having zero-filled U and V (semi-)planes.
Many thanks again! I tried a simple read+write, which mostly works, but I get some flickering in the output video when played with QuickTime on Mac. It looks fine in VLC and ffplay though. I’m not sure what could lead to this.
That’s great! Regarding the flickering, I’m not sure either. I suspect it’s due to the format QuickTime expects the hevc file to be in, which could be related to the nvenc output (seems unlikely) or the encapsulating FFmpeg file. I don’t have a Mac, so I am unable to give any more insight. Do you get the same effect when using h264?
I get the same effect with h264 (although this is using 8-bit, not 10-bit data, as nvenc apparently doesn’t support 10-bit h264).
It may be related to asynchronous encoding/muxing. The generated frame PTS values are apparently not monotonically increasing. Maybe QuickTime struggles with this…
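A quick sanity check for this is to extract the per-frame PTS values (e.g. from ffprobe’s `pts_time` field) and test whether they increase. The sample values below are invented to mimic B-frame reordering:

```python
# Hypothetical pts_time values for consecutive frames as stored in the file.
pts = [0.000, 0.100, 0.033, 0.066, 0.133]

# True only if every PTS is strictly greater than the previous one.
monotonic = all(a < b for a, b in zip(pts, pts[1:]))
print(monotonic)  # False: these sample PTS are not in increasing order
```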
I think it’s unlikely; unless I’m mistaken, I don’t think any decoder would expect the frames to be encoded in display order. I suspect it’s more likely because QuickTime expects the parameter sets to be out of band, as specified by the hvc1 fourcc, and the fact that QuickTime expects the fourcc to be hvc1 and not hevc. The only doubt I have regarding this is that you have the same issue with h264.
Is there a reason you can’t use a more “forgiving” media player?
I can observe this issue when using QuickTime on Windows. Unfortunately, this is a really old version (Apple discontinued support in 2016), so it may not support B frames anyway. To fix this I encoded without B frames and the file played correctly.
I don’t know what’s causing this or whether it will resolve your issue, but you can easily try this yourself by switching the encoding profile to baseline, e.g.
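A sketch of what that might look like with the Python cudacodec bindings (attribute and enum names are from my reading of the API and may differ between OpenCV versions; untested here as it needs CUDA hardware):

```python
# Assumed names: EncoderParams.encodingProfile and ENC_H264_PROFILE_BASELINE
encoder_params = cv2.cudacodec.EncoderParams()
encoder_params.encodingProfile = cv2.cudacodec.ENC_H264_PROFILE_BASELINE
```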
This fixes the out-of-order frame display for h264 but is thus restricted to 8-bit encoding. For hevc, I tried the hevc encoding profiles listed below, but this does not change the temporal jitter issue.
QuickTime is the default video player on Mac. I would like to be able to share videos without always having to caveat them with “please use VLC to play these”.
Also, QuickTime has a very nice frame-level seeking UX: for example, I often use the left and right keyboard arrows to move by one frame and spot changes, etc.
I am not sure how to check what order QuickTime actually uses to display the frames. The PTS information I was showing is the output of ffprobe, now also showing the frame type:
It looks like the mp4 file is missing the ctts atom, meaning it isn’t aware of the presentation times and assumes the frames should be presented in decode order.
I am not sure how to get FFmpeg to include this information; however, I can suggest a quick fix which should work, and a possible solution with a very slim chance of success.
Quick Fix
Reduce the GOP length so there are no B frames and therefore no need for a ctts atom, e.g.
encoder_params.gopLength = 2
Possible Solution
I have no idea if this will work because I can’t play hevc files in my outdated version of QuickTime. That said, it’s possible FFmpeg will introduce the ctts atom automatically if you change the container to mov, e.g.
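Something along these lines (untested here; a plain remux without re-encoding, with placeholder file names):

```shell
ffmpeg -i input.mp4 -c copy output.mov
```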
I’m guessing you don’t want lossless compression, but just in case, this is also supported. The two caveats are that the file is very large and I have no idea if it will play in QuickTime. e.g.
Related to this question, I read through this ffmpeg ticket: https://trac.ffmpeg.org/ticket/502#comment:26
The suggested use of the dts2pts filter fixes the issue but is only available for h264:
On another note (with an apology for bringing up more requests), is there a way to specify the colour range to the container? As briefly mentioned in a previous comment (Status and usage of cudacodec::VideoWriter - #21 by tvercaut), I am interested in storing grayscale data in the Y plane. By default, ffmpeg seems to use a limited colour range, thus expecting 10-bit Y values to be in the range [16; 235] * 2^2 = [64; 940], or [16; 235] * 2^2 * 2^6 = [4096; 60160] for nvenc (which keeps only the 10 most significant bits).
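For reference, those limited-range bounds follow from scaling the 8-bit [16, 235] interval by 2^(n − 8):

```python
def limited_y_range(bits):
    # Limited ("video") range for n-bit luma: 8-bit [16, 235] scaled up.
    scale = 1 << (bits - 8)
    return 16 * scale, 235 * scale

print(limited_y_range(10))  # (64, 940)
print(limited_y_range(16))  # (4096, 60160)
```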
While it looks possible to feed full-range data to nvenc without telling ffmpeg, this may lead to counter-intuitive display issues. On the ffmpeg command line, the use of full-range colours can be specified with -color_range 2. The libav counterpart seems to be here: https://ffmpeg.org/doxygen/trunk/pixfmt_8h.html#a3da0bf691418bc22c4bcbe6583ad589a
Is it challenging to expose this flag in the writer?
I have uploaded a new wheel with corrected pts values; hopefully the resulting videos will be playable in QuickTime.
Correct, as mentioned in the notebook, the muxed containers have the full-range flag set to false. This can be altered, but I am not sure it has general-purpose value in OpenCV, as the input range can be limited before encoding if the YUV values are not in the standard video range.
Apologies I had missed the note about it in your notebook.
I am obviously biased, but I would have thought that being able to record and play back video acquired with a machine vision camera (either 8- or 10-bit) fits quite well within the remit of OpenCV. There is already support for video capture from many such cameras. The full bit range would be utilised in such camera streams.
As you say, it is possible to squash the value range of the camera feed before writing it as a video file, but this will lead to precision loss on top of any (lossy) compression that may occur. The alternative of recording full-range data without telling the container is probably workable in many cases but, in addition to skewing the semantics a bit, I am not sure what side effects may occur from using reserved values such as 0 (and 255/1023 in 8/10 bit respectively):
Values outside the nominal ranges are allowed, but typically they would be clamped for broadcast or for display (except for Superwhite and xvYCC). Values 0 and 255 are reserved as timing references (SAV and EAV), and may not contain color data (for 8 bits, for 10 bits more values are reserved and for 12 bits even more, no values are reserved in files or RGB mode or full range YCbCr digital modes like sYCC or opYCC).
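For concreteness, here is a sketch of the range-squashing option and the precision loss it implies, mapping 10-bit full-range samples [0, 1023] into the limited range [64, 940] (the mapping formula is a simple linear rescale I chose for illustration):

```python
import numpy as np

full = np.array([0, 512, 1023], dtype=np.uint16)  # 10-bit full-range samples

# Linear rescale into the limited video range [64, 940]; distinct full-range
# codes can collapse to the same limited-range code, hence the precision loss.
limited = np.round(full / 1023 * (940 - 64) + 64).astype(np.uint16)
print(limited)  # [ 64 502 940]
```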