Indeed, I want to store and work with data from a 10-bit machine vision camera (monochrome). The 10-bit data will most likely be represented as 16-bit for all processing purposes (although some downgrading to 8-bit may happen at some point, say for display). The processing pipeline can be developed in OpenCV but could also be in PyTorch or something else (as this is for academic research purposes).
Enabling both recording and reading of 10-bit data in OpenCV would allow us to pre-record some videos and work on them offline with the same characteristics as if they were coming from the camera. It also helps with testing that writing works.
For the specific case of NV_ENC_BUFFER_FORMAT_YUV420_10BIT for monochrome data, this would work rather seamlessly with OpenCV since we would get the Y plane already in 16-bit (with the most significant 10 bits written to) and can simply discard the U and V planes (unused for monochrome) by cropping the images. Thus all OpenCV routines that already support 16-bit grayscale data would be usable.
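As a minimal sketch of that cropping step (dimensions and data are made up for illustration; the exact memory layout returned by the decoder may differ):

```python
import numpy as np

# A decoded 10-bit YUV420 frame can be viewed as one 16-bit single-channel
# image with height * 3/2 rows: the Y plane on top, chroma packed below.
h, w = 8, 12  # hypothetical frame dimensions
frame = np.arange(h * 3 // 2 * w, dtype=np.uint16).reshape(h * 3 // 2, w)

# Monochrome use: keep only the Y plane by cropping the top h rows.
y = frame[:h, :]
```

The crop is just a view, so no copy is needed before handing `y` to 16-bit-capable OpenCV routines.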
If YUV is actually used for colour as well, converting the YUV420 frame (a single “packed” 16-bit channel) to a “full” 3-channel 16-bit YUV or RGB matrix can be done easily with a few OpenCV routines (e.g. cv2.resize, cv2.cuda.merge, etc.).
It may look more intuitive to work with a grayscale colour format rather than YUV420 for grayscale image capture. However, 10-bit grayscale hevc is not widely supported (I don’t think nvenc does it, for example) and there is little downside in just having zero-filled U and V (semi-)planes.
Many thanks again! I tried a simple read+write, which mostly works, but I get some flickering in the output video when played with QuickTime on Mac. It looks fine in VLC and ffplay though. I’m not sure what could lead to this.
That’s great! Regarding the flickering, I’m not sure either. I suspect it’s due to the format QuickTime expects the hevc file to be in, which could be related to the nvenc output (seems unlikely) or the encapsulating FFmpeg file. I don’t have a Mac, so I am unable to give any more insight. Do you get the same effect when using h264?
I get the same effect with h264 (although this is using 8-bit, not 10-bit data, as nvenc apparently doesn’t support 10-bit h264).
It may be related to asynchronous encoding/muxing. The generated frame PTS values are apparently not monotonically increasing. Maybe QuickTime struggles with this…
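A quick sanity check for this is to extract the per-frame PTS values (e.g. from ffprobe’s `pts_time` field) and test whether they increase. The sample values below are invented to mimic B-frame reordering:

```python
# Hypothetical pts_time values for consecutive frames as stored in the file.
pts = [0.000, 0.100, 0.033, 0.066, 0.133]

# True only if every PTS is strictly greater than the previous one.
monotonic = all(a < b for a, b in zip(pts, pts[1:]))
print(monotonic)  # False: these sample PTS are not in increasing order
```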
I think it’s unlikely; unless I’m mistaken, I don’t think any decoder would expect the frames to be encoded in display order. I suspect it’s more likely because QuickTime expects the parameter sets to be out of band, as specified by the hvc1 fourcc, and the fact that QuickTime expects the fourcc to be hvc1 and not hevc. The only doubt I have regarding this is that you have the same issue with h264.
Is there a reason you can’t use a more “forgiving” media player?
I can observe this issue when using QuickTime on Windows. Unfortunately, this is a really old version (Apple discontinued support in 2016), so it may not support B frames anyway. To fix this I encoded without B frames and the file played correctly.
I don’t know what’s causing this or whether it will resolve your issue, but you can easily try this yourself by switching the encoding profile to baseline, e.g.
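A sketch of what that might look like with the Python cudacodec bindings (attribute and enum names are from my reading of the API and may differ between OpenCV versions; untested here as it needs CUDA hardware):

```python
# Assumed names: EncoderParams.encodingProfile and ENC_H264_PROFILE_BASELINE
encoder_params = cv2.cudacodec.EncoderParams()
encoder_params.encodingProfile = cv2.cudacodec.ENC_H264_PROFILE_BASELINE
```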
This fixes the out-of-order frame display for h264 but is thus restricted to 8-bit encoding. For hevc, I tried the hevc encoding profiles listed below, but this does not change the temporal jitter issue.
QuickTime is the default video player on Mac. I would like to be able to share videos without always having to caveat them with “please use VLC to play these”.
Also, QuickTime has a very nice frame-level seeking UX: for example, I often use the left and right keyboard arrows to move by one frame and spot changes, etc.
I am not sure how to check what order QuickTime actually uses to display the frames. The PTS information I was showing is the output of ffprobe, now also showing the frame type:
It looks like the mp4 file is missing the ctts atom, meaning it isn’t aware of the presentation times and assumes the frames should be presented in decode order.
I am not sure how to get FFmpeg to include this information; however, I can suggest a quick fix which should work, and a possible solution with a very slim chance of success.
Quick Fix
Reduce the GOP length so there are no B frames and therefore no need for a ctts atom, e.g.
encoder_params.gopLength = 2
Possible Solution
I have no idea if this will work because I can’t play hevc files in my outdated version of QuickTime. That said, it’s possible FFmpeg will introduce the ctts atom automatically if you change the container to mov, e.g.
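Something along these lines (untested here; a plain remux without re-encoding, with placeholder file names):

```shell
ffmpeg -i input.mp4 -c copy output.mov
```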
I’m guessing you don’t want lossless compression, but just in case, this is also supported. The two caveats are that the file is very large and I have no idea if it will play in QuickTime. e.g.
Related to this question, I read through this ffmpeg ticket: https://trac.ffmpeg.org/ticket/502#comment:26
The suggested use of the dts2pts filter fixes the issue but is only available for h264:
On another note (with an apology for bringing up more requests), is there a way to specify the colour range to the container? As briefly mentioned in a previous comment (Status and usage of cudacodec::VideoWriter - #21 by tvercaut), I am interested in storing grayscale data in the Y plane. By default, ffmpeg seems to use a limited colour range, thus expecting 10-bit Y values to be in the range [16; 235] * 2^2 = [64; 940], or [16; 235] * 2^2 * 2^6 = [4096; 60160] for nvenc (which keeps only the 10 most significant bits).
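For reference, those limited-range bounds follow from scaling the 8-bit [16, 235] interval by 2^(n − 8):

```python
def limited_y_range(bits):
    # Limited ("video") range for n-bit luma: 8-bit [16, 235] scaled up.
    scale = 1 << (bits - 8)
    return 16 * scale, 235 * scale

print(limited_y_range(10))  # (64, 940)
print(limited_y_range(16))  # (4096, 60160)
```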
While it looks possible to feed full-range data to nvenc without telling ffmpeg, this may lead to counter-intuitive display issues. On the ffmpeg command line, the use of full-range colours can be specified with -color_range 2. The libav counterpart seems to be here: https://ffmpeg.org/doxygen/trunk/pixfmt_8h.html#a3da0bf691418bc22c4bcbe6583ad589a
Is it challenging to expose this flag in the writer?
I have uploaded a new wheel with corrected pts values; hopefully the resulting videos will be playable in QuickTime.
Correct, as mentioned in the notebook, the muxed containers have the full-range flag set to false. This can be altered, but I am not sure it has general-purpose value in OpenCV, as the input range can be limited before encoding if the YUV values are not in the standard video range.
Apologies I had missed the note about it in your notebook.
I am obviously biased, but I would have thought that being able to record and play back video acquired with a machine vision camera (either 8- or 10-bit) fits quite well within the remit of OpenCV. There is already support for video capture from many such cameras. The full bit range would be utilised in such camera streams.
As you say, it is possible to squash the value range of the camera feed before writing it as a video file, but this will lead to precision loss on top of any (lossy) compression that may occur. The alternative of recording full-range data without telling the container is probably workable in many cases but, in addition to skewing the semantics a bit, I am not sure what side effects may occur from using reserved values such as 0 (and 255/1023 in 8/10 bit respectively):
Values outside the nominal ranges are allowed, but typically they would be clamped for broadcast or for display (except for Superwhite and xvYCC). Values 0 and 255 are reserved as timing references (SAV and EAV), and may not contain color data (for 8 bits, for 10 bits more values are reserved and for 12 bits even more, no values are reserved in files or RGB mode or full range YCbCr digital modes like sYCC or opYCC).
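For concreteness, here is a sketch of the range-squashing option and the precision loss it implies, mapping 10-bit full-range samples [0, 1023] into the limited range [64, 940] (the mapping formula is a simple linear rescale I chose for illustration):

```python
import numpy as np

full = np.array([0, 512, 1023], dtype=np.uint16)  # 10-bit full-range samples

# Linear rescale into the limited video range [64, 940]; distinct full-range
# codes can collapse to the same limited-range code, hence the precision loss.
limited = np.round(full / 1023 * (940 - 64) + 64).astype(np.uint16)
print(limited)  # [ 64 502 940]
```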