I suspect you are observing the CUDA utilization, not the decode utilization.
VideoReader.grab() does not work the same way on the GPU. It is exactly the same as nextFrame() except it doesn't return anything, so the processing utilization is the same. Therefore you should avoid any logic like the below and use nextFrame() instead:
for i in range(int((count1 + 1) * cap_fps / fps) - int(count1 * cap_fps / fps)):
    if cap.get(cv2.CAP_PROP_POS_MSEC)[1] >= (count1 + 1) * 1000 - 18:
        break
    ret = cap.grab()
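As a rough sketch of the nextFrame() approach, assuming the goal of the loop above is to advance a fixed number of source frames between each frame you keep: the helper names `frames_to_advance` and `read_subsampled` are hypothetical, and `reader` is assumed to be any object whose nextFrame() returns a `(ret, frame)` pair, as the Python binding of the cudacodec VideoReader does.

```python
def frames_to_advance(count, cap_fps, fps):
    """Source frames to step between output frame `count` and `count + 1`.

    Same arithmetic as the range() bound in the loop above.
    """
    return int((count + 1) * cap_fps / fps) - int(count * cap_fps / fps)


def read_subsampled(reader, cap_fps, fps, max_outputs):
    """Yield one frame per output slot, decoding every frame with nextFrame().

    Hypothetical helper: skipped frames are simply discarded, since grab()
    would have cost the same amount of decode work anyway.
    """
    for count in range(max_outputs):
        frame = None
        for _ in range(frames_to_advance(count, cap_fps, fps)):
            ret, frame = reader.nextFrame()
            if not ret:
                return  # end of stream or decode failure
        if frame is not None:  # nothing to yield if no frame was advanced
            yield frame
```

With a real reader, each yielded frame would be a GpuMat that you can process or download as needed.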
I’m not sure what hardware you are using, but to fully saturate the decoder you may need to increase the number of decode surfaces. Try:
params = cv2.cudacodec.VideoReaderInitParams()
params.minNumDecodeSurfaces = 10
cap = cv2.cudacodec.createVideoReader(video_path, params=params)
The ideal number of surfaces depends on the workload and the GPU, and more surfaces require more GPU memory. See the excerpt below, taken from the Nvidia Video Codec SDK docs:
The following steps should be followed for optimizing video memory usage:
- Make CUVIDDECODECREATEINFO::ulNumDecodeSurfaces = CUVIDEOFORMAT::min_num_decode_surfaces. This will ensure that the underlying driver allocates minimum number of decode surfaces to correctly decode the sequence. In case there is reduction in decoder performance, clients can slightly increase CUVIDDECODECREATEINFO::ulNumDecodeSurfaces. It is therefore recommended to choose the optimal value of CUVIDDECODECREATEINFO::ulNumDecodeSurfaces to ensure right balance between decoder throughput and memory consumption.
- CUVIDDECODECREATEINFO::ulNumOutputSurfaces should be decided optimally after due experimentation for balancing decoder throughput and memory consumption.
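Since the right value has to be found by experiment, something like the following could be used to compare a few surface counts. This is only a sketch: `time_decode` and `make_reader` are hypothetical names, and `make_reader(n)` is assumed to return a reader whose nextFrame() gives back `(ret, frame)`.

```python
import time


def time_decode(make_reader, surface_counts, max_frames=500):
    """Return {surface_count: (frames_decoded, seconds)} for each candidate.

    `make_reader` is a hypothetical factory supplied by the caller.
    """
    results = {}
    for n in surface_counts:
        reader = make_reader(n)
        decoded = 0
        start = time.perf_counter()
        while decoded < max_frames:
            ret, _ = reader.nextFrame()
            if not ret:
                break  # end of stream or decode failure
            decoded += 1
        results[n] = (decoded, time.perf_counter() - start)
    return results


# Possible factory, reusing the snippet from earlier in this post
# (video_path is whatever file you are decoding):
#
# def make_reader(n):
#     params = cv2.cudacodec.VideoReaderInitParams()
#     params.minNumDecodeSurfaces = n
#     return cv2.cudacodec.createVideoReader(video_path, params=params)
#
# print(time_decode(make_reader, [4, 8, 10, 16]))
```

Watch GPU memory alongside the timings, and stop increasing the count once throughput no longer improves.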