Why does using OpenCV's cudacodec VideoReader.grab() not improve efficiency?

I suspect you are observing the CUDA (compute) utilization, not the decode utilization.

VideoReader.grab() does not work the same way on the GPU: it is exactly the same as nextFrame() except that it doesn't return the frame, so the processing utilization is the same. Therefore you should avoid logic like the snippet below and use nextFrame() instead.

# Frame-skipping loop from the question: on the GPU, grab() still decodes
# every frame, so this saves no work.
for i in range(int((count1 + 1) * cap_fps / fps) - int(count1 * cap_fps / fps)):
    if cap.get(cv2.CAP_PROP_POS_MSEC)[1] >= (count1 + 1) * 1000 - 18:
        break
    ret = cap.grab()
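
If the goal of that loop is to downsample to a lower output frame rate, a minimal sketch of the same logic built on nextFrame() could look like the following (cap, count1, cap_fps and fps are assumed to be the variables from your snippet). The decode cost is the same either way; the only difference is that you get the frame back and simply discard the ones you don't keep.

# Sketch only: decode every frame with nextFrame() and discard the ones you
# don't need, since grab() would have decoded them on the GPU anyway.
for i in range(int((count1 + 1) * cap_fps / fps) - int(count1 * cap_fps / fps)):
    if cap.get(cv2.CAP_PROP_POS_MSEC)[1] >= (count1 + 1) * 1000 - 18:
        break
    ret, gpu_frame = cap.nextFrame()  # gpu_frame is a cv2.cuda.GpuMat
    if not ret:
        break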

I’m not sure what hardware you are using, but to fully saturate the decoder you may need to increase the number of decode surfaces. Try:

params = cv2.cudacodec.VideoReaderInitParams()
params.minNumDecodeSurfaces = 10  # request at least 10 decode surfaces
cap = cv2.cudacodec.createVideoReader(video_path, params=params)

The ideal number of surfaces depends on the workload and the GPU, and increasing it requires more GPU memory; see the excerpt below, taken from the Nvidia Video Codec SDK docs:

The following steps should be followed for optimizing video memory usage:

  1. Make CUVIDDECODECREATEINFO::ulNumDecodeSurfaces = CUVIDEOFORMAT::min_num_decode_surfaces. This will ensure that the underlying driver allocates minimum number of decode surfaces to correctly decode the sequence. In case there is reduction in decoder performance, clients can slightly increase CUVIDDECODECREATEINFO::ulNumDecodeSurfaces. It is therefore recommended to choose the optimal value of CUVIDDECODECREATEINFO::ulNumDecodeSurfaces to ensure right balance between decoder throughput and memory consumption.
  2. CUVIDDECODECREATEINFO::ulNumOutputSurfaces should be decided optimally after due experimentation for balancing decoder throughput and memory consumption.
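
To find that optimal value through the experimentation the docs mention, one rough approach is to time raw decode throughput for a few minNumDecodeSurfaces settings and compare. The sketch below is my own, not something from the SDK docs: the helper name, frame count and surface values are arbitrary, and video_path is the same variable as above.

import time

import cv2

def decode_throughput(video_path, num_surfaces, n_frames=500):
    # Decode up to n_frames frames and report frames per second, so that
    # different minNumDecodeSurfaces values can be compared on your GPU.
    params = cv2.cudacodec.VideoReaderInitParams()
    params.minNumDecodeSurfaces = num_surfaces
    reader = cv2.cudacodec.createVideoReader(video_path, params=params)
    start = time.perf_counter()
    decoded = 0
    while decoded < n_frames:
        ret, gpu_frame = reader.nextFrame()
        if not ret:
            break
        decoded += 1
    elapsed = time.perf_counter() - start
    return decoded / elapsed

for n in (4, 8, 10, 16):
    print(n, "surfaces:", round(decode_throughput(video_path, n), 1), "fps")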