Reading and Writing Videos: Python on GPU with CUDA - VideoCapture and VideoWriter

I should have mentioned that this is currently going to be inefficient because you can’t pass a GpuMat into PyTorch. As a result you would have to download each GpuMat containing a decoded frame of your video from the device to the host, pass the resulting numpy array to PyTorch, and then upload it from the host back to the device again. In your original post I assumed the processing would all happen in OpenCV. As you are passing the decoded frames to PyTorch, you will probably find it more efficient to decode on the CPU, unless the DataLoader can consume frames faster than your CPU can decode them. A sketch of the round trip is below.
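A minimal sketch of what that round trip looks like, assuming your OpenCV build includes the cudacodec module and `video.mp4` is a placeholder path (the channel order of the decoded frame may vary with your build, e.g. BGRA):

```python
import cv2
import torch

# Decode on the GPU with NVDEC via the cudacodec module (requires a CUDA-enabled build).
reader = cv2.cudacodec.createVideoReader("video.mp4")

while True:
    ret, gpu_frame = reader.nextFrame()   # gpu_frame is a cv2.cuda.GpuMat on the device
    if not ret:
        break

    # Step 1: download the decoded frame from device memory to a host numpy array,
    # because PyTorch cannot consume a GpuMat directly.
    host_frame = gpu_frame.download()

    # Step 2: hand the numpy array to PyTorch, which copies it back to the device again.
    tensor = torch.from_numpy(host_frame).to("cuda")

    # ... run your PyTorch processing on `tensor` here ...
```

The device-to-host and host-to-device copies on every frame are the overhead that can make CPU decoding the faster option overall.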

Currently the OpenCV DNN module only accepts Mat input, but there is a plan to enable it to take GpuMat as well. If/when that is implemented, you could export your PyTorch model to ONNX so it can be used in OpenCV, and then keep your whole pipeline in OpenCV.
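A rough sketch of that ONNX route, assuming a hypothetical `model` and input size, and an OpenCV build with CUDA DNN backend support:

```python
import cv2
import torch

# Export the PyTorch model to ONNX (model and input shape are placeholders).
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy_input, "model.onnx")

# Load the exported model in OpenCV's DNN module.
net = cv2.dnn.readNetFromONNX("model.onnx")

# Optionally run inference on the GPU if OpenCV was built with CUDA DNN support.
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

# Inference on a single frame (currently the input still has to be a host-side Mat/numpy array).
frame = cv2.imread("frame.png")  # placeholder frame
blob = cv2.dnn.blobFromImage(frame, scalefactor=1.0 / 255, size=(224, 224))
net.setInput(blob)
output = net.forward()
```

Once GpuMat input is supported by the DNN module, the `blobFromImage`/`setInput` step could stay on the device and the decode-to-inference pipeline would avoid the host copies entirely.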