I’m using PyTorch on the GPU, trying to read each frame and, say, check whether there’s a cat in it. I want to extract all the frames containing cats and create a new video from only those frames.
My videos are in 1080p (1920x1080).
To explain my thought process so you can see where the bottleneck is (hopefully you’re familiar with PyTorch):
OpenCV CUDA loads frames into a PyTorch DataLoader (which I’ll set up with `num_workers=4` and `pin_memory=True`), then the DataLoader sends the frames to the model.
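Concretely, this is roughly what I’m picturing (just a sketch: `FrameDataset`, the file name, and the stand-in classifier are all placeholders of mine, and the decode here is still plain `cv2.VideoCapture` on the CPU, since I haven’t solved the CUDA reader part yet):

```python
import cv2
import torch
from torch.utils.data import DataLoader, IterableDataset, get_worker_info

class FrameDataset(IterableDataset):
    """Yield (frame_index, tensor) pairs. Each worker decodes the whole
    video but only yields every num_workers-th frame, so no frame is
    duplicated across workers."""
    def __init__(self, path):
        self.path = path

    def __iter__(self):
        info = get_worker_info()
        wid = info.id if info else 0
        nw = info.num_workers if info else 1
        cap = cv2.VideoCapture(self.path)
        idx = 0
        while True:
            ok, frame = cap.read()  # BGR uint8, 1080x1920x3
            if not ok:
                break
            if idx % nw == wid:
                # HWC uint8 -> CHW float32 in [0, 1]
                yield idx, torch.from_numpy(frame).permute(2, 0, 1).float() / 255.0
            idx += 1
        cap.release()

loader = DataLoader(FrameDataset("input.mp4"), batch_size=16,
                    num_workers=4, pin_memory=True)

model = torch.nn.Conv2d(3, 1, 3).cuda().eval()  # stand-in for the cat classifier
with torch.no_grad():
    for indices, batch in loader:
        batch = batch.cuda(non_blocking=True)  # async copy from pinned memory
        scores = model(batch)
        # ... threshold scores, collect the indices of frames with cats ...
        # (batches interleave across workers, so I'd sort indices at the end)
```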
I sort of see what you mean in your explanation of pinned memory and streams; hopefully PyTorch’s DataLoader can take care of that.
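For what it’s worth, my understanding of what the DataLoader would be saving me from doing by hand is something like this (a sketch only; the model and batch are dummies):

```python
import torch

model = torch.nn.Conv2d(3, 8, 3).cuda()             # stand-in model
batch = torch.empty(4, 3, 1080, 1920).pin_memory()  # page-locked host tensor

copy_stream = torch.cuda.Stream()
with torch.cuda.stream(copy_stream):
    # non_blocking=True only actually overlaps when the source is pinned
    gpu_batch = batch.to("cuda", non_blocking=True)
torch.cuda.current_stream().wait_stream(copy_stream)  # copy finishes before compute
gpu_batch.record_stream(torch.cuda.current_stream())  # allocator bookkeeping across streams
out = model(gpu_batch)
```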
I’m completely lost on the OpenCV CUDA code, and on how to patch OpenCV’s CUDA module so that AV1 decoding works with Video Codec SDK 11.0.
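For context, this is the part I can’t get working. My understanding is that, once OpenCV is built with CUDA and the Video Codec SDK, the decode loop would look roughly like this (AV1 decode on top of that supposedly also needs SDK 11.0+ and a GPU with an AV1-capable NVDEC, e.g. Ampere, but I may be wrong about the details):

```python
import cv2

# cv2.cudacodec only exists in an OpenCV build compiled with NVCUVID /
# the Video Codec SDK; the stock pip wheels don't include it.
reader = cv2.cudacodec.createVideoReader("input.mp4")
while True:
    ok, gpu_frame = reader.nextFrame()  # cv2.cuda.GpuMat, stays on the GPU
    if not ok:
        break
    # ideally hand gpu_frame straight to PyTorch here,
    # rather than bringing it back with gpu_frame.download()
```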
Note: I’d like to do everything on the GPU to keep things future-proof, since I’m not sure I’ll always be using an Intel CPU.
Overall, do you think my pipeline would work out?
I managed to do some testing: the reading part takes ~3h (without the DataLoader) and the writing ~1h.
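The writing test was a plain CPU loop along these lines (a sketch; `cat_indices` is a placeholder for whatever frame indices the model ends up flagging):

```python
import cv2

cap = cv2.VideoCapture("input.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
fourcc = cv2.VideoWriter_fourcc(*"mp4v")
writer = cv2.VideoWriter("cats_only.mp4", fourcc, fps, (1920, 1080))

cat_indices = {0, 1, 2}  # placeholder: indices the model flagged as cats
idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if idx in cat_indices:
        writer.write(frame)
    idx += 1

cap.release()
writer.release()
```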