I am using the GPU build of OpenCV to decode video with the code below; the subsequent neural-network recognition step has not been added yet. Since I do not need frame-by-frame recognition, the method takes an FPS parameter: the number of frames per second that should be sampled from the video for recognition. If it is not specified, it defaults to the video's own frame rate. Typically I set it to 1, i.e. recognizing one frame per second is enough.
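To make the sampling arithmetic concrete, here is a small worked example (a standalone sketch with assumed values, not part of the actual method): with a 25 fps video and FPS set to 5, each retrieved frame should consume five source frames.

cap_fps, fps = 25.0, 5.0  # assumed values, for illustration only
for count1 in range(3):
    # Same expression as in the code below: how many source frames
    # to consume before retrieving the next sampled frame
    skip = int((count1 + 1) * cap_fps / fps) - int(count1 * cap_fps / fps)
    print(count1, skip)  # prints "0 5", "1 5", "2 5"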
While decoding runs, GPU utilization sits at only 3%. I doubt multithreading would be reliable in this context, so I am wondering whether there is a way to make fuller use of the GPU. That said, even at the current efficiency, my requirements would be met if grab() could be used effectively to skip the frames I don't need.
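To isolate the cost of skipping, this is the kind of grab-only loop I have in mind (a minimal sketch assuming the same file; it only measures how fast grab() alone can run through the video, with no retrieve() at all):

import time
import cv2

reader = cv2.cudacodec.createVideoReader("/root/1.mp4")
t0 = time.time()
n = 0
while reader.grab():  # advance one frame without retrieving it
    n += 1
print(n, "frames grabbed in", time.time() - t0, "seconds")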
The video is 300 seconds long at 25 fps, i.e. 7500 frames in total.
When I do not set the FPS, the video is decoded frame by frame and the efficiency is as expected (GPU memory usage is 194 MB, utilization 3%, CPU usage under one core; the CPU version uses almost all 32 cores):
count1= 7499
GPU execution time: 9.850459575653076
count1= 7500
CPU execution time: 14.664050340652466
However, when I set the FPS to 5, the CPU version speeds up significantly because only one frame in every five is actually retrieved, but the GPU time barely changes.
count1= 1499
GPU execution time: 9.862346649169922
count1= 1500
CPU execution time: 4.066887140274048
When the FPS is set to 1 (i.e., one frame retrieved for every 25, since the video is 25 fps), CPU decoding gets even faster, while the GPU time stays the same.
count1= 300
GPU execution time: 9.844478368759155
count1= 300
CPU execution time: 2.5899405479431152
Here is the code snippet:
import cv2
import numpy as np
import time

# Read video using CPU
def read_video_cpu(video_path, fps=None):
    cap = cv2.VideoCapture(video_path)
    cap_fps = cap.get(cv2.CAP_PROP_FPS)
    if fps is None:
        fps = cap_fps
    count1 = 0
    ret = cap.grab()
    was_read, img = cap.retrieve()
    while True:
        # Consume enough source frames so that only `fps` frames
        # per second of video are actually retrieved
        for i in range(int((count1 + 1) * cap_fps / fps) - int(count1 * cap_fps / fps)):
            # Stop skipping once the stream position has already passed
            # the next second boundary (18 ms tolerance)
            if cap.get(cv2.CAP_PROP_POS_MSEC) >= (count1 + 1) * 1000 - 18:
                break
            cap.grab()
        ret, img = cap.retrieve()
        count1 += 1
        if not ret or img is None:
            break
    print("count1=", count1)
    cap.release()
# Read video using GPU
def read_video_gpu(video_path, fps=None):
    if not cv2.cuda.getCudaEnabledDeviceCount():
        print("CUDA is not available. Please make sure CUDA drivers are installed.")
        return
    cap = cv2.cudacodec.createVideoReader(video_path)
    cap_fps = cap.get(cv2.CAP_PROP_FPS)[1]  # cudacodec get() returns (status, value)
    if fps is None:
        fps = cap_fps
    count1 = 0
    ret = cap.grab()
    was_read, img = cap.retrieve()
    while True:
        # Same skip logic as the CPU version
        for i in range(int((count1 + 1) * cap_fps / fps) - int(count1 * cap_fps / fps)):
            if cap.get(cv2.CAP_PROP_POS_MSEC)[1] >= (count1 + 1) * 1000 - 18:
                break
            ret = cap.grab()
            if not ret:
                break
        ret, img = cap.retrieve()
        count1 += 1
        if not ret or img is None:
            break
    print("count1=", count1)
    # cap.release()
video_path = "/root/1.mp4"
fps = None

time1 = time.time()
read_video_gpu(video_path, fps)
print('GPU execution time:', time.time() - time1)

time1 = time.time()
read_video_cpu(video_path, fps)
print('CPU execution time:', time.time() - time1)
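For completeness, the recognition step that is not shown yet would consume frames roughly like this (a hypothetical sketch: cudacodec returns frames as cv2.cuda_GpuMat on the device, so they must be downloaded to host memory before a CPU-side network can use them; nextFrame() is just grab() plus retrieve() in one call):

import cv2

reader = cv2.cudacodec.createVideoReader("/root/1.mp4")
ret, gpu_img = reader.nextFrame()  # decode one frame on the GPU
if ret:
    host_img = gpu_img.download()  # copy into a numpy array on the host
    print(host_img.shape)          # e.g. (height, width, 4): BGRA output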