Thank you, @cudawarped. Your resources on CUDA optimization in OpenCV are invaluable.
Regarding your answer, there are a few points I still don't understand:
- Sparse Optical Flow CPU vs. GPU Performance:
Your comparison of sparse optical flow on CPU vs. GPU showed a 56% speedup on the GPU using the same naive implementation. Even if my rig is not the same as yours, the result shouldn't flip to a 40% slowdown, should it?
- Comparison with Background Subtraction Test:
> That said I have no idea if the code will be faster on your RTX 3070 than your Ryzen 7 2700.
As I mentioned, I tested your optimization repository’s naive CPU vs. GPU comparison for background subtraction, and it gave me an 11x speed boost using the following code:

    bgmog2_device = cv.cuda.createBackgroundSubtractorMOG2()

    def ProcFrameCuda0(frame, lr, store_res=False):
        frame_device.upload(frame)
        frame_device_big = cv.cuda.resize(frame_device, (cols_big, rows_big))
        fg_device_big = bgmog2_device.apply(frame_device_big, lr, cv.cuda.Stream_Null())
        fg_device = cv.cuda.resize(fg_device_big, frame_device.size())
        fg_host = fg_device.download()
        if(store_res):
            gpu_res.append(np.copy(fg_host))

    gpu_res = []
    gpu_time_0, n_frames = ProcVid0(partial(ProcFrameCuda0, store_res=check_res), lr)
    print(f'GPU 0 (naive): {n_frames} frames, {gpu_time_0:.2f} ms/frame')
    print(f'Speedup over CPU: {cpu_time_0/gpu_time_0:.2f}')

Could the size of the frames being processed be the cause? In the optical flow case, the test video was 640x360, whereas in the background subtraction test the frames are 1440x2560 after resizing.
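To put a number on that size difference, here is a rough sketch in plain Python (no OpenCV needed). The helper name `pixel_count` is only for illustration, and the resolutions are the two mentioned above:

```python
# Rough comparison of the per-frame pixel count in the two tests.
# Which dimension is width and which is height does not affect the count.

def pixel_count(dim_a: int, dim_b: int) -> int:
    """Number of pixels in a frame with the given dimensions."""
    return dim_a * dim_b

optical_flow_pixels = pixel_count(640, 360)   # sparse optical flow test video
bg_sub_pixels = pixel_count(1440, 2560)       # background subtraction frame after resizing
ratio = bg_sub_pixels / optical_flow_pixels

print(f"optical flow frame: {optical_flow_pixels} pixels")
print(f"background subtraction frame: {bg_sub_pixels} pixels")
print(f"ratio: {ratio:.0f}x")
```

So the background subtraction test handles 16 times as many pixels per frame. My guess, and it is only a guess, is that the larger frames give the GPU enough work to outweigh the fixed upload/download cost, while the small 640x360 frames do not.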
Also, you were using OpenCV 4.1 for the sparse optical flow test, and I have 4.8. Could something have changed between those versions that is causing this?
My plan was to get a naive implementation performing as expected first and then investigate optimizations, if that makes sense.