OpenCV Optical Flow CUDA Naive Implementation Slower than CPU

Thank you, @cudawarped. Your resources on CUDA optimization in OpenCV are invaluable.

Regarding your answer, there are a few points I still don’t understand:

  1. Sparse Optical Flow CPU vs. GPU Performance:
    Your comparison of sparse optical flow on CPU vs. GPU showed a 56% speedup on the GPU using the same naive implementation. Even though my rig is not the same, the result shouldn’t flip to a 40% slowdown, should it?

  2. Comparison with Background Subtraction Test:
    | That said, I have no idea if the code will be faster on your RTX 3070 than your Ryzen 7 2700.
    As I mentioned, I tested your optimization repository’s naive CPU vs. GPU comparison for background subtraction, and it gave me an 11x speed boost using the following code:

import cv2 as cv
import numpy as np
from functools import partial

# frame_device, cols_big, rows_big, lr, check_res, ProcVid0 and cpu_time_0
# are defined elsewhere in the repository's code.
bgmog2_device = cv.cuda.createBackgroundSubtractorMOG2()

def ProcFrameCuda0(frame, lr, store_res=False):
    frame_device.upload(frame)  # host -> device copy
    frame_device_big = cv.cuda.resize(frame_device, (cols_big, rows_big))
    fg_device_big = bgmog2_device.apply(frame_device_big, lr, cv.cuda.Stream_Null())
    fg_device = cv.cuda.resize(fg_device_big, frame_device.size())
    fg_host = fg_device.download()  # device -> host copy
    if store_res:
        gpu_res.append(np.copy(fg_host))

gpu_res = []
gpu_time_0, n_frames = ProcVid0(partial(ProcFrameCuda0, store_res=check_res), lr)
print(f'GPU 0 (naive): {n_frames} frames, {gpu_time_0:.2f} ms/frame')
print(f'Speedup over CPU: {cpu_time_0/gpu_time_0:.2f}')

Could the size of the frame being processed be the cause? In the optical flow case, the test video was 640x360, whereas in the background subtraction test the frame is 1440x2560 (after resizing).
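
To put numbers on that difference, here is a small sketch that compares the pixel counts of the two frame sizes mentioned above. The helper name `pixel_count` is only for illustration. If fixed per-call costs (upload, download, kernel launches) matter, they would be spread over far more work in the larger case, though this is a guess rather than something I have measured.

```python
# Compare the amount of image data per frame in the two tests.
# Frame sizes are the ones quoted in this post.
def pixel_count(width, height):
    return width * height

flow_pixels = pixel_count(640, 360)      # optical flow test video
bgsub_pixels = pixel_count(1440, 2560)   # background subtraction, after resizing

ratio = bgsub_pixels / flow_pixels
print(f"optical flow frame:           {flow_pixels:,} pixels")
print(f"background subtraction frame: {bgsub_pixels:,} pixels")
print(f"ratio: {ratio:.0f}x")
```

So the background subtraction test handles 16 times as many pixels per frame as the optical flow test.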

Also, you were using OpenCV 4.1 for the sparse optical flow comparison, and I am on 4.8. Could something have changed between those versions that causes this?
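
To make the version comparison concrete, a minimal sketch like the following could be run on both machines and the output compared. The function name `describe_opencv` is hypothetical; `cv.__version__` and `cv.cuda.getCudaEnabledDeviceCount()` are the only OpenCV calls used, and the sketch falls back gracefully if OpenCV or its CUDA module is missing.

```python
def describe_opencv():
    """Collect the OpenCV version and the number of CUDA devices it can see."""
    info = {"version": None, "cuda_devices": 0}
    try:
        import cv2 as cv
    except ImportError:
        return info  # OpenCV is not installed in this environment
    info["version"] = cv.__version__
    try:
        info["cuda_devices"] = cv.cuda.getCudaEnabledDeviceCount()
    except (AttributeError, cv.error):
        pass  # build without a usable cuda module; keep the default of 0
    return info

info = describe_opencv()
print(info)
```

`cv.getBuildInformation()` prints the full build configuration, which may also help spot differences between the two builds.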

My plan was to get a naive implementation performing as expected first and then look into optimizations, if that makes sense. :innocent:
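
For the naive comparison itself, here is a rough sketch of a timing helper that excludes the first few calls, on the assumption that one-time GPU initialization could otherwise skew a short test. The names `ms_per_frame` and `speedup` are hypothetical, and the workload below is a CPU-only stand-in, not the optical flow code. It also assumes `process` only returns once its result is back on the host.

```python
import time

def ms_per_frame(process, frames, warmup=5):
    """Average wall-clock milliseconds per process(frame) call, after warm-up."""
    frames = list(frames)
    for frame in frames[:warmup]:
        process(frame)  # warm-up calls run but are not timed
    timed = frames[warmup:]
    if not timed:
        raise ValueError("need more frames than warm-up calls")
    start = time.perf_counter()
    for frame in timed:
        process(frame)
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / len(timed)

def speedup(cpu_ms, gpu_ms):
    """Ratio above 1.0 means the GPU path is faster."""
    return cpu_ms / gpu_ms

# Stand-in workload: sum a list of numbers per "frame".
fake_frames = [list(range(1000)) for _ in range(20)]
t = ms_per_frame(sum, fake_frames)
print(f"{t:.4f} ms/frame")
```

With the real code, the CPU and GPU per-frame functions would each be passed as `process`, and the two results fed to `speedup`.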