OpenCV Optical Flow CUDA Naive Implementation Slower than CPU

cudawarped · April 3, 2024, 4:53pm

The naive implementation (Naive CUDA implementation without pre-alloc, streams or other optimizations)

allocates the return arrays on the GPU in each iteration, which is costly, and
calls cudaDeviceSynchronize (a hard sync) on every iteration because you are not passing a CUDA stream.

As a result, the timing will be off if you timed it with the code from that notebook, which uses CPU timers rather than GPU timers. Because of the synchronization, the measured time will include the latency of every kernel launch (the time between calling optFlow.calc and its execution).
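To make the points above concrete, here is a rough sketch of what pre-allocating the output, passing a stream and timing with GPU events might look like from Python. It assumes an OpenCV build with the CUDA modules and grayscale uint8 frames of identical size. The helper names (cuda_device_count, mean_gpu_ms_per_flow) are made up for illustration, the Farneback parameters are left at their defaults, and I have not run this against your setup, so treat it as a starting point rather than the notebook's code.

```python
try:
    import cv2
except ImportError:  # OpenCV is not installed at all
    cv2 = None


def cuda_device_count():
    """Number of CUDA devices OpenCV can use (0 on CPU-only builds)."""
    if cv2 is None or not hasattr(cv2, "cuda"):
        return 0
    try:
        return int(cv2.cuda.getCudaEnabledDeviceCount())
    except cv2.error:
        return 0


def mean_gpu_ms_per_flow(frames):
    """Mean GPU time in ms per frame pair, or None if CUDA is unavailable.

    `frames` is assumed to be a list of 2-D uint8 numpy arrays of equal size.
    """
    if len(frames) < 2 or cuda_device_count() == 0:
        return None

    rows, cols = frames[0].shape[:2]
    stream = cv2.cuda_Stream()
    opt_flow = cv2.cuda_FarnebackOpticalFlow.create()  # default parameters

    # Allocate the input and output buffers once, outside the loop.
    gpu_prev = cv2.cuda_GpuMat(rows, cols, cv2.CV_8UC1)
    gpu_next = cv2.cuda_GpuMat(rows, cols, cv2.CV_8UC1)
    gpu_flow = cv2.cuda_GpuMat(rows, cols, cv2.CV_32FC2)

    # GPU timers: events are recorded on the stream, not read on the CPU.
    start, stop = cv2.cuda_Event(), cv2.cuda_Event()

    gpu_prev.upload(frames[0], stream)
    start.record(stream)
    for frame in frames[1:]:
        gpu_next.upload(frame, stream)
        # Passing the stream avoids the hard sync on every iteration.
        opt_flow.calc(gpu_prev, gpu_next, gpu_flow, stream)
        gpu_prev, gpu_next = gpu_next, gpu_prev  # reuse buffers
    stop.record(stream)
    stop.waitForCompletion()  # single sync at the very end

    # The measured interval covers the uploads as well as the calc calls.
    total_ms = cv2.cuda.Event_elapsedTime(start, stop)
    return total_ms / (len(frames) - 1)
```

The first calc call typically carries one-off initialization cost, so it may be worth running one untimed iteration before recording the start event.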

That said, I have no idea whether the code will be faster on your RTX 3070 than on your Ryzen 7 2700.
