Take a look at this question too: https://forum.opencv.org/t/opencv-cuda-extremely-slow
Basically, GPU startup takes some time. For me, processing the first image took 20-30x longer than the following images.
I suggest processing several images in a loop and measuring the time for each one, to see whether the later images are faster than the first.
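Here is a rough sketch of that kind of per-image timing loop. It does not call OpenCV at all: `time_each` and `FakeGpuFilter` are hypothetical names I made up, and the fake filter only simulates a one-time startup cost with `time.sleep`, so the numbers it prints are not real GPU measurements. Swap in your own processing function for `FakeGpuFilter()`.

```python
import time


def time_each(process, images):
    """Return the wall-clock time in seconds spent processing each image."""
    timings = []
    for image in images:
        start = time.perf_counter()
        process(image)
        timings.append(time.perf_counter() - start)
    return timings


class FakeGpuFilter:
    """Hypothetical stand-in for a GPU routine: only the first call pays a setup cost."""

    def __init__(self):
        self.initialized = False

    def __call__(self, image):
        if not self.initialized:
            time.sleep(0.2)  # simulated one-time startup cost (assumed value)
            self.initialized = True
        time.sleep(0.001)  # simulated per-image work (assumed value)
        return image


# Five dummy "images"; replace with your real frames and processing function.
timings = time_each(FakeGpuFilter(), range(5))
for index, seconds in enumerate(timings):
    print(f"image {index}: {seconds * 1000:.1f} ms")
```

If the first time is much larger than the rest, you are probably seeing the startup cost and not the real per-image speed. One thing to watch with real GPU code: make sure your processing function only returns once the result is actually ready, otherwise the measured times can be misleading.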