SURF_CUDA performance

How are you calculating this?

I ran a quick test (mobile RTX 3070 Ti vs. i7-12700H) on 20 copies of the same image (opencv_extra/testdata/gpu/features2d/aloe.png), timing only the execution of SURF, and found the GPU to be significantly faster (~5 ms vs. ~500 ms per image).
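
For comparison, here is a rough sketch of the kind of timing loop I mean. It is not my exact test code: `time_per_call` and its `warmup` parameter are names I made up for illustration, and `fn` stands in for whatever wraps the SURF call alone (image loading and any other setup done beforehand). The untimed warm-up call is there because the first CUDA call usually pays a one-off initialization cost that would otherwise skew the average.

```python
import time
from statistics import mean


def time_per_call(fn, inputs, warmup=1):
    """Return the mean wall-clock time, in milliseconds, of fn(x) over inputs.

    Hypothetical helper: fn should do only the work being measured and
    should return only once its result is actually available.
    """
    # Untimed warm-up calls, so one-off start-up costs are excluded.
    for x in inputs[:warmup]:
        fn(x)

    samples_ms = []
    for x in inputs:
        start = time.perf_counter()
        fn(x)
        samples_ms.append((time.perf_counter() - start) * 1e3)
    return mean(samples_ms)
```

If your numbers include the first call, the image load, or the transfer to and from the GPU, that alone could explain a very different result, so it would help to know which of those are inside your timed region.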