CUDA: SIFT or SURF, disappointed by execution timings

Looking back at SURF_CUDA performance, SURF on the GPU is not that fast, and its run time is heavily dependent on the image content and size. In the end I was seeing ~14.8 ms per 1280x1180 frame on an RTX 3070 Ti.
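For anyone wanting to reproduce per-frame numbers like this, here is a minimal timing sketch. It is not the code I used for the figure above: the helper name `time_per_call_ms` and the `warmup`/`runs` defaults are made up for illustration, and the workload is a CPU stand-in so the snippet runs anywhere. To time a GPU detector, pass a callable that only returns once the GPU work has finished, otherwise you are mostly measuring the launch overhead.

```python
import statistics
import time


def time_per_call_ms(fn, warmup=5, runs=50):
    """Return (median_ms, best_ms) for fn(), ignoring the warm-up calls.

    Hypothetical helper: the first few calls are discarded because they
    often include one-off costs (context creation, allocations, caches).
    """
    for _ in range(warmup):
        fn()

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()  # assumed to block until the work is really done
        samples.append((time.perf_counter() - start) * 1000.0)

    return statistics.median(samples), min(samples)


if __name__ == "__main__":
    # Stand-in workload; replace with the detector call on one frame.
    median_ms, best_ms = time_per_call_ms(lambda: sum(range(100_000)))
    print(f"median {median_ms:.3f} ms, best {best_ms:.3f} ms")
```

Reporting the median rather than the mean keeps a single slow frame from skewing the result, which matters here because the timings vary so much with the image.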

Now that texture references have been removed (see "Fix CUDA texture bugs and replace all instances of CUDA texture references with texture objects", Pull Request #3378 by cudawarped in opencv/opencv_contrib on GitHub), it may be possible to improve the performance of SURF by taking advantage of CUDA streams, but that would probably not be an easy fix.