How to detect objects with Cpp and DNN, CUDA

a GTX 970 has ~4 TFLOP/s of conventional FP32, which is about half of what an RTX 2070 can do, but that comparison ignores tensor cores. the 20 series has tensor cores, the 9 series does not. tensor cores are the deciding performance factor here: they can accelerate convolutional layers by roughly an order of magnitude.

do follow that link on mobile/tiny variants of these networks.

make sure to pick the CUDA backend (DNN_BACKEND_CUDA, together with a CUDA target), not the generic OpenCL one.

someone named Yashas Samaga implemented the CUDA backend for the dnn module. he can also be seen battling with Adrian Rosebrock over proper benchmarking methodology on Adrian’s notorious code blog. I won’t link to that but suffice it to say, I chose my words carefully.