How to detect objects with Cpp and DNN, CUDA

crackwitz · May 26, 2021, 5:13pm

a GTX 970 has ~4 TFLOP/s of conventional FP32 throughput, which is about half of what an RTX 2070 can do, but that comparison ignores tensor cores. the 20 series has tensor cores, the 9 series does not. tensor cores are the deciding performance factor here: they accelerate convolutional layers by at least an order of magnitude.
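for reference, the FP32 figures can be reproduced with a back-of-the-envelope calculation. this is only a sketch: it assumes the published reference specs (1664 CUDA cores at 1178 MHz boost for the GTX 970, 2304 cores at 1620 MHz boost for the RTX 2070) and one fused multiply-add, i.e. 2 FLOPs, per core per clock. `peakFp32Tflops` is a made-up helper name, and real cards rarely sustain this theoretical peak.

```cpp
// Theoretical peak FP32 throughput in TFLOP/s.
// Assumption: every CUDA core retires one fused multiply-add
// (counted as 2 floating-point operations) per clock cycle.
double peakFp32Tflops(int cudaCores, double boostClockMHz)
{
    const double flopsPerCorePerClock = 2.0;
    // MHz -> Hz is a factor of 1e6, FLOP/s -> TFLOP/s is a factor of 1e-12.
    return flopsPerCorePerClock * cudaCores * boostClockMHz * 1e6 / 1e12;
}

// Example calls with the assumed reference specs:
//   peakFp32Tflops(1664, 1178.0)  // GTX 970
//   peakFp32Tflops(2304, 1620.0)  // RTX 2070
```

this covers conventional FP32 only. tensor core throughput is a separate figure and is not modeled here.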

do follow that link on mobile/tiny variants of these networks.

make sure to pick the CUDA backend, not the generic OpenCL one.

someone named Yashas Samaga implemented the CUDA backend for the dnn module. he can also be seen battling with Adrian Rosebrock over proper benchmarking methodology on Adrian’s notorious code blog. I won’t link to that but suffice it to say, I chose my words carefully.
