Increase performance of DNN model inference

Hello all,

I have an OpenCV 4.5.1 build with DNN CUDA support. I’ve used some of the official Darknet models and confirmed that inference runs on the GPU.
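
For reference, this is roughly how I load and run the network (file names, input size, and the single-image forward pass are placeholders, not my full pipeline):

```cpp
#include <opencv2/dnn.hpp>
#include <opencv2/imgcodecs.hpp>

int main()
{
    // Placeholder cfg/weights for one of the official Darknet models.
    cv::dnn::Net net = cv::dnn::readNetFromDarknet("yolov7.cfg", "yolov7.weights");

    // CUDA backend; DNN_TARGET_CUDA_FP16 for the FP16 rows, DNN_TARGET_CUDA for FP32.
    net.setPreferableBackend(cv::dnn::DNN_BACKEND_CUDA);
    net.setPreferableTarget(cv::dnn::DNN_TARGET_CUDA_FP16);

    cv::Mat frame = cv::imread("test.jpg");
    cv::Mat blob = cv::dnn::blobFromImage(frame, 1.0 / 255.0, cv::Size(640, 640),
                                          cv::Scalar(), /*swapRB=*/true, /*crop=*/false);
    net.setInput(blob);

    std::vector<cv::Mat> outs;
    net.forward(outs, net.getUnconnectedOutLayersNames());
    return 0;
}
```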

Here are my benchmark results:

| Device      | Time (ms) | FPS    | Model       | Framework            | Precision |
|-------------|-----------|--------|-------------|----------------------|-----------|
| Laptop      | 8.80      | 113.63 | YOLOv7      | Darknet + OpenCV DNN | FP16      |
| Jetson Orin | 71.77     | 13.93  | YOLOv7      | Darknet + OpenCV DNN | FP16      |
| Jetson Orin | 192.97    | 5.18   | YOLOv7      | Darknet + OpenCV DNN | FP32      |
| Jetson Orin | 31.14     | 32.11  | YOLOv4      | tkDNN                | FP16      |
| Jetson Orin | 13.80     | 72.47  | YOLOv4-tiny | Darknet + OpenCV DNN | FP16      |
| Jetson Nano | 480.28    | 2.08   | YOLOv7      | Darknet + OpenCV DNN | FP16      |
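
In case the methodology matters: the per-frame time is essentially the forward pass, measured roughly like this simplified sketch (the helper name, warm-up count, and iteration count are illustrative, not my exact code):

```cpp
#include <opencv2/core.hpp>
#include <opencv2/dnn.hpp>

// Average forward-pass time in milliseconds, with warm-up runs so the
// one-time CUDA/cuDNN initialization is excluded from the measurement.
double forwardTimeMs(cv::dnn::Net& net, const cv::Mat& blob, int iters = 100)
{
    std::vector<cv::Mat> outs;
    const std::vector<std::string> outNames = net.getUnconnectedOutLayersNames();

    for (int i = 0; i < 10; ++i)   // warm-up
    {
        net.setInput(blob);
        net.forward(outs, outNames);
    }

    cv::TickMeter tm;
    for (int i = 0; i < iters; ++i)
    {
        net.setInput(blob);
        tm.start();
        net.forward(outs, outNames);
        tm.stop();
    }
    return tm.getTimeMilli() / iters;   // TickMeter accumulates over start/stop pairs
}
```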

I have also tried TensorRT engines for YOLOv7 on the same hardware and got much better results.

My question is: how can I get better inference times out of the OpenCV DNN CUDA backend? Should I dig into the model-loading/conversion part of the code? Where should I start? Any suggestions would be welcome.

I have tried several options, such as creating the cv::dnn::Net via readNetFromDarknet or wrapping it in a DetectionModel (sketch below). I also converted the model to TensorRT, which gave the best performance, roughly 5x better; tkDNN was about 2.5x better. But I want to depend only on OpenCV DNN with CUDA as the inference library.
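
The DetectionModel variant I mentioned looks roughly like this (thresholds and input size are the usual YOLO defaults, included only for illustration):

```cpp
#include <opencv2/dnn.hpp>
#include <opencv2/imgcodecs.hpp>

int main()
{
    // Same placeholder cfg/weights as above, wrapped in the high-level API.
    cv::dnn::Net net = cv::dnn::readNetFromDarknet("yolov7.cfg", "yolov7.weights");
    net.setPreferableBackend(cv::dnn::DNN_BACKEND_CUDA);
    net.setPreferableTarget(cv::dnn::DNN_TARGET_CUDA_FP16);

    cv::dnn::DetectionModel model(net);
    model.setInputParams(1.0 / 255.0, cv::Size(640, 640), cv::Scalar(), /*swapRB=*/true);

    cv::Mat frame = cv::imread("test.jpg");
    std::vector<int> classIds;
    std::vector<float> confidences;
    std::vector<cv::Rect> boxes;
    model.detect(frame, classIds, confidences, boxes,
                 /*confThreshold=*/0.25f, /*nmsThreshold=*/0.45f);
    return 0;
}
```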

Thanks in advance.