Hello all,
I have an OpenCV 4.5.1 build with DNN CUDA support. I've run some of the official Darknet models and confirmed that they execute on the GPU.
Here are my benchmark results:
| Device | Time (ms) | FPS | Model | Mode |
|---|---|---|---|---|
| Laptop | 8.80 | 113.63 | YOLOv7 – Darknet – OpenCV DNN | FP16 |
| Jetson Orin | 71.77 | 13.93 | YOLOv7 – Darknet – OpenCV DNN | FP16 |
| Jetson Orin | 192.97 | 5.18 | YOLOv7 – Darknet – OpenCV DNN | FP32 |
| Jetson Orin | 31.14 | 32.11 | YOLOv4 – tkDNN | FP16 |
| Jetson Orin | 13.80 | 72.47 | YOLOv4-tiny – Darknet – OpenCV DNN | FP16 |
| Jetson Nano | 480.28 | 2.08 | YOLOv7 – Darknet – OpenCV DNN | FP16 |
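For context, this is roughly the measurement loop I use (a minimal sketch; the cfg/weights paths, input size, and test image are placeholders for my setup, and the timing uses cv::TickMeter with one warm-up pass so CUDA initialization is not counted):

```cpp
#include <opencv2/core.hpp>
#include <opencv2/dnn.hpp>
#include <opencv2/imgcodecs.hpp>
#include <iostream>
#include <vector>

int main()
{
    // Load the Darknet model (placeholder paths).
    cv::dnn::Net net = cv::dnn::readNetFromDarknet("yolov7.cfg", "yolov7.weights");

    // Run on the GPU in half precision (the FP16 rows in the table).
    net.setPreferableBackend(cv::dnn::DNN_BACKEND_CUDA);
    net.setPreferableTarget(cv::dnn::DNN_TARGET_CUDA_FP16);

    cv::Mat frame = cv::imread("test.jpg");
    cv::Mat blob = cv::dnn::blobFromImage(frame, 1.0 / 255.0, cv::Size(640, 640),
                                          cv::Scalar(), true, false);

    // Warm-up pass so CUDA context/kernel setup is excluded from the timing.
    std::vector<cv::Mat> outs;
    net.setInput(blob);
    net.forward(outs, net.getUnconnectedOutLayersNames());

    // Average forward time over several runs.
    cv::TickMeter tm;
    const int runs = 100;
    for (int i = 0; i < runs; ++i)
    {
        net.setInput(blob);
        tm.start();
        net.forward(outs, net.getUnconnectedOutLayersNames());
        tm.stop();
    }
    const double avgMs = tm.getTimeMilli() / runs;
    std::cout << "Average inference: " << avgMs << " ms ("
              << 1000.0 / avgMs << " FPS)" << std::endl;
    return 0;
}
```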
I have also tried TensorRT models for YOLOv7 on the same hardware and got much better results.
My question is: how can I better optimize the CUDA path to improve inference time? Should I dig into the model-conversion part of the code? Where should I start? Any suggestions would be welcome.
I have tried several options, such as creating the cv::dnn::Net object via readNetFromDarknet or via DetectionModel. I also converted the model to TensorRT, which gave the best performance (about 5x better), while tkDNN was about 2.5x better. But I want to depend only on OpenCV DNN with CUDA as the library.
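For reference, this is the kind of DetectionModel setup I mean (a minimal sketch; the paths, input size, and detection thresholds are placeholders, and the Net is built first so the CUDA backend/target are set explicitly before wrapping it):

```cpp
#include <opencv2/dnn.hpp>
#include <opencv2/imgcodecs.hpp>
#include <vector>

int main()
{
    // Build the Net and select the CUDA FP16 path, then wrap it in
    // the higher-level DetectionModel API.
    cv::dnn::Net net = cv::dnn::readNetFromDarknet("yolov7.cfg", "yolov7.weights");
    net.setPreferableBackend(cv::dnn::DNN_BACKEND_CUDA);
    net.setPreferableTarget(cv::dnn::DNN_TARGET_CUDA_FP16);

    cv::dnn::DetectionModel model(net);
    // Preprocessing must match the Darknet cfg (scale, input size, RGB order).
    model.setInputParams(1.0 / 255.0, cv::Size(640, 640), cv::Scalar(), true);

    cv::Mat frame = cv::imread("test.jpg");
    std::vector<int> classIds;
    std::vector<float> confidences;
    std::vector<cv::Rect> boxes;
    // Placeholder confidence / NMS thresholds.
    model.detect(frame, classIds, confidences, boxes, 0.25f, 0.45f);
    return 0;
}
```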
Thanks in advance.