Why does the `detect` method of `cv2.dnn_DetectionModel` take so much longer on the first frame? On a Quadro K1000 GPU I measure more than 2.7 s for the first frame, against an average of 11 ms over the 213 frames processed. How can it be reduced?
Are you using the CUDA backend?
Yes, I use OpenCV compiled from source with CUDA.
This is my code:

net_Detector1 = cv2.dnn.readNet(Weights_file, CFG_file, "darknet")
model_Detector1 = cv2.dnn_DetectionModel(net_Detector1)
model_Detector1.setInputParams(size=size_Detector, scale=1/255, swapRB=True)
It depends on your code. If this is the first call to any CUDA function, the delay comes from context creation; if not, the delay is a result of the CUDA DNN libraries being loaded.
I use YOLOv4. How can we say it is real time if we have all this latency related to the first call and the loading of libraries?
I don’t see how an initialization cost, which is a one-off and could for instance happen a year before you perform inference, has anything to do with latency or real-time processing.
I think the first iteration is costly because of the data transfer of the weights (246 MB) between CPU and GPU, and maybe because of code compilation too (as with OpenCL).
LOOP 0
Tps cuda      : 2474.64 ms
Tps opencv    : 401.108 ms
[ WARN:firstname.lastname@example.org] global ocl4dnn_conv_spatial.cpp:1923 cv::dnn::ocl4dnn::OCL4DNNConvSpatial<float>::loadTunedConfig OpenCV(ocl4dnn): consider to specify kernel configuration cache directory through OPENCV_OCL4DNN_CONFIG_PATH parameter.
OpenCL program build log: dnn/dummy
Status -11: CL_BUILD_PROGRAM_FAILURE
-cl-no-subgroup-ifp
Error in processing command line: Don't understand command line argument "-cl-no-subgroup-ifp"!
Tps inference : 654.125 ms
Tps vino      : 1475.66 ms
*******************
LOOP 1
Tps cuda      : 12.2015 ms
Tps opencv    : 286.404 ms
Tps inference : 98.1013 ms
Tps vino      : 162.483 ms
*******************
LOOP 2
Tps cuda      : 13.2503 ms
Tps opencv    : 285.352 ms
Tps inference : 98.0035 ms
Tps vino      : 162.986 ms
Initialization is done at the beginning of the code, before inference. I’m talking about the inference time of the first frame.
Thank you for your reply. I understand that the weights are transferred when calling the read function (cv2.dnn.readNet(Weights_file, CFG_file, "darknet")), and that cv2.dnn_DetectionModel then allows you to perform inference directly.
I don’t think so:
net_cuda= dnn::readNet("hed_deploy.prototxt", "hed_pretrained_bsds.caffemodel");
At this point the weights are in memory.
Then you select the backend:
tpsInference.start();
net_cuda.setPreferableTarget(DNN_TARGET_CUDA);
net_cuda.setPreferableBackend(DNN_BACKEND_CUDA);
tpsInference.stop();
cout << "Tps select cuda :" << tpsInference.getTimeMilli() << "ms" << endl;
tpsInference.reset();
Tps select cuda :0.0019ms
Then perform a single inference at the beginning of the code on a blank frame.
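A minimal sketch of that warm-up idea in Python. The helper works with any inference callable; in this thread you would pass the model's `detect` call and a black image (the `runs` parameter and the helper name are my own, not from OpenCV):

```python
import time

def warm_up(infer, frame, runs=1):
    """Run `infer` on `frame` a few times before real processing, so the
    one-off costs (CUDA context creation, library loading, weight upload)
    are paid up front. Returns the per-run times in milliseconds."""
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        infer(frame)
        times.append((time.perf_counter() - t0) * 1000.0)
    return times

# Intended usage in the asker's setting (assumes a CUDA-enabled cv2 build;
# the input size is an assumption):
#   blank = np.zeros((size_Detector, size_Detector, 3), np.uint8)
#   warm_up(lambda f: model_Detector1.detect(f, confThreshold=0.5), blank)
```

After the warm-up call, the first "real" frame should run at the steady-state ~11 ms reported above rather than paying the 2.7 s startup cost.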
Call cuda::setDevice(0) (0 if you only have one GPU) before calling any other CUDA code, to ensure the context is created before you initialize your network. Then you will know for sure whether the delay comes from context creation, DNN library loading, or a host-to-device memory copy.
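A sketch of that measurement order as a small, generic timing helper. The commented cv2 calls show the intended real usage and are assumptions (they require an OpenCV build with CUDA; `build_model`, `model`, and `blank` are hypothetical names):

```python
import time

def time_stages(stages):
    """Run each (name, thunk) pair once, in order, and return {name: ms}.
    Timing the device selection first isolates CUDA context creation
    from DNN library loading and the first host-to-device weight copy."""
    results = {}
    for name, fn in stages:
        t0 = time.perf_counter()
        fn()
        results[name] = (time.perf_counter() - t0) * 1000.0
    return results

# Intended usage (assumes `import cv2` with a CUDA-enabled build):
#   timings = time_stages([
#       ("context", lambda: cv2.cuda.setDevice(0)),  # context creation alone
#       ("load",    lambda: build_model()),          # readNet + backend/target
#       ("first",   lambda: model.detect(blank)),    # first-frame inference
#   ])
```

Whichever stage dominates the first-frame total tells you which of the three costs you are actually paying.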