YOLOv3 detection speed slows down (OpenCV 4.5.3)

Hello!
I am trying to use YOLOv3 object detection with OpenCV 4.5.3 and the Nvidia CUDA backend. When I run my program, the detection slows down after a few seconds: the detection time at the start is about 40-50 ms, but after a few seconds it increases to 400-500 ms.
Does anyone have an idea what causes this speed problem?

I tried switching the Net backend back to the CPU, and then the detection time stays constant at about 300-400 ms.
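
Switching back to the CPU just means changing the two backend/target lines in the snippet below to the standard CPU settings, e.g.:

// CPU comparison: use the default OpenCV backend and CPU target instead of CUDA
net2.setPreferableBackend(DNN_BACKEND_OPENCV);
net2.setPreferableTarget(DNN_TARGET_CPU);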

Below is the code snippet I used:
String modelConfiguration = "yolov3.cfg";
String modelWeights = "yolov3.weights";
Net net2 = readNetFromDarknet(modelConfiguration, modelWeights);

net2.setPreferableBackend(DNN_BACKEND_CUDA);
net2.setPreferableTarget(DNN_TARGET_CUDA);
TickMeter tickMeter2;
Mat blob;
for (int count = 0; ; ++count)
{
    tickMeter2.reset();
    tickMeter2.start();

    cap >> image;
    if (image.empty())
    {
        std::cerr << "Can't capture frame " << count << ". End of video stream?" << std::endl;
        break;
    }

    Mat render_image = image.clone();
    vector<Mat> outs;
    if (enable_yolo)
    {
        blobFromImage(image, blob, 1 / 255.0, cv::Size(inpWidth, inpHeight), Scalar(0, 0, 0), true, false);
        net2.setInput(blob);
        net2.forward(outs, getOutputsNames(net2));
    }

    if (enable_yolo)
    {
        postprocess(render_image, outs);
    }
    
    tickMeter2.stop();
    std::string timeLabel = format("Inference time: %.2f ms", tickMeter2.getTimeMilli());
    putText(render_image, timeLabel, Point(0, 40), FONT_HERSHEY_SIMPLEX, 1, Scalar(255, 0, 255), 2);

    imshow(winName, render_image);

    int c = waitKey(10);
}
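
Note that the TickMeter above measures the whole loop iteration (frame capture, forward pass, postprocessing, drawing), not only the network. To check whether it is really the inference that slows down, the pure forward time can also be read from the network itself, for example right after the forward() call:

// Pure inference time of the last forward() call, as reported by the network itself
std::vector<double> layersTimes;
double freq = getTickFrequency() / 1000.0;                    // ticks per millisecond
double inferenceMs = net2.getPerfProfile(layersTimes) / freq;
std::cout << format("forward() time: %.2f ms", inferenceMs) << std::endl;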

I’ve run another experiment with a TensorFlow model, also using CUDA, and I got a similar result.
I used the following code for testing:
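
A minimal sketch of that setup, assuming the same capture/forward loop as above; the .pb/.pbtxt file names are only placeholders:

// TensorFlow test: load a frozen graph instead of the Darknet model,
// then select the same CUDA backend/target as before.
Net net3 = readNetFromTensorflow("frozen_inference_graph.pb", "graph.pbtxt");
net3.setPreferableBackend(DNN_BACKEND_CUDA);
net3.setPreferableTarget(DNN_TARGET_CUDA);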

If I run the program with the CUDA backend and CUDA target, the network slows down dramatically after a few seconds.

My running environment is the following:
Windows 10
Intel i7-1165G7
Nvidia GeForce MX350

Does anyone have an idea what causes this speed problem?

I’ve run a number of experiments on this slowdown problem, but I haven’t found the right solution yet. I’ve also tested this code on another machine. In that environment the program ran well without slowing down, even though that machine is weaker than my development environment (described above).
From my experiments I could not determine what causes the detection speed to slow down.

I’ll briefly summarize the differences between the two systems:

  • development environment: Intel i7-1165G7, Nvidia GeForce MX350 - detection speed: 18 → 10 → 5 FPS (degrading over time)
  • other environment: Intel i5-7200U, Nvidia GeForce 940MX - detection speed: 20-25 FPS (steady)

On the older machine, I found that only one program was using the GPU. However, in my development environment, where I experienced the decrease in speed, several programs were using the GPU at the same time.
I tried changing the graphics acceleration settings in the operating system, but to no avail.

Perhaps someone can give me guidance on which direction to look for a solution to this problem.

@sgd7, this is not my forte, but since there has been no reply, here is mine.

You provided valuable info. It would be nice to get a reply from someone with experience of the same problem, but that is not the case for now.

As you may have suspected, it seems the problem is not in the OpenCV part, but in your system’s GPU workload. I believe any tool for visualizing GPU load will help. Nvidia has many tools for that, but each of them requires learning how to use it.
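
If you prefer checking it from code rather than from a GUI tool, the NVML library that ships with the Nvidia driver can be polled for GPU utilization and clock speed. A rough sketch (untested; on Windows you link against nvml.lib from the CUDA toolkit):

#include <nvml.h>
#include <chrono>
#include <iostream>
#include <thread>

int main()
{
    // Initialize NVML and take the first GPU in the system
    if (nvmlInit() != NVML_SUCCESS)
        return 1;
    nvmlDevice_t device;
    nvmlDeviceGetHandleByIndex(0, &device);

    // Poll utilization and SM clock once per second for a minute;
    // a dropping clock while utilization stays high points to throttling
    for (int i = 0; i < 60; ++i)
    {
        nvmlUtilization_t util;
        unsigned int smClockMHz = 0;
        nvmlDeviceGetUtilizationRates(device, &util);
        nvmlDeviceGetClockInfo(device, NVML_CLOCK_SM, &smClockMHz);
        std::cout << "GPU util: " << util.gpu << " %, SM clock: " << smClockMHz << " MHz" << std::endl;
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }

    nvmlShutdown();
    return 0;
}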

Thanks for the guidance! I am now profiling my program with the Nvidia tool you advised. Based on my current tests, I believe the problem is related to the power management settings of the operating system.
