When deploying the same code on a C++ OpenCV DNN CPU, does version 4.11.0 require twice as much execution time as version 4.5.3?

Cen · July 31, 2025, 6:39am

OpenCV version 4.11.0 was reported to include optimizations for the performance of the `cv::dnn::blobFromImage()` function. Based on this, I intended to improve the runtime efficiency by upgrading from the original version 4.5.3 library without modifying the source code. However, after conducting comparative tests, I observed that the overall execution time had doubled. Further investigation revealed that the increase in time consumption primarily occurred during the `model_.forward()` function call. Is there any additional configuration or code adjustment required when using version 4.11.0 to achieve the expected performance improvements?

crackwitz · July 31, 2025, 11:25pm

is it the same model and weights for both cases?

what does the CPU usage look like, i.e. how many cores are actually loaded as the program runs? does it look the same as before?

if you could compare the output of `cv::getBuildInformation()` for both versions, that’d be useful.
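
One way to make that comparison less tedious is to save the `cv::getBuildInformation()` text from each build to a file and list the lines that appear in only one of them. The sketch below uses only the standard library and assumes both texts have already been read into strings; `splitLines` and `linesOnlyIn` are hypothetical helper names, not part of OpenCV.

```cpp
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split text into lines, trimming surrounding
// whitespace and dropping empty lines.
static std::vector<std::string> splitLines(const std::string& text) {
    std::vector<std::string> lines;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        const auto first = line.find_first_not_of(" \t\r");
        if (first == std::string::npos) continue;  // blank line
        const auto last = line.find_last_not_of(" \t\r");
        lines.push_back(line.substr(first, last - first + 1));
    }
    return lines;
}

// Hypothetical helper: lines of `a` that do not occur anywhere in `b`,
// returned in their original order.
std::vector<std::string> linesOnlyIn(const std::string& a, const std::string& b) {
    const std::vector<std::string> bLines = splitLines(b);
    const std::set<std::string> other(bLines.begin(), bLines.end());
    std::vector<std::string> result;
    for (const std::string& line : splitLines(a)) {
        if (other.count(line) == 0) result.push_back(line);
    }
    return result;
}
```

Calling `linesOnlyIn(info453, info411)` and then `linesOnlyIn(info411, info453)` would give both sides of the difference. The entries most likely to matter for CPU inference speed are the parallel framework and the CPU baseline/dispatch features, though that is a guess about where to look rather than a diagnosis.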

Cen · August 1, 2025, 2:39am

Thank you for your reply.

I compared the build configuration of the two versions, and there is essentially no difference. To rule out differences introduced by my own CMake builds, I downloaded and installed the OpenCV 4.11.0 and OpenCV 4.5.3 exe packages from the official website. With these prebuilt libraries, there was still a significant difference in CPU DNN inference time. I also profiled CPU efficiency with the Intel VTune Profiler, and the comparison showed similar results.

I also tried OpenCV 4.10.0; its inference speed also differed considerably from that of OpenCV 4.5.3.

    cv::dnn::Net model_ = cv::dnn::readNetFromONNX(buffer, length);
    if (is_Gpu)
    {
        model_.setPreferableBackend(cv::dnn::DNN_BACKEND_CUDA);
        model_.setPreferableTarget(cv::dnn::DNN_TARGET_CUDA);
    }
    else
    {
        model_.setPreferableBackend(cv::dnn::DNN_BACKEND_DEFAULT);
        model_.setPreferableTarget(cv::dnn::DNN_TARGET_CPU);
    }

    cv::dnn::blobFromImage(img, blob, 1. / 255., cv::Size(INPUT_WIDTH, INPUT_HEIGHT), cv::Scalar(), true, false);
    std::vector<cv::Mat> detections;
    model_.setInput(blob);
    model_.forward(detections, model_.getUnconnectedOutLayersNames());
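
To compare the two versions on equal terms, the forward call can be timed in isolation, with a few warm-up runs discarded so that one-time initialization on the first inference does not skew the result. The following is a minimal sketch using only the standard library; `medianMillis` is a hypothetical helper name, and the warm-up and run counts are arbitrary assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: run fn() `warmup` times untimed, then `runs` times
// timed, and return the median wall-clock time of the timed runs in ms.
template <typename Fn>
double medianMillis(Fn&& fn, int warmup, int runs) {
    assert(runs > 0);
    for (int i = 0; i < warmup; ++i) fn();  // discard warm-up iterations

    std::vector<double> samples;
    samples.reserve(static_cast<std::size_t>(runs));
    for (int i = 0; i < runs; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        fn();
        const auto t1 = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(t1 - t0).count());
    }

    // The median is less sensitive to occasional scheduling spikes than the mean.
    std::sort(samples.begin(), samples.end());
    const std::size_t mid = samples.size() / 2;
    return (samples.size() % 2 == 1)
               ? samples[mid]
               : 0.5 * (samples[mid - 1] + samples[mid]);
}
```

With the snippet above, a call such as `medianMillis([&] { model_.forward(detections, model_.getUnconnectedOutLayersNames()); }, 5, 50)` built against each OpenCV version would give directly comparable numbers. For a per-layer breakdown, `cv::dnn::Net::getPerfProfile()` can be called after a forward pass to see which layers account for the extra time.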

crackwitz · August 1, 2025, 1:01pm

that VTune Profiler output is good data.

you might want to submit an issue on OpenCV’s github.
