Does OpenCV 4.11.0 take twice as long as 4.5.3 for CPU DNN inference with the same C++ code?

OpenCV 4.11.0 was reported to include performance optimizations for `cv::dnn::blobFromImage()`. Based on this, I expected to improve runtime efficiency simply by upgrading from the 4.5.3 library, without modifying my source code. However, comparative tests showed that the overall execution time had doubled. Further investigation revealed that the increase occurred primarily in the `model_.forward()` call. Is any additional configuration or code change required with version 4.11.0 to achieve the expected performance improvement?
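To confirm where the time goes, each stage can be timed separately with warm-up runs excluded, because the first `forward()` call on a freshly configured `cv::dnn::Net` typically includes one-time network setup. Below is a minimal sketch using only `std::chrono`; the helper name `meanMillis` and the warm-up/iteration counts are assumptions for illustration, not part of OpenCV.

```cpp
#include <chrono>
#include <functional>

// Runs `fn` `warmup` times untimed, then returns the mean wall-clock time in
// milliseconds over `iterations` timed runs. Warm-up runs absorb one-time
// costs (e.g. lazy network setup on the first DNN forward()).
double meanMillis(const std::function<void()>& fn, int warmup, int iterations)
{
    for (int i = 0; i < warmup; ++i)
        fn();
    if (iterations <= 0)
        return 0.0;

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        fn();
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

For example, with the variables from the code in this thread, `meanMillis([&] { model_.setInput(blob); model_.forward(detections, model_.getUnconnectedOutLayersNames()); }, 3, 20)` can be measured separately from a similar lambda wrapping `cv::dnn::blobFromImage(...)`, once against each OpenCV build.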

is it the same model and weights for both cases?

what does the CPU usage look like, i.e. how many cores are actually loaded as the program runs? does it look the same as before?

if you could compare the output of cv::getBuildInformation() for both versions, that’d be useful.

Thank you for your reply.

I compared the build information of the two versions and found essentially no difference. To rule out differences introduced by my own CMake builds, I downloaded and installed the official OpenCV 4.11.0 and 4.5.3 .exe packages from the OpenCV website. With these, I still observed a significant difference in CPU DNN inference time. I also measured CPU efficiency with Intel VTune Profiler, and the comparison gave similar results.

I also tried OpenCV 4.10.0, but its inference speed still differed considerably from that of OpenCV 4.5.3.

```cpp
// Load the ONNX model from an in-memory buffer
cv::dnn::Net model_ = cv::dnn::readNetFromONNX(buffer, length);
if (is_Gpu)
{
    model_.setPreferableBackend(cv::dnn::DNN_BACKEND_CUDA);
    model_.setPreferableTarget(cv::dnn::DNN_TARGET_CUDA);
}
else
{
    model_.setPreferableBackend(cv::dnn::DNN_BACKEND_DEFAULT);
    model_.setPreferableTarget(cv::dnn::DNN_TARGET_CPU);
}

// Scale pixels to [0, 1], resize to the network input size, swap BGR->RGB, no crop
cv::dnn::blobFromImage(img, blob, 1. / 255., cv::Size(INPUT_WIDTH, INPUT_HEIGHT), cv::Scalar(), true, false);
std::vector<cv::Mat> detections;
model_.setInput(blob);
model_.forward(detections, model_.getUnconnectedOutLayersNames());
```

that VTune Profiler output is good data.

you might want to submit an issue on OpenCV’s GitHub.