Why is net.forward() slow for YOLOv8 when used in C++?

I converted my YOLOv8 model, trained in Python, to ONNX format. Since I don't have a graphics card, I am testing it on my computer's CPU. I use C++ with Visual Studio 2019 and Python in VS Code. On the C++ side I use the code below, but when I measure net.forward, processing a 1280x720 pixel image takes 2570 ms. When I process the same image on the Python side with model.predict, it takes 430-570 ms.

Where could the problem be?
Python (~450 ms):

model.predict('test.jpg')

C++ (~2500 ms):

#include "yolov8.h"

using namespace std;
using namespace cv;
using namespace cv::dnn;

bool Yolov8::ReadModel(Net& net, string& netPath, bool isCuda) { // default argument (isCuda = false) belongs in the declaration in yolov8.h, not here
	try {
		net = readNet(netPath);
#if CV_VERSION_MAJOR == 4 && CV_VERSION_MINOR == 7 && CV_VERSION_REVISION == 0
		net.enableWinograd(false);  // workaround for an OpenCV 4.7.0 bug on AVX-only platforms, see https://github.com/opencv/opencv/pull/23112 and https://github.com/opencv/opencv/issues/23080
		//net.enableWinograd(true); // if your CPU supports AVX2, you can set this to true to speed up inference
#endif
	}
	catch (const std::exception&) {
		return false;
	}

	if (isCuda) {
		//cuda
		net.setPreferableBackend(cv::dnn::DNN_BACKEND_CUDA);
		net.setPreferableTarget(cv::dnn::DNN_TARGET_CUDA); //or DNN_TARGET_CUDA_FP16
	}
	else {
		//cpu
		cout << "Inference device: CPU" << endl;
		net.setPreferableBackend(cv::dnn::DNN_BACKEND_DEFAULT);
		net.setPreferableTarget(cv::dnn::DNN_TARGET_CPU);
	}
	return true;
}


bool Yolov8::Detect(Mat& srcImg, Net& net, vector<OutputSeg>& output,bool one_multi) {
	Mat blob;
	output.clear();
	int col = srcImg.cols;
	int row = srcImg.rows;
	Mat netInputImg;
	Vec4d params;
	try
	{
		LetterBox(srcImg, netInputImg, params, cv::Size(_netWidth, _netHeight));

	}
	catch (const std::exception&)
	{
		return false;
	}
	blobFromImage(netInputImg, blob, 1 / 255.0, cv::Size(_netWidth, _netHeight), cv::Scalar(0, 0, 0), true, false);
	net.setInput(blob);
	std::vector<cv::Mat> net_output_img;

	net.forward(net_output_img, net.getUnconnectedOutLayersNames()); // get outputs
	// ... (post-processing omitted)
}

The issue was caused by running the program in debug mode. Switching to a release build resolved the problem.

(explanation via ChatGPT):
The problem occurred because a debug build carries significant overhead compared with a release build. In Visual Studio, a debug configuration disables compiler optimizations and links against the debug C++ runtime, which adds extra checks (e.g. iterator and heap validation); linking against OpenCV's debug libraries slows dnn inference down dramatically.

Switching to release mode removes these checks and enables optimizations, which restores the expected performance. Release builds are what you should use for benchmarking and for deployment in production.

Therefore, changing the program execution mode from debug to release mode helped resolve the issue.
