I trained a YOLOv8 model in Python and converted it to ONNX format. Since I don't have a graphics card, I am testing it on the CPU. On the C++ side I use Visual Studio 2019, and on the Python side I use VS Code. With the C++ code below, net.forward takes about 2570 ms to process a 1280x720 pixel image. When I process the same image in Python using model.predict, it takes between 430 and 570 ms.
Where could the problem be?
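For reference, a timing like the one above can be taken with a small helper such as the following. This is only a sketch under stated assumptions: `MedianMillis` is a hypothetical name, not part of OpenCV or of the code in this post, and the warm-up and run counts are arbitrary defaults. It calls the function untimed first, because the first forward pass on a freshly loaded network can include one-time setup, and then reports the median of several timed runs.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an OpenCV API): runs `fn` `warmup` times untimed,
// then `runs` times timed, and returns the median duration in milliseconds.
template <typename Fn>
double MedianMillis(Fn&& fn, std::size_t warmup = 1, std::size_t runs = 5) {
    assert(runs > 0 && "need at least one timed run");

    // Untimed warm-up calls, so one-time setup cost is not measured.
    for (std::size_t i = 0; i < warmup; ++i) {
        fn();
    }

    std::vector<double> samples;
    samples.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        fn();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }

    // Middle sample after sorting (the upper middle for an even count).
    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];
}

// Usage with the code in this post (assumes `net` and `net_output_img` exist):
//   double ms = MedianMillis([&] {
//       net.forward(net_output_img, net.getUnconnectedOutLayersNames());
//   });
```

Measuring the Python side the same way (one warm-up call, then several timed calls to model.predict) would make the two numbers directly comparable.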
Python: ~450 ms
C++: ~2500 ms
#include "yolov8.h"
using namespace std;
using namespace cv;
using namespace cv::dnn;
bool Yolov8::ReadModel(Net& net, string& netPath, bool isCuda = false) {
    try {
        net = readNet(netPath);
        net.enableWinograd(false); // bug of opencv4.7.x in AVX only platform, https://github.com/opencv/opencv/pull/23112 and https://github.com/opencv/opencv/issues/23080
        //net.enableWinograd(true); // If your CPU supports AVX2, you can set it true to speed up
    }
    catch (const std::exception&) {
        return false;
    }
    if (isCuda) {
        net.setPreferableTarget(cv::dnn::DNN_TARGET_CUDA); // or DNN_TARGET_CUDA_FP16
    }
    else {
        cout << "Inference device: CPU" << endl;
    }
    return true;
}
bool Yolov8::Detect(Mat& srcImg, Net& net, vector<OutputSeg>& output, bool one_multi) {
    Mat blob;
    int col = srcImg.cols;
    int row = srcImg.rows;
    Mat netInputImg;
    Vec4d params;
    try {
        // Resize and pad the source image to the network input size.
        LetterBox(srcImg, netInputImg, params, cv::Size(_netWidth, _netHeight));
    }
    catch (const std::exception&) {
        return false;
    }
    // Scale pixels to [0, 1], swap R and B channels, no cropping.
    blobFromImage(netInputImg, blob, 1 / 255.0, cv::Size(_netWidth, _netHeight), cv::Scalar(0, 0, 0), true, false);
    std::vector<cv::Mat> net_output_img;
    net.forward(net_output_img, net.getUnconnectedOutLayersNames()); // get outputs