Unexpected behavior with ONNX model when read with the OpenCV DNN module (Java)

I have a YOLO model exported to ONNX format and I am trying to use it for inference in Java. I need to get the bounding box and class prediction information.
The model is read with no errors, but the rows and cols of the output Mat come back as -1. I would appreciate it if anyone could help with the issue.

Mat img = Imgcodecs.imread("x0.png");
Net net = Dnn.readNetFromONNX("best.onnx");
Mat blob = Dnn.blobFromImage(img, 1 / 255.0, new Size(224, 224), new Scalar(0, 0, 0), true, false);

// Set the input to the network
net.setInput(blob);

List<Mat> outputs = new ArrayList<Mat>();
net.forward(outputs, net.getUnconnectedOutLayersNames());

System.out.println("Number of output layers: " + outputs.size());

for (Mat output : outputs) {
    int rows = output.rows();
    int cols = output.cols();

    System.out.println("Rows in output: " + rows);
    System.out.println("Columns in output: " + cols);

    if (rows <= 0) {
        System.out.println("Empty output, skipping.");
        continue; // Skip empty outputs
    }

    for (int i = 0; i < rows; i++) {
        Mat row = output.row(i);
        if (cols < 5) {
            System.out.println("Insufficient columns in row, skipping.");
            continue; // Skip invalid rows
        }

        // Extract bounding box and confidence
        double[] data = new double[cols];
        row.get(0, 0, data);

        double confidence = data[4];
        if (confidence < 0.5) {
            System.out.println("Low confidence (" + confidence + "), skipping.");
            continue; // Skip low-confidence detections
        }

        double centerX = data[0];
        double centerY = data[1];
        double width = data[2];
        double height = data[3];

        int absCenterX = (int) (centerX * img.width());
        int absCenterY = (int) (centerY * img.height());
        int absWidth = (int) (width * img.width());
        int absHeight = (int) (height * img.height());

        // Compute top-left corner
        int left = absCenterX - absWidth / 2;
        int top = absCenterY - absHeight / 2;

        // Extract class scores
        double maxScore = -1;
        int classId = -1;
        for (int j = 5; j < cols; j++) {
            double score = data[j];
            if (score > maxScore) {
                maxScore = score;
                classId = j - 5; // Assuming class labels start from column 5
            }
        }

        // Print or store the bounding box and class
        Rect boundingBox = new Rect(left, top, absWidth, absHeight);
        System.out.println("Bounding Box: " + boundingBox);
        System.out.println("Class ID: " + classId);
    }
}

Which YOLO, exactly (there are about 10 different ones)?
What are you trying to infer from it (classes? segments? boxes?)?
Which OpenCV version?

That's expected / on purpose.
rows/cols only make sense with 2D Mats; you've got a 4D tensor here. Use:

output.dims();   // number of dimensions, e.g. 4
output.size(0);  // extent along axis 0; likewise size(1), size(2), ...

to retrieve the actual dimensions.
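For example, a minimal sketch in Java that prints every axis of the first output (using the outputs list from your code above):

Mat output = outputs.get(0);
for (int d = 0; d < output.dims(); d++) {
    System.out.println("axis " + d + ": " + output.size(d));
}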

Thank you for the reply. I'm using YOLOv8s.
output.dims() returns 3.
output.size() returns 13x1.
I really can't find proper documentation on how to extract the bounding boxes…

You need to index the size() function here, like size(2).
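For a YOLOv8 detection model exported at the default 640x640 input, the output should be 1x84x8400, so for example:

int attrs = output.size(1);      // 84 = 4 box values + 80 class scores
int proposals = output.size(2);  // 8400 candidate boxes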

I tried using size(0) instead of rows and size(1) instead of cols, but I'm still having issues with the code.

Maybe have a look here.
In the end, I'd expect to see 8400 rows of box proposals (cx, cy, w, h, p0, p1, …, pn), which you still have to filter (NMS), as in the sketch below.
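A minimal sketch of that decoding in Java, assuming a single 1x84x8400 output (84 = cx, cy, w, h plus 80 class scores); the 0.5f / 0.45f thresholds are illustrative, not canonical:

// requires: import org.opencv.core.*; import org.opencv.dnn.Dnn; import java.util.*;
Mat out = outputs.get(0);                  // 1x84x8400
Mat m = out.reshape(1, out.size(1)).t();   // -> 8400x84, one proposal per row

List<Rect2d> boxes = new ArrayList<>();
List<Float> scores = new ArrayList<>();
List<Integer> classIds = new ArrayList<>();
float[] row = new float[m.cols()];

for (int i = 0; i < m.rows(); i++) {
    m.get(i, 0, row);
    // columns 4..83 hold the class scores; pick the best one
    int bestId = -1;
    float bestScore = 0f;
    for (int j = 4; j < row.length; j++) {
        if (row[j] > bestScore) { bestScore = row[j]; bestId = j - 4; }
    }
    if (bestScore < 0.5f) continue;        // confidence threshold
    float cx = row[0], cy = row[1], w = row[2], h = row[3];
    boxes.add(new Rect2d(cx - w / 2, cy - h / 2, w, h));
    scores.add(bestScore);
    classIds.add(bestId);
}

// non-maximum suppression over the surviving proposals
MatOfRect2d bboxes = new MatOfRect2d(boxes.toArray(new Rect2d[0]));
MatOfFloat confs = new MatOfFloat();
confs.fromList(scores);
MatOfInt keep = new MatOfInt();
Dnn.NMSBoxes(bboxes, confs, 0.5f, 0.45f, keep);
// keep.toArray() holds the indices of the final detections;
// box coordinates are still in network-input scale (e.g. 640x640)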

This is the only resource I found for YOLOv8 using OpenCV. While it has some problems, I managed to make it work. Apparently it took the author a month to come up with the code due to the lack of Java documentation, which is disappointing really.


Well, if you could post working Java code here, it would be much appreciated!

You can try this example:

#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

struct BestResult {
    int bestId;
    float bestScore;
};

// Return the index and value of the highest class score
BestResult getBestFromConfidenceValue(const float confidenceValues[], size_t size) {
    BestResult result;
    result.bestId = -1;
    result.bestScore = 0.0f;
    for (size_t i = 0; i < size; ++i) {
        if (confidenceValues[i] > result.bestScore) {
            result.bestId = static_cast<int>(i);
            result.bestScore = confidenceValues[i];
        }
    }
    return result;
}

void postprocess(cv::Mat& frame, const std::vector<cv::Mat>& outs, float confThreshold, float nmsThreshold) {

    std::vector<int> classIds;
    std::vector<float> confidences;
    std::vector<cv::Rect> boxes;
    const int columns = 84;   // 4 box values + 80 class scores
    const int rows = 8400;    // number of candidate boxes
    for (const auto& out : outs) {
        const float* data_ptr = (const float*)out.data;
        std::cout << "out.size: " << out.size << std::endl;
        // the 1x84x8400 tensor is attribute-major, so attribute j of
        // proposal i sits at data_ptr[i + rows * j]
        for (int i = 0; i < rows; ++i) {
            auto x = data_ptr[i + rows * 0];
            auto y = data_ptr[i + rows * 1];
            auto w = data_ptr[i + rows * 2];
            auto h = data_ptr[i + rows * 3];
            float confidenceValues[80] = {};
            for (int j = 4; j < columns; ++j) {
                confidenceValues[j - 4] = data_ptr[i + rows * j];
            }
            BestResult result = getBestFromConfidenceValue(confidenceValues, 80);
            if (result.bestScore < confThreshold)
                continue; // drop weak proposals before NMS
            classIds.push_back(result.bestId);
            confidences.push_back(result.bestScore);
            boxes.push_back(cv::Rect(int(x - w / 2), int(y - h / 2), int(w), int(h)));
            std::cout << "x=" << x << ", y=" << y << ", w=" << w << ", h=" << h << std::endl;
        }
    }
    // suppress overlapping boxes; `indices` keeps the survivors
    std::vector<int> indices;
    cv::dnn::NMSBoxes(boxes, confidences, confThreshold, nmsThreshold, indices);
}