Unable to interpret model output

I’m trying to use the MediaPipe palm detection model in a C++ project. I converted the TFLite model to a .pb one, and I can load the model and get the output successfully.
The problem is that I am unable to interpret the output to draw the detection rectangles.

My code:

    dnn::Net net = dnn::readNetFromTensorflow("palm_detection_builtin.pb");
    VideoCapture cam(0);
    cam.set(CAP_PROP_FPS, 30);

    Mat frame;
    while (cam.read(frame))
    {
        cvtColor(frame, frame, COLOR_BGR2RGB);

        Mat blob = dnn::blobFromImage(frame, 1.0 / 255, cv::Size(256, 256), cv::Scalar(0, 0, 0), false, false);
        net.setInput(blob);
        Mat output = net.forward("regressors"); // shape is (1, 2944, 18)
        Mat classification_out = net.forward("classificators"); // shape is (1, 2944, 1)
    }

I found this:

which would mean each row in the regressors output is:

[4 box coords] [p0.x p0.y][p1.x p1.y][p2.x p2.y][p3.x p3.y][p4.x p4.y][p5.x p5.y][p6.x p6.y]

I think the classificators output holds probabilities for those box / keypoint proposals.

Some pseudo-code to parse it:

Mat reg = output.reshape(1, output.size[1]); // 2d Mat, 2944 rows, 18 cols
Mat prob = classification_out.reshape(1, classification_out.size[1]); // 2d Mat, 2944 rows, 1 col
float dw = float(image.cols) / 256;
float dh = float(image.rows) / 256;

vector<Rect> boxes;
for (int i = 0; i < reg.rows; i++) {
    if (prob.at<float>(i, 0) < some_threshold)
        continue;
    Mat_<float> row = reg.row(i);
    // scale to orig. image coords:
    Rect b(row(0, 0) * dw, row(0, 1) * dh, row(0, 2) * dw, row(0, 3) * dh);
    boxes.push_back(b);
}
// then apply dnn::NMSBoxes, to filter out dupes
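For reference, here is a minimal sketch of what `dnn::NMSBoxes` does under the hood (greedy IoU-based suppression), written without OpenCV so the logic is easy to inspect; in the real code you would just call `dnn::NMSBoxes(boxes, confidences, scoreThresh, nmsThresh, indices)`:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Box { float x, y, w, h; };  // top-left + size, like cv::Rect

// intersection-over-union of two boxes
static float iou(const Box& a, const Box& b) {
    float x1 = std::max(a.x, b.x), y1 = std::max(a.y, b.y);
    float x2 = std::min(a.x + a.w, b.x + b.w);
    float y2 = std::min(a.y + a.h, b.y + b.h);
    float inter = std::max(0.f, x2 - x1) * std::max(0.f, y2 - y1);
    float uni = a.w * a.h + b.w * b.h - inter;
    return uni > 0 ? inter / uni : 0.f;
}

// returns indices of kept boxes, highest score first:
// keep a box only if it doesn't overlap an already-kept, higher-scoring one
std::vector<int> nms(const std::vector<Box>& boxes,
                     const std::vector<float>& scores, float iouThresh) {
    std::vector<int> order(boxes.size());
    for (size_t i = 0; i < order.size(); i++) order[i] = (int)i;
    std::sort(order.begin(), order.end(),
              [&](int a, int b) { return scores[a] > scores[b]; });
    std::vector<int> keep;
    for (int i : order) {
        bool ok = true;
        for (int k : keep)
            if (iou(boxes[i], boxes[k]) > iouThresh) { ok = false; break; }
        if (ok) keep.push_back(i);
    }
    return keep;
}
```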

However, we still don’t know exactly what the boxes look like!
(Is it [x, y, w, h]? [x1, y1, x2, y2]? [cx, cy, w, h]?)

Maybe you can print out the first few boxes, so we can take a look at them here?
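For comparison, the three candidate layouts would convert to a top-left [x, y, w, h] rect like this (a plain-C++ sketch so it is easy to check by hand):

```cpp
#include <cassert>

struct RectF { float x, y, w, h; };  // top-left + size, like cv::Rect

// hypothesis 1: already [x, y, w, h] — use as-is
RectF fromXYWH(float x, float y, float w, float h) { return {x, y, w, h}; }

// hypothesis 2: [x1, y1, x2, y2] — two opposite corners
RectF fromCorners(float x1, float y1, float x2, float y2) {
    return {x1, y1, x2 - x1, y2 - y1};
}

// hypothesis 3: [cx, cy, w, h] — center + size
RectF fromCenter(float cx, float cy, float w, float h) {
    return {cx - w / 2, cy - h / 2, w, h};
}
```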

The result of printing out the boxes is:

[2874 x 1617 from (269, 15)]
[2927 x 1646 from (278, 15)]
[2940 x 1654 from (270, 13)]
[2929 x 1648 from (275, 14)]
[2930 x 1648 from (270, 4)]
[2925 x 1645 from (260, 4)]
[2897 x 1629 from (221, -1)]
[2880 x 1620 from (238, 0)]
[2906 x 1635 from (258, 9)]
[2917 x 1641 from (261, 14)]
[2922 x 1643 from (270, 12)]
[2799 x 1574 from (261, -4)]
[2629 x 1478 from (92, 9)]
[2574 x 1448 from (-604, 1)]
[2686 x 1510 from (230, 11)]
[2782 x 1565 from (94, 36)]
[2565 x 1443 from (271, 2)]
[2612 x 1469 from (275, 24)]
[2928 x 1647 from (267, 10)]
[2893 x 1627 from (262, 15)]
[2878 x 1619 from (272, 20)]
[2929 x 1648 from (282, 22)]
[2944 x 1656 from (280, 20)]
[2949 x 1658 from (276, 17)]
[2905 x 1634 from (269, 3)]
[2908 x 1635 from (257, -4)]
[2898 x 1630 from (244, 0)]
[2850 x 1603 from (241, 5)]
[2864 x 1611 from (268, 10)]
[2926 x 1646 from (269, 16)]
[2926 x 1646 from (267, 12)]
[2802 x 1576 from (256, -4)]
[2628 x 1478 from (95, 9)]

Code:

#include <iostream>
#include <opencv2/opencv.hpp>
#include <opencv2/dnn.hpp>

using namespace std;
using namespace cv;

int main()
{
    dnn::Net net = dnn::readNetFromTensorflow("palm_detection_builtin.pb");
    VideoCapture cam(0);

    Mat frame;
    while (cam.read(frame))
    {
        cvtColor(frame, frame, COLOR_BGR2RGB);

        Mat blob = dnn::blobFromImage(frame,1.0/255, cv::Size(256, 256), cv::Scalar(0, 0, 0),false,false);
        net.setInput(blob);

        cv::Mat outputs, classificators_outs;
        net.forward(outputs,"regressors");
        net.forward(classificators_outs,"classificators");
        Mat reg = outputs.reshape(1, outputs.size[1]); //2d Mat, 2944 rows, 18 cols
        Mat prob = classificators_outs.reshape(1, classificators_outs.size[1]); //2d Mat, 2944 rows, 1 col
        float dw = float(frame.cols) / 256;
        float dh = float(frame.rows) / 256;

        vector<Rect> boxes;
        std::vector<float> confidences;

        for (int i=0; i<reg.rows; i++) {
             if (prob.at<float>(i,0) < 0.5)
                  continue;
            Mat_<float> row = reg.row(i);
            // scale to orig. image coords:
            Rect b(row(0,0) * dw, row(0,1)*dh, row(0,2)*dw, row(0,3)*dh);
            boxes.push_back(b);
            cout << b << endl;
        }

    }

    return 0;
}

Looks far too huge to me…
How large is the image? The scaling factor might be off.
Or maybe look at the unscaled row data?

I added a resize at the beginning of the code:

    resize(frame,frame,Size(1280,960));

and the results are the same. I noticed that if I let the code run further, the rectangles get bigger and bigger:

[13129 x 9847 from (-552, -668)]
[12608 x 9456 from (214, -600)]
[12237 x 9177 from (-348, -521)]
[11970 x 8977 from (-71, -516)]

Why so? The network input gets resized to (256, 256) anyway in the blob
(please also note that it’s (192, 192) in the pbtxt file).

How does the raw row data look?
(Again, I don’t have that model, and can only guess here…)

You can find the model in this link.
Visualizing the model in Netron gives this result:
input:

output:

I didn’t understand. What is the raw row data?
Thanks for your help.

Mat_<float> row = reg.row(i);
cout << row << endl;

outputs:

[3.6905072, 5.4042335, 32.863857, 32.863861, 6.3343191, 5.0829744, 7.8449326, 4.7407198, -0.25462124, 1.8523651, -6.654418, 0.3679803, -11.397835, 0.078130387, 13.685674, 8.8667402, 15.410878, 11.636487]
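Those magnitudes (center offsets of a few pixels, a width/height around 33) match MediaPipe’s own post-processing, where each regressor row is decoded relative to a precomputed SSD anchor and divided by the network input size (256 here), rather than being read as absolute image coordinates. The sketch below follows MediaPipe’s TfLiteTensorsToDetectionsCalculator; the anchor layout (strides {8, 16, 32, 32, 32}, 2 anchors per cell, which gives exactly 2944) is an assumption taken from the palm-detection config and should be verified against your model. Note also that in that pipeline the classificators values are raw logits, passed through a sigmoid before thresholding.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Anchor { float xCenter, yCenter; };  // normalized [0,1]; w = h = 1

struct BoxN { float cx, cy, w, h; };        // normalized center-size box

// assumed anchor grid: strides {8, 16, 32, 32, 32}, 2 anchors per cell,
// anchors centered on each cell -> 2048 + 512 + 3*128 = 2944 anchors
std::vector<Anchor> generateAnchors(int inputSize = 256) {
    std::vector<Anchor> anchors;
    const int strides[] = {8, 16, 32, 32, 32};
    for (int s : strides) {
        int cells = inputSize / s;
        for (int y = 0; y < cells; y++)
            for (int x = 0; x < cells; x++)
                for (int k = 0; k < 2; k++)  // 2 anchors per cell
                    anchors.push_back({(x + 0.5f) / cells, (y + 0.5f) / cells});
    }
    return anchors;
}

// decode the first 4 floats of an 18-float regressor row against its anchor:
// offsets are in input-pixel units, so divide by inputSize (anchor w = h = 1)
BoxN decodeBox(const float* row, const Anchor& a, float inputSize) {
    BoxN b;
    b.cx = row[0] / inputSize + a.xCenter;
    b.cy = row[1] / inputSize + a.yCenter;
    b.w  = row[2] / inputSize;
    b.h  = row[3] / inputSize;
    return b;  // multiply by the original frame size to get image coords
}
```

The remaining 14 floats (the 7 keypoints) would decode the same way as the center: `row[4 + 2k] / inputSize + a.xCenter`, and so on.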
