Use Dnn.readNetFromModelOptimizer to detect objects

wureka · February 4, 2021, 12:30pm

I know how to use Dnn.readNetFromDarknet() to detect objects and find their bounding boxes.
Now I want to do the same thing with Dnn.readNetFromModelOptimizer(), so
I downloaded yolo-v3-tiny-tf from the Intel Open Model Zoo and converted it to OpenVINO IR files.

I read the yolo-v3-tiny-tf documentation to understand the layout of the output elements. According to its description, the converted model's output is as follows:

Converted model

The array of detection summary info, name - conv2d_9/BiasAdd/YoloRegion, shape - 1,255,13,13. The anchor values are 81,82, 135,169, 344,319.
The array of detection summary info, name - conv2d_12/BiasAdd/YoloRegion, shape - 1,255,26,26. The anchor values are 23,27, 37,58, 81,82.

For each case, the format is B,N*85,Cx,Cy, where

B - batch size
N - number of detection boxes for cell
Cx, Cy - cell index

Detection box has format [x, y, h, w, box_score, class_no_1, …, class_no_80], where:

(x, y) - coordinates of box center relative to the cell
h, w - raw height and width of box, apply exponential function and multiply by corresponding anchors to get absolute height and width values
box_score - confidence of detection box in [0,1] range
class_no_1, …, class_no_80 - probability distribution over the classes in the [0,1] range, multiply by confidence value to get confidence of each class
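
To make the layout above concrete, here is a minimal sketch in plain Java (no OpenCV calls) of how one record could be read from a flat copy of a 1,255,13,13 output. It assumes the blob is stored in the usual NCHW order, so the 85 values of one record sit 13*13 floats apart instead of next to each other, and that the last axis is the cell column. The class YoloRegionRecord and its methods are hypothetical helpers, not part of OpenCV or OpenVINO.

```java
import java.util.Arrays;

/** Sketch: read one detection record from a flat [1, N*85, H, W] YoloRegion blob. */
final class YoloRegionRecord {
    static final int CLASSES = 80;
    static final int RECORD = 5 + CLASSES; // x, y, h, w, box_score, then 80 class scores

    /** Flat index of attribute attr for anchor n at cell (cy, cx). Assumes NCHW order. */
    static int index(int n, int attr, int cy, int cx, int gridH, int gridW) {
        return ((n * RECORD + attr) * gridH + cy) * gridW + cx;
    }

    /** Gathers the 85 values of one record; they are gridH*gridW floats apart in memory. */
    static float[] record(float[] blob, int n, int cy, int cx, int gridH, int gridW) {
        float[] rec = new float[RECORD];
        for (int a = 0; a < RECORD; a++) {
            rec[a] = blob[index(n, a, cy, cx, gridH, gridW)];
        }
        return rec;
    }

    /** Returns {classId, confidence}, where confidence = box_score * best class probability. */
    static float[] bestClass(float[] rec) {
        int best = 0;
        for (int c = 1; c < CLASSES; c++) {
            if (rec[5 + c] > rec[5 + best]) best = c;
        }
        return new float[] { best, rec[4] * rec[5 + best] };
    }

    /** Absolute box size: exp(raw) * anchor, following the model description. */
    static float absoluteSize(float raw, float anchor) {
        return (float) Math.exp(raw) * anchor;
    }

    public static void main(String[] args) {
        int grid = 13;
        float[] blob = new float[3 * RECORD * grid * grid]; // stand-in for the real output
        // Plant one synthetic detection: anchor 1, cell (cy=4, cx=7), class 2.
        blob[index(1, 4, 4, 7, grid, grid)] = 0.9f;     // box_score
        blob[index(1, 5 + 2, 4, 7, grid, grid)] = 0.8f; // class 2 probability
        float[] rec = record(blob, 1, 4, 7, grid, grid);
        System.out.println(Arrays.toString(bestClass(rec)));
    }
}
```

With N = 3 and a 13x13 grid this addresses 3 * 13 * 13 = 507 records, one per (anchor, cell) pair, and a confidence threshold would then be applied to the value returned by bestClass.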

So my code is below:
Functions:

public static String getShape(Mat mat) {
     StringBuilder sb = new StringBuilder("[");
     for(int x = 0; x < mat.dims(); x++) {
         sb.append(mat.size(x)).append(",");
     }
     sb.deleteCharAt(sb.length()-1);
     sb.append("]");
     return sb.toString();
}

Main portion:

    Net net = Dnn.readNetFromModelOptimizer(irXmlFile, irBinFile);
    net.setPreferableBackend(Dnn.DNN_BACKEND_INFERENCE_ENGINE);
    net.setPreferableTarget(Dnn.DNN_TARGET_CPU);
    Mat image = Imgcodecs.imread(imageFile);
    final Scalar scalar =  new Scalar(0);
    Size sz = new Size(416, 416);
    final float scale =  1;
    boolean swapRB = true;
    Mat inputBlob = Dnn.blobFromImage(image, scale, sz, scalar, swapRB, false);
    net.setInput(inputBlob);
    List<String> outBlobNames = getOutputNames(net);
    log.trace("outBlobNames:{}", outBlobNames);
    List<Mat> result = new ArrayList<>();
    net.forward(result, outBlobNames);
    log.trace("result size:{}", result.size());
    int recordSize = 80; // number of classes; each record holds 5 + recordSize values
    for(int x = result.size()-1; x >=0 ; x--) {
        Mat level = result.get(x);

        int recNum = level.size(1) / (recordSize+5);
        int targetRows = (int)(level.total()/ (recordSize+5));
        log.trace("{}. layer:{}, level.rows():{}, total:{}, shape:{}, size:{}", x, outBlobNames.get(x), level.height(), level.total(), getShape(level), level.size());
        log.trace("    channels:{}, depth:{}, type:{}, step1:{}, elmSize1:{}, recNum:{}", level.channels(), level.depth(), level.type(), level.step1(), level.elemSize1(), recNum);
        Mat reshape = level.reshape(1,targetRows );
        log.trace("    reshape:{}, size:{}", getShape(reshape), reshape.size());
        for (int j = 0; j < reshape.rows(); ++j) {
            Mat row = reshape.row(j); // size: (1*85)
            float[] data = new float[recordSize+5];
            row.get(0, 0, data);
            float[] data2 = new float[recordSize];
            float boxScore = data[4];
            Mat scores = row.colRange(5, reshape.cols()); 
            scores.get(0,0, data2);
            Core.MinMaxLocResult mm = Core.minMaxLoc(scores);
            Point classIdPoint = mm.maxLoc;
            float  confidence = (float) mm.maxVal* boxScore;
            if (confidence < 0.6) continue;
            float xx = (float) data[0], yy = (float) data[1];
            log.trace("      {}: boxScore:{}, (x,y)={}x{}, classId:{}, confX:{}, row size:{}",
                    j,boxScore,xx, yy, classIdPoint.x, confidence, row.size());
        }
        if (x==1) break; // skip test zero
    }
}

The output is below:

20:08:45.737 INFO    OpenVinoTest: irXmlFile:E:\var\intel\yolo-v3-tiny-tf\FP32\yolo-v3-tiny-tf.xml
20:08:45.739 INFO    OpenVinoTest: irBinFile:E:\var\intel\yolo-v3-tiny-tf\FP32\yolo-v3-tiny-tf.bin
20:08:45.739 INFO    OpenVinoTest: imageFile:E:\yolo\dataset_800x480\images\car\vid01_010663.jpg
20:08:46.073 TRACE   OpenVinoTest: outBlobNames:[conv2d_12/Conv2D/YoloRegion, conv2d_9/Conv2D/YoloRegion]
20:08:46.183 TRACE   OpenVinoTest: result size:2
20:08:46.183 TRACE   OpenVinoTest: 1. layer:conv2d_9/Conv2D/YoloRegion, level.rows():-1, total:43095, shape:[1,255,13,13], size:255x1
20:08:46.187 TRACE   OpenVinoTest:     channels:1, depth:5, type:5, step1:43095, elmSize1:4, recNum:3
20:08:46.187 TRACE   OpenVinoTest:     reshape:[507,85], size:85x507
20:08:46.188 TRACE   OpenVinoTest:       0: boxScore:0.43266684, (x,y)=0.6555209x0.44876736, classId:21.0, confX:0.35278797, row size:85x1
20:08:46.189 TRACE   OpenVinoTest:       1: boxScore:0.55940187, (x,y)=0.48518425x0.619738, classId:3.0, confX:0.5095297, row size:85x1
20:08:46.189 TRACE   OpenVinoTest:       2: boxScore:0.35651144, (x,y)=0.5082219x0.39975503, classId:78.0, confX:0.31428596, row size:85x1
20:08:46.189 TRACE   OpenVinoTest:       3: boxScore:0.54387677, (x,y)=0.692047x0.61677384, classId:54.0, confX:0.4847116, row size:85x1
20:08:46.189 TRACE   OpenVinoTest:       4: boxScore:0.67963123, (x,y)=0.3839952x0.54561156, classId:0.0, confX:0.46246928, row size:85x1
20:08:46.189 TRACE   OpenVinoTest:       5: boxScore:-0.48908097, (x,y)=0.08473439x0.14021659, classId:71.0, confX:-0.60112673, row size:85x1
20:08:46.189 TRACE   OpenVinoTest:       6: boxScore:-1.2083349, (x,y)=-1.1671227x-1.2259784, classId:57.0, confX:-0.79102516, row size:85x1
20:08:46.189 TRACE   OpenVinoTest:       7: boxScore:0.10409927, (x,y)=-0.37625268x0.017036445, classId:49.0, confX:0.058245104, row size:85x1
20:08:46.189 TRACE   OpenVinoTest:       8: boxScore:0.00115722, (x,y)=0.0015306767x0.0012077022, classId:79.0, confX:1.409815E-4, row size:85x1
20:08:46.189 TRACE   OpenVinoTest:       9: boxScore:0.99902374, (x,y)=0.31599534x0.16228594, classId:0.0, confX:0.9940376, row size:85x1
20:08:46.189 TRACE   OpenVinoTest:       10: boxScore:0.03768371, (x,y)=0.08264625x0.0804672, classId:54.0, confX:0.018640134, row size:85x1
20:08:46.189 TRACE   OpenVinoTest:       11: boxScore:4.1239278E-4, (x,y)=0.14673999x0.013528102, classId:20.0, confX:6.939788E-5, row size:85x1
20:08:46.189 TRACE   OpenVinoTest:       12: boxScore:0.020088913, (x,y)=0.022435984x0.024656877, classId:51.0, confX:0.0014823506, row size:85x1
20:08:46.189 TRACE   OpenVinoTest:       13: boxScore:0.0013647558, (x,y)=0.010140199x0.004707809, classId:37.0, confX:2.643366E-4, row size:85x1
20:08:46.190 TRACE   OpenVinoTest:       14: boxScore:0.012278879, (x,y)=0.007483946x0.006147006, classId:72.0, confX:0.0076645594, row size:85x1
20:08:46.190 TRACE   OpenVinoTest:       15: boxScore:0.8039476, (x,y)=0.5736518x0.750716, classId:10.0, confX:0.73497933, row size:85x1
20:08:46.190 TRACE   OpenVinoTest:       16: boxScore:0.001434851, (x,y)=0.0050395555x0.0048673833, classId:76.0, confX:6.316115E-5, row size:85x1
20:08:46.190 TRACE   OpenVinoTest:       17: boxScore:8.649106E-5, (x,y)=0.0012307029x6.8674155E-4, classId:22.0, confX:7.879857E-6, row size:85x1
20:08:46.190 TRACE   OpenVinoTest:       18: boxScore:0.010655698, (x,y)=0.036913924x0.024807066, classId:66.0, confX:9.798995E-4, row size:85x1
20:08:46.190 TRACE   OpenVinoTest:       19: boxScore:3.6505933E-4, (x,y)=1.6786439E-4x3.7726483E-4, classId:53.0, confX:3.9082723E-5, row size:85x1
20:08:46.190 TRACE   OpenVinoTest:       20: boxScore:0.028334405, (x,y)=0.028142175x0.052258987, classId:66.0, confX:0.004819776, row size:85x1
20:08:46.190 TRACE   OpenVinoTest:       21: boxScore:0.037701566, (x,y)=0.017145908x0.022296172, classId:13.0, confX:0.0104359565, row size:85x1
20:08:46.190 TRACE   OpenVinoTest:       22: boxScore:0.0013395402, (x,y)=0.012514944x0.012410511, classId:64.0, confX:2.2230613E-4, row size:85x1
....
....
20:08:46.213 TRACE   OpenVinoTest:       504: boxScore:3.4390436E-4, (x,y)=3.149585E-4x3.4604076E-4, classId:67.0, confX:6.082574E-7, row size:85x1
20:08:46.213 TRACE   OpenVinoTest:       505: boxScore:0.0014405706, (x,y)=0.0012526602x0.0043147514, classId:60.0, confX:1.167454E-5, row size:85x1
20:08:46.213 TRACE   OpenVinoTest:       506: boxScore:2.1452908E-4, (x,y)=2.8849006E-4x4.766963E-4, classId:64.0, confX:9.3569435E-7, row size:85x1

Below is the correct output from Dnn.readNetFromDarknet with the same yolov3-tiny.weights and the relevant configuration file:

20:22:50.352 INFO     OpenVinoTest: TEST testYoloInDnnDarknet
20:22:50.352 INFO     OpenVinoTest:     cfgFile:E:\yolo\yolo-coco\yolov3-tiny.cfg
20:22:50.352 INFO     OpenVinoTest: weightsFile:E:\yolo\yolo-coco\yolov3-tiny.weights
20:22:50.352 INFO     OpenVinoTest:   imageFile:E:\yolo\dataset_800x480\images\car\vid01_010663.jpg
20:22:50.579 TRACE    OpenVinoTest: result count: 2
20:22:50.579 INFO     OpenVinoTest:     0. layer:yolo_16, level.rows():507, total:43095, shape:[507,85], size:85x507
20:22:50.579 TRACE    OpenVinoTest:       0:  classId:0.0  row size:85x1, [0.05095583, 0.056363925, 0.16034615, 0.15128478, 1.27485455E-5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
20:22:50.579 TRACE    OpenVinoTest:       1:  classId:0.0  row size:85x1, [0.044864077, 0.03675152, 0.16261256, 0.42193738, 2.0352368E-6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
20:22:50.579 TRACE    OpenVinoTest:       2:  classId:0.0  row size:85x1, [0.033654656, 0.03507777, 0.8410822, 0.6181883, 8.072384E-9, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

I compared Dnn.readNetFromModelOptimizer and Dnn.readNetFromDarknet using the same weights file and configuration file.

As you can see, the outputs of Dnn.readNetFromModelOptimizer and Dnn.readNetFromDarknet have the same shape (85x507), but the contents are different.
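
One possible reason for the difference, offered as a sketch and not a confirmed diagnosis: if the 1,255,13,13 blob is channel-major (NCHW), then reshaping it directly to 507 rows puts 85 neighbouring cells of a single attribute into each row, instead of the 85 attributes of a single box. The plain-Java helper below regroups a flat copy of the blob into one row per box. YoloRegionRows is a hypothetical name, and the row order (cell first, then anchor) is an assumption about the Darknet-style output that is not verified here.

```java
/** Sketch: regroup a channel-major [1, N*85, H, W] blob into one 85-value row per box. */
final class YoloRegionRows {
    static float[][] toRows(float[] blob, int anchors, int record, int gridH, int gridW) {
        int cells = gridH * gridW;
        if (blob.length != anchors * record * cells) {
            throw new IllegalArgumentException("unexpected blob length " + blob.length);
        }
        float[][] rows = new float[cells * anchors][record];
        for (int n = 0; n < anchors; n++) {
            for (int a = 0; a < record; a++) {
                // Each channel (anchor n, attribute a) is one contiguous H*W plane.
                int channelStart = (n * record + a) * cells;
                for (int cell = 0; cell < cells; cell++) {
                    // Assumed row order: cell first, then anchor.
                    rows[cell * anchors + n][a] = blob[channelStart + cell];
                }
            }
        }
        return rows;
    }
}
```

The flat array could be obtained with something like level.reshape(1, 1).get(0, 0, buf) on a float[] buf of length level.total(). Even after regrouping, the values may still differ from the Darknet rows if that path has already converted the box coordinates.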

My questions are:

According to the yolo-v3-tiny-tf page above, N is the number of detection boxes per cell. Computing it by hand (255 / 85), N is 3 in this case. However, from the output of Mat level, I can't tell which rows are the detection boxes.
I also suspect that something is wrong with how I read Mat level, but I don't know what it is or how to fix it.
Could anyone please give some suggestions? Many thanks.
