I know how to use Dnn.readNetFromDarknet() to detect objects and find their bounding boxes. Now I want to do the same thing with Dnn.readNetFromModelOptimizer(), so I downloaded yolo-v3-tiny-tf from the Intel Open Model Zoo and converted it to OpenVINO IR files.
I read the yolo-v3-tiny-tf documentation to understand the layout of the output elements. According to its description, the converted model has the following outputs:
Converted model
- The array of detection summary info, name - conv2d_9/BiasAdd/YoloRegion, shape - 1,255,13,13. The anchor values are 81,82, 135,169, 344,319.
- The array of detection summary info, name - conv2d_12/BiasAdd/YoloRegion, shape - 1,255,26,26. The anchor values are 23,27, 37,58, 81,82.
For each case the format is B,N*85,Cx,Cy, where
- B - batch size
- N - number of detection boxes for cell
- Cx, Cy - cell index
Detection box has format [x, y, h, w, box_score, class_no_1, …, class_no_80], where:
- (x, y) - coordinates of box center relative to the cell
- h, w - raw height and width of box, apply exponential function and multiply by corresponding anchors to get absolute height and width values
- box_score - confidence of detection box in [0,1] range
- class_no_1, …, class_no_80 - probability distribution over the classes in the [0,1] range, multiply by confidence value to get confidence of each class
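To make the description above concrete, here is a minimal sketch of decoding a single 85-value record. It is not the official decoder: the class name YoloBoxDecoder and the method decode are hypothetical, the attribute order (h before w) is taken literally from the quoted description, and each anchor pair is assumed to be (width, height) in pixels.

```java
import java.util.Arrays;

/** Sketch: decodes one detection record [x, y, h, w, box_score, class_no_1..class_no_80]. */
class YoloBoxDecoder {
    static final int NUM_CLASSES = 80;

    /**
     * rec holds 85 floats in the order given by the description.
     * anchorW / anchorH are one anchor pair, assumed to be (width, height).
     * Returns {width, height, bestClassId, bestClassConfidence}.
     */
    static float[] decode(float[] rec, float anchorW, float anchorH) {
        if (rec.length != 5 + NUM_CLASSES) {
            throw new IllegalArgumentException("expected 85 values, got " + rec.length);
        }
        // Raw size -> absolute size: exponential, then multiply by the anchor.
        float height = (float) Math.exp(rec[2]) * anchorH;
        float width = (float) Math.exp(rec[3]) * anchorW;
        float boxScore = rec[4];

        // Pick the class with the highest probability.
        int best = 0;
        for (int c = 1; c < NUM_CLASSES; c++) {
            if (rec[5 + c] > rec[5 + best]) {
                best = c;
            }
        }
        // Class confidence = class probability * box confidence.
        float confidence = rec[5 + best] * boxScore;
        return new float[] {width, height, best, confidence};
    }

    public static void main(String[] args) {
        float[] rec = new float[5 + NUM_CLASSES];
        rec[4] = 0.5f;      // box_score
        rec[5 + 2] = 0.8f;  // probability of class index 2
        // Raw h and w are 0, so exp(0) = 1 and the box size equals the anchor.
        System.out.println(Arrays.toString(decode(rec, 81f, 82f)));
    }
}
```

The center coordinates are left out on purpose: the description only says they are relative to the cell, so converting them to image coordinates also needs the cell index and grid size.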
So my code is below:
Functions:
public static String getShape(Mat mat) {
StringBuilder sb = new StringBuilder("[");
for(int x = 0; x < mat.dims(); x++) {
sb.append(mat.size(x)).append(",");
}
sb.deleteCharAt(sb.length()-1);
sb.append("]");
return sb.toString();
}
Main portion:
Net net = Dnn.readNetFromModelOptimizer(irXmlFile, irBinFile);
net.setPreferableBackend(Dnn.DNN_BACKEND_INFERENCE_ENGINE);
net.setPreferableTarget(Dnn.DNN_TARGET_CPU);
Mat image = Imgcodecs.imread(imageFile);
final Scalar scalar = new Scalar(0);
sz = new Size(416, 416);
final float scale = 1;
boolean swapRB = true;
Mat inputBlob = Dnn.blobFromImage(image, scale, sz, scalar, swapRB, false);
net.setInput(inputBlob);
outBlobNames = getOutputNames(net);
log.trace("outBlobNames:{}", outBlobNames);
List<Mat> result = new ArrayList<>();
net.forward(result, outBlobNames);
log.trace("result size:{}", result.size());
int recordSize = 80; // number of classes; a full record is 5 + 80 = 85 values
for(int x = result.size()-1; x >=0 ; x--) {
Mat level = result.get(x);
int recNum = level.size(1) / (recordSize+5);
int targetRows = (int)(level.total()/ (recordSize+5));
log.trace("{}. layer:{}, level.rows():{}, total:{}, shape:{}, size:{}", x, outBlobNames.get(x), level.height(), level.total(), getShape(level), level.size());
log.trace(" channels:{}, depth:{}, type:{}, step1:{}, elmSize1:{}, recNum:{}", level.channels(), level.depth(), level.type(), level.step1(), level.elemSize1(), recNum);
Mat reshape = level.reshape(1,targetRows );
log.trace(" reshape:{}, size:{}", getShape(reshape), reshape.size());
for (int j = 0; j < reshape.rows(); ++j) {
Mat row = reshape.row(j); // size: (1*85)
float[] data = new float[recordSize+5];
row.get(0, 0, data);
float[] data2 = new float[recordSize];
float boxScore = data[4];
Mat scores = row.colRange(5, reshape.cols());
scores.get(0,0, data2);
Core.MinMaxLocResult mm = Core.minMaxLoc(scores);
Point classIdPoint = mm.maxLoc;
float confidence = (float) mm.maxVal* boxScore;
if (confidence < 0.6) continue;
float xx = (float) data[0], yy = (float) data[1];
log.trace(" {}: boxScore:{}, (x,y)={}x{}, classId:{}, confX:{}, row size:{}",
j,boxScore,xx, yy, classIdPoint.x, confidence, row.size());
}
if (x==1) break; // skip test zero
}
}
The output is below:
20:08:45.737 INFO OpenVinoTest: irXmlFile:E:\var\intel\yolo-v3-tiny-tf\FP32\yolo-v3-tiny-tf.xml
20:08:45.739 INFO OpenVinoTest: irBinFile:E:\var\intel\yolo-v3-tiny-tf\FP32\yolo-v3-tiny-tf.bin
20:08:45.739 INFO OpenVinoTest: imageFile:E:\yolo\dataset_800x480\images\car\vid01_010663.jpg
20:08:46.073 TRACE OpenVinoTest: outBlobNames:[conv2d_12/Conv2D/YoloRegion, conv2d_9/Conv2D/YoloRegion]
20:08:46.183 TRACE OpenVinoTest: result size:2
20:08:46.183 TRACE OpenVinoTest: 1. layer:conv2d_9/Conv2D/YoloRegion, level.rows():-1, total:43095, shape:[1,255,13,13], size:255x1
20:08:46.187 TRACE OpenVinoTest: channels:1, depth:5, type:5, step1:43095, elmSize1:4, recNum:3
20:08:46.187 TRACE OpenVinoTest: reshape:[507,85], size:85x507
20:08:46.188 TRACE OpenVinoTest: 0: boxScore:0.43266684, (x,y)=0.6555209x0.44876736, classId:21.0, confX:0.35278797, row size:85x1
20:08:46.189 TRACE OpenVinoTest: 1: boxScore:0.55940187, (x,y)=0.48518425x0.619738, classId:3.0, confX:0.5095297, row size:85x1
20:08:46.189 TRACE OpenVinoTest: 2: boxScore:0.35651144, (x,y)=0.5082219x0.39975503, classId:78.0, confX:0.31428596, row size:85x1
20:08:46.189 TRACE OpenVinoTest: 3: boxScore:0.54387677, (x,y)=0.692047x0.61677384, classId:54.0, confX:0.4847116, row size:85x1
20:08:46.189 TRACE OpenVinoTest: 4: boxScore:0.67963123, (x,y)=0.3839952x0.54561156, classId:0.0, confX:0.46246928, row size:85x1
20:08:46.189 TRACE OpenVinoTest: 5: boxScore:-0.48908097, (x,y)=0.08473439x0.14021659, classId:71.0, confX:-0.60112673, row size:85x1
20:08:46.189 TRACE OpenVinoTest: 6: boxScore:-1.2083349, (x,y)=-1.1671227x-1.2259784, classId:57.0, confX:-0.79102516, row size:85x1
20:08:46.189 TRACE OpenVinoTest: 7: boxScore:0.10409927, (x,y)=-0.37625268x0.017036445, classId:49.0, confX:0.058245104, row size:85x1
20:08:46.189 TRACE OpenVinoTest: 8: boxScore:0.00115722, (x,y)=0.0015306767x0.0012077022, classId:79.0, confX:1.409815E-4, row size:85x1
20:08:46.189 TRACE OpenVinoTest: 9: boxScore:0.99902374, (x,y)=0.31599534x0.16228594, classId:0.0, confX:0.9940376, row size:85x1
20:08:46.189 TRACE OpenVinoTest: 10: boxScore:0.03768371, (x,y)=0.08264625x0.0804672, classId:54.0, confX:0.018640134, row size:85x1
20:08:46.189 TRACE OpenVinoTest: 11: boxScore:4.1239278E-4, (x,y)=0.14673999x0.013528102, classId:20.0, confX:6.939788E-5, row size:85x1
20:08:46.189 TRACE OpenVinoTest: 12: boxScore:0.020088913, (x,y)=0.022435984x0.024656877, classId:51.0, confX:0.0014823506, row size:85x1
20:08:46.189 TRACE OpenVinoTest: 13: boxScore:0.0013647558, (x,y)=0.010140199x0.004707809, classId:37.0, confX:2.643366E-4, row size:85x1
20:08:46.190 TRACE OpenVinoTest: 14: boxScore:0.012278879, (x,y)=0.007483946x0.006147006, classId:72.0, confX:0.0076645594, row size:85x1
20:08:46.190 TRACE OpenVinoTest: 15: boxScore:0.8039476, (x,y)=0.5736518x0.750716, classId:10.0, confX:0.73497933, row size:85x1
20:08:46.190 TRACE OpenVinoTest: 16: boxScore:0.001434851, (x,y)=0.0050395555x0.0048673833, classId:76.0, confX:6.316115E-5, row size:85x1
20:08:46.190 TRACE OpenVinoTest: 17: boxScore:8.649106E-5, (x,y)=0.0012307029x6.8674155E-4, classId:22.0, confX:7.879857E-6, row size:85x1
20:08:46.190 TRACE OpenVinoTest: 18: boxScore:0.010655698, (x,y)=0.036913924x0.024807066, classId:66.0, confX:9.798995E-4, row size:85x1
20:08:46.190 TRACE OpenVinoTest: 19: boxScore:3.6505933E-4, (x,y)=1.6786439E-4x3.7726483E-4, classId:53.0, confX:3.9082723E-5, row size:85x1
20:08:46.190 TRACE OpenVinoTest: 20: boxScore:0.028334405, (x,y)=0.028142175x0.052258987, classId:66.0, confX:0.004819776, row size:85x1
20:08:46.190 TRACE OpenVinoTest: 21: boxScore:0.037701566, (x,y)=0.017145908x0.022296172, classId:13.0, confX:0.0104359565, row size:85x1
20:08:46.190 TRACE OpenVinoTest: 22: boxScore:0.0013395402, (x,y)=0.012514944x0.012410511, classId:64.0, confX:2.2230613E-4, row size:85x1
....
....
20:08:46.213 TRACE OpenVinoTest: 504: boxScore:3.4390436E-4, (x,y)=3.149585E-4x3.4604076E-4, classId:67.0, confX:6.082574E-7, row size:85x1
20:08:46.213 TRACE OpenVinoTest: 505: boxScore:0.0014405706, (x,y)=0.0012526602x0.0043147514, classId:60.0, confX:1.167454E-5, row size:85x1
20:08:46.213 TRACE OpenVinoTest: 506: boxScore:2.1452908E-4, (x,y)=2.8849006E-4x4.766963E-4, classId:64.0, confX:9.3569435E-7, row size:85x1
Below is the correct output from Dnn.readNetFromDarknet, using the original yolov3-tiny.weights and its configuration file:
20:22:50.352 INFO OpenVinoTest: TEST testYoloInDnnDarknet
20:22:50.352 INFO OpenVinoTest: cfgFile:E:\yolo\yolo-coco\yolov3-tiny.cfg
20:22:50.352 INFO OpenVinoTest: weightsFile:E:\yolo\yolo-coco\yolov3-tiny.weights
20:22:50.352 INFO OpenVinoTest: imageFile:E:\yolo\dataset_800x480\images\car\vid01_010663.jpg
20:22:50.579 TRACE OpenVinoTest: result count: 2
20:22:50.579 INFO OpenVinoTest: 0. layer:yolo_16, level.rows():507, total:43095, shape:[507,85], size:85x507
20:22:50.579 TRACE OpenVinoTest: 0: classId:0.0 row size:85x1, [0.05095583, 0.056363925, 0.16034615, 0.15128478, 1.27485455E-5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
20:22:50.579 TRACE OpenVinoTest: 1: classId:0.0 row size:85x1, [0.044864077, 0.03675152, 0.16261256, 0.42193738, 2.0352368E-6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
20:22:50.579 TRACE OpenVinoTest: 2: classId:0.0 row size:85x1, [0.033654656, 0.03507777, 0.8410822, 0.6181883, 8.072384E-9, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
I compared Dnn.readNetFromModelOptimizer and Dnn.readNetFromDarknet using the same weights file and configuration file.
As you can see, after my reshape the output of Dnn.readNetFromModelOptimizer has the same shape (85x507) as the output of Dnn.readNetFromDarknet, but the contents are different.
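One possible reason for matching shapes but different contents, offered as an assumption and not a confirmed diagnosis: the IR output is logged as [1,255,13,13], so if the 255 channels are laid out as anchor * 85 + attribute, the 85 values of one box are not contiguous in memory, and a plain reshape to [507,85] would mix values from different cells. The sketch below reorders such a buffer into one row per box. The class YoloLayout, the method toRecords and the parameter names are hypothetical, and blob stands for the network output copied into a float array.

```java
import java.util.Arrays;

/** Sketch: reorders a flat [1, anchors*attrs, h, w] buffer into rows of attrs values. */
class YoloLayout {
    /**
     * Assumes channel index = anchor * attrs + attribute, followed by grid row
     * and grid column. Returns anchors*h*w rows, one per (anchor, cell) pair.
     */
    static float[][] toRecords(float[] blob, int anchors, int attrs, int h, int w) {
        if (blob.length != anchors * attrs * h * w) {
            throw new IllegalArgumentException("unexpected buffer length " + blob.length);
        }
        float[][] rows = new float[anchors * h * w][attrs];
        for (int n = 0; n < anchors; n++) {
            for (int cy = 0; cy < h; cy++) {
                for (int cx = 0; cx < w; cx++) {
                    int row = (n * h + cy) * w + cx;
                    for (int a = 0; a < attrs; a++) {
                        // Attribute a of anchor n at cell (cy, cx) in the flat buffer.
                        rows[row][a] = blob[((n * attrs + a) * h + cy) * w + cx];
                    }
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Tiny stand-in for the real output: 2 anchors, 3 attributes, 2x2 grid.
        float[] blob = new float[2 * 3 * 2 * 2];
        for (int i = 0; i < blob.length; i++) {
            blob[i] = i;
        }
        float[][] rows = toRecords(blob, 2, 3, 2, 2);
        System.out.println(Arrays.toString(rows[0]));
    }
}
```

For the 13x13 output this would be called with anchors = 3, attrs = 85, h = 13 and w = 13, giving 507 rows. Whether the values then match the Darknet output depends on the layout assumption holding and on both paths applying the same post-processing.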
My problems are:
- According to the yolo-v3-tiny-tf page above, N is the number of detection boxes per cell. Computing it manually, N is 3 in this case (255 / 85). However, from the output of Mat level, I don't know which rows are the so-called detection boxes.
- I also feel that something is wrong with the output of Mat level, but I don't know what it is or how to correct it.
Could anyone please give some suggestions? Many thanks.