How to interpret the result from YOLOv3 with blobFromImages as input?

How should the result for each image be used when processing a batch of images with blobFromImages as the input to the YOLO model?

How should the post-processing for each image be approached to get the bounding-box values?


Kindly look at the samples for this.

in general:

  • collect “box proposals” from several output layers (each representing a different scale)
    each row in there is:
    cx, cy, w, h, box_probability, p1, p2, … pn (where p1 … pn are the class probabilities)
  • filter out bad proposals by thresholding the box / class probabilities
  • filter out near-duplicates by applying NMS
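In plain Python, those three steps could be sketched like this (no OpenCV needed; the row layout follows the description above, and the threshold values are illustrative assumptions, not fixed by the model):

```python
def decode_row(row, img_w, img_h):
    """Convert one detection row (cx, cy, w, h relative to the image size)
    into an (x, y, w, h, score, class_id) tuple."""
    cx, cy, w, h, box_prob = row[:5]
    class_probs = row[5:]
    class_id = max(range(len(class_probs)), key=lambda i: class_probs[i])
    score = box_prob * class_probs[class_id]
    x = int((cx - w / 2) * img_w)
    y = int((cy - h / 2) * img_h)
    return (x, y, int(w * img_w), int(h * img_h), score, class_id)

def iou(a, b):
    """Intersection-over-union of two (x, y, w, h, ...) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def postprocess(rows, img_w, img_h, conf_thresh=0.5, nms_thresh=0.4):
    """Threshold, decode and NMS-filter the proposals of one image."""
    # keep only rows whose box probability (index 4) passes the threshold
    boxes = [decode_row(r, img_w, img_h) for r in rows if r[4] >= conf_thresh]
    boxes.sort(key=lambda b: b[4], reverse=True)   # best score first
    kept = []
    for b in boxes:   # greedy NMS: drop boxes overlapping an already-kept one
        if all(iou(b, k) < nms_thresh for k in kept):
            kept.append(b)
    return kept
```

This is the per-image logic; for batches it simply runs once per image on that image's rows.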

I have tried the following:

Implemented in C#:
I have chosen a batch of 8 images.

Mat blob = CvDnn.BlobFromImages(inputImages, 1 / 255.0, swapRB: true);

# input shape = (8, 3, 672, 672) as (Images, Channels, Height, Width)

Step 1:
var outNames = model.GetUnconnectedOutLayersNames();
var outs = outNames.Select(_ => new Mat()).ToArray();
model.Forward(outs, outNames);
(outs will hold the predictions from the 3 YOLO detection layers “yolo_82”, “yolo_94” and “yolo_106”, and these should be used to get the result)

outs contains 3 matrices, each printing as {Mat [ -1*-1*CV_32FC1 ]}

Step 2:
var probs = model.Forward();
probs is a single matrix, printing as {Mat [ -1*-1*CV_32FC1 ]}

I am not sure how to interpret these matrices and get the predictions for each image.

Any comment would be greatly appreciated.
Thanks in advance.

quite unfortunate, as there is no official C# wrapper, so now you have an API problem specific to that, and no one here knows it ;(
at least kindly tell us what exactly you’re using here?

how did you retrieve that ? do the same for the outputs

that’s bogus. it’s probably printing out Mat::size(), which is 2d only and cannot handle 4d tensors (so rows & cols are set to -1 on purpose).
the c++ Mat has a size member, which would hold the proper dims in this case

correct. so 3 output Mats per forward pass

why this? it seems wrong / unnecessary. Step 1 should be all you need.
(this way, you only get the very last layer, “yolo_106”)

if you want my 2ct:
try to get it working for a single image first
(seems you’re not even there yet), then try with batches

I have used OpenCvSharp for .NET and the C# programming language. OpenCV provides a function to load the darknet weights directly using the config and weights files.

  1. I have successfully implemented the detections for a single image; the code above is my attempt at batches.

  2. Yes, I would also use step 1 from above to set the inputs and do the forward pass.
    The output matrices have a size, giving me the width and height:
    1st matrix: size = (width: 1323, height: 8) and (rows, cols) = (-1, -1)
    2nd matrix: size = (width: 5292, height: 8) and (rows, cols) = (-1, -1)
    3rd matrix: size = (width: 21168, height: 8) and (rows, cols) = (-1, -1)

The value of height changes with the number of images used; this is what I have seen while debugging.

I am struggling to unpack the output matrices and get the results for each image.

hmm, the numbers look familiar; however, i’d expect it more like [8, 1323, 85],
as in: 8 images, 1323 rows, 85 cols
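assuming the buffer really is contiguous and row-major in that [batch, rows, cols] order, pulling the rows of one image out of the flat float data is plain index arithmetic. a Python sketch (the function name is made up for illustration):

```python
def rows_of_image(flat, batch, rows, cols, img_idx):
    """Slice the rows x cols block of one batch image out of a flat,
    row-major [batch, rows, cols] float buffer."""
    assert len(flat) == batch * rows * cols
    start = img_idx * rows * cols                 # offset of this image's block
    block = flat[start:start + rows * cols]
    # split the block into `rows` detection rows of `cols` values each
    return [block[r * cols:(r + 1) * cols] for r in range(rows)]
```

each returned row then has the [cx, cy, w, h, box_prob, p1 … pn] layout described earlier in the thread.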

oh okay. I don’t exactly know how the result is provided as output from the model for batches.

It would be very helpful if you could guide me further on this. I have looked a lot and found no proper implementation.

I will continue to look into it. Thanks

sorry, but i can’t help any further.
again, it’s a problem with your C# API (which we have no knowledge about here)

raise an issue on that github repo

Thanks for your reply.

I believe the challenge is to interpret the output of the OpenCV dnn module for the YOLO model, and it has nothing to do with C# or anything else. I have an OpenCV and Python implementation too.

Thanks anyway.

so, again, i made a small test from c++, using YOLOv3 and a batch size of 8.

printing out the output size members gives:

yolo_82   8 x 1200 x 85
yolo_94   8 x 4800 x 85
yolo_106  8 x 19200 x 85

(i think this must be interpreted as [batch_size x rows x cols])

Thanks for the further help.

This is exactly what I wanted to know. I do not get these output sizes from the model, as we have seen above. I will try to get these results and proceed further. Thanks again.

one problem seems to be finding the correct height (num rows).
(we already know the batch size (8) and the col length (85))
can you check if e.g. outs[0].Total() / (8 * 85) == 1200 ?

from there on, you could build an ordinary 2d Mat from it, e.g. for the 3rd batch image of outs[0]:

IntPtr p = outs[0].Ptr(2);
Mat m = new Mat(h, w, MatType.CV_32F, p);
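the same sanity check can be written out in Python (plain arithmetic on the element counts; the helper name is made up):

```python
def infer_rows(total, batch, cols):
    """Recover the missing rows dimension of a [batch, rows, cols]
    tensor from its total element count."""
    # if the layout assumption is right, total must divide evenly
    assert total % (batch * cols) == 0, "unexpected layout"
    return total // (batch * cols)
```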

I have the following information from the outs.
The blob shape = (Images, Channels, Height, Width) = (8, 3, 608, 608)

outs[0] has the following:
dims = 3, channels = 1
size = (width: 1323, height: 8) (I get this by using outs[0].size())
size3d = 8 x 1323 x 7 (I got this by querying the size of each dimension, i.e. size(dim))
(rows, cols) = (-1, -1)
outs[0].Total() = 74088

outs[1] has the following:
dims = 3, channels = 1
size = (width: 5292, height: 8) (I get this by using outs[1].size())
size3d = 8 x 5292 x 7
(rows, cols) = (-1, -1)
outs[1].Total() = 296352

outs[2] has the following:
dims = 3, channels = 1
size = (width: 21168, height: 8) (I get this by using outs[2].size())
size3d = 8 x 21168 x 7
(rows, cols) = (-1, -1)
outs[2].Total() = 1185408

I don’t see 85 anywhere; am I missing something here?

it is actually 5 + nclasses columns per row for the default pretrained networks
(80 COCO classes, iirc)
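so with 2 classes each row would be [cx, cy, w, h, box_prob, p_class0, p_class1], i.e. 7 columns. a quick Python sanity check (helper names are made up for illustration):

```python
def expected_cols(n_classes):
    """A YOLO detection row holds 4 box coords + 1 objectness score,
    plus one probability per class."""
    return 5 + n_classes

def best_class(row):
    """Index and probability of the strongest class in one detection row."""
    probs = row[5:]
    idx = max(range(len(probs)), key=lambda i: probs[i])
    return idx, probs[idx]
```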

so you have a custom-trained, 2-class network there?


Yes, I have trained the model for 2 custom classes.

I have finished the implementation for batches; thanks for all the help from you.
Thanks a ton.