How to interpret the result from YOLOv3 with blobFromImages as input?

How should the result for each image be used when processing a batch of images with blobFromImages as the input to the YOLO model?

How should the post-processing for each image be approached to get the bounding-box values?


Kindly look at the samples for this.

in general:

  • collect “box proposals” from several output layers (each representing a different scale)
    each row in there is:
    cx, cy, w, h, box_probability, p1, p2, … pn (where p1 … pn are the class probabilities)
  • filter out bad proposals by thresholding the box / class probabilities
  • filter out near-duplicates by applying NMS
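In plain Python, those three steps could be sketched like this (no OpenCV needed; the row layout follows the description above, and the threshold values are illustrative assumptions, not fixed by the model):

```python
def decode_row(row, img_w, img_h):
    """Convert one detection row (cx, cy, w, h relative to the image size)
    into an (x, y, w, h, score, class_id) tuple."""
    cx, cy, w, h, box_prob = row[:5]
    class_probs = row[5:]
    class_id = max(range(len(class_probs)), key=lambda i: class_probs[i])
    score = box_prob * class_probs[class_id]
    x = int((cx - w / 2) * img_w)
    y = int((cy - h / 2) * img_h)
    return (x, y, int(w * img_w), int(h * img_h), score, class_id)

def iou(a, b):
    """Intersection-over-union of two (x, y, w, h, ...) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def postprocess(rows, img_w, img_h, conf_thresh=0.5, nms_thresh=0.4):
    """Threshold, decode and NMS-filter the proposals of one image."""
    # keep only rows whose box probability (index 4) passes the threshold
    boxes = [decode_row(r, img_w, img_h) for r in rows if r[4] >= conf_thresh]
    boxes.sort(key=lambda b: b[4], reverse=True)   # best score first
    kept = []
    for b in boxes:   # greedy NMS: drop boxes overlapping an already-kept one
        if all(iou(b, k) < nms_thresh for k in kept):
            kept.append(b)
    return kept
```

This is the per-image logic; for batches it simply runs once per image on that image's rows.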

I have tried the following:

Implemented in C#:
I have chosen a batch of 8 images.

Mat blob = CvDnn.BlobFromImages(inputImages, 1 / 255.0, swapRB: true);

# input shape = (8, 3, 672, 672) as (Images, Channels, Height, Width)

Step 1:
var outNames = model.GetUnconnectedOutLayersNames();
var outs = outNames.Select(_ => new Mat()).ToArray();
model.Forward(outs, outNames);
(outs will hold the predictions from the 3 YOLO detection layers “yolo_82”, “yolo_94” and “yolo_106”, and these should be used to get the result)

outs contains 3 matrices, each printing as {Mat [ -1*-1*CV_32FC1 ]}

Step 2:
var probs = model.Forward();
probs is a single matrix, printing as {Mat [ -1*-1*CV_32FC1 ]}

I am not sure how to interpret these matrices and get the predictions for each image.

Any comment would be greatly appreciated.
Thanks in advance.

quite unfortunate, as there is no official C# wrapper, so now you have an API problem specific to that, and no one here knows it ;(
at least kindly tell us what exactly you’re using here?

how did you retrieve that ? do the same for the outputs

that’s bogus. it’s probably printing out Mat::size(), which is 2d only and cannot handle 4d tensors (so rows & cols are set to -1 on purpose).
the c++ Mat has a size member, which would hold the proper dims in this case

correct. so 3 output Mats per forward pass

why this? it seems wrong / unnecessary. Step 1 should be all you need.
(this way, you only get the very last layer, “yolo_106”)

if you want my 2ct:
try to get it working for a single image first
(seems you’re not even there yet), then try with batches

I have used OpenCvSharp for .NET and the C# programming language. OpenCV provides a function to load the darknet weights directly using the config and weights files.

  1. I have successfully implemented the detections for a single image; the code above is my attempt at batches.

  2. Yes, I would also use step 1 from above to set the inputs and do the forward pass.
    The output matrices have a size, giving me the width and height:
    1st matrix: size = (width: 1323, height: 8) and (rows, cols) = (-1, -1)
    2nd matrix: size = (width: 5292, height: 8) and (rows, cols) = (-1, -1)
    3rd matrix: size = (width: 21168, height: 8) and (rows, cols) = (-1, -1)

The value of height changes with the number of images used; this is what I have seen while debugging.

I am struggling to unpack the output matrices and get the results for each image.

hmm, the numbers look familiar; however, i’d expect it more like [8, 1323, 85],
as in: 8 images, 1323 rows, 85 cols
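assuming the buffer really is contiguous and row-major in that [batch, rows, cols] order, pulling the rows of one image out of the flat float data is plain index arithmetic. a Python sketch (the function name is made up for illustration):

```python
def rows_of_image(flat, batch, rows, cols, img_idx):
    """Slice the rows x cols block of one batch image out of a flat,
    row-major [batch, rows, cols] float buffer."""
    assert len(flat) == batch * rows * cols
    start = img_idx * rows * cols                 # offset of this image's block
    block = flat[start:start + rows * cols]
    # split the block into `rows` detection rows of `cols` values each
    return [block[r * cols:(r + 1) * cols] for r in range(rows)]
```

each returned row then has the [cx, cy, w, h, box_prob, p1 … pn] layout described earlier in the thread.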

oh okay. I don’t exactly know how the result is provided as output from the model for batches.

It would be very helpful if you could guide me further on this. I have looked a lot and found no proper implementation.

I will continue to look into it. Thanks

sorry, but i can’t help any further.
again, it’s a problem with your C# API (which we have no knowledge about here)

raise an issue on that github repo

Thanks for your reply.

I believe the challenge is to interpret the output of the OpenCV dnn module for the YOLO model, and it has nothing to do with C# or anything else. I have an OpenCV and Python implementation too.

Thanks anyway.

so, again, i made a small test from c++, using YOLOv3 and a batch size of 8.

printing out the output size members gives:

yolo_82   8 x 1200 x 85
yolo_94   8 x 4800 x 85
yolo_106  8 x 19200 x 85

(i think this must be interpreted as [batch_size x rows x cols])

Thanks for the further help.

This is exactly what I wanted to know. I do not get these output sizes from the model, as we have seen above. I will try to get these results and proceed further. Thanks again.

one problem seems to be finding the correct height (num rows).
(we already know the batch size (8) and the col length (85))
can you check if e.g. outs[0].Total() / (8 * 85) == 1200 ?

from there on, you could build an ordinary 2d Mat from it, e.g. for the 3rd batch image of outs[0]:

IntPtr p = outs[0].Ptr(2);
Mat m = new Mat(h, w, MatType.CV_32F, p);
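the same sanity check can be written out in Python (plain arithmetic on the element counts; the helper name is made up):

```python
def infer_rows(total, batch, cols):
    """Recover the missing rows dimension of a [batch, rows, cols]
    tensor from its total element count."""
    # if the layout assumption is right, total must divide evenly
    assert total % (batch * cols) == 0, "unexpected layout"
    return total // (batch * cols)
```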

I have the following information from the outs.
The blob shape = (Images, Channels, Height, Width) = (8, 3, 608, 608)

outs[0] has the following:
dims = 3, channels = 1
size = (width: 1323, height: 8) (I get this by using outs[0].size())
size3d = 8 x 1323 x 7 (I got this by querying the size of each dimension, i.e. size(dim))
(rows, cols) = (-1, -1)
outs[0].Total() = 74088

outs[1] has the following:
dims = 3, channels = 1
size = (width: 5292, height: 8) (I get this by using outs[1].size())
size3d = 8 x 5292 x 7
(rows, cols) = (-1, -1)
outs[1].Total() = 296352

outs[2] has the following:
dims = 3, channels = 1
size = (width: 21168, height: 8) (I get this by using outs[2].size())
size3d = 8 x 21168 x 7
(rows, cols) = (-1, -1)
outs[2].Total() = 1185408

I don’t see 85 anywhere; am I missing something here?

it is actually 5 + nclasses columns per row for the default pretrained networks
(80 COCO classes, iirc)
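so with 2 classes each row would be [cx, cy, w, h, box_prob, p_class0, p_class1], i.e. 7 columns. a quick Python sanity check (helper names are made up for illustration):

```python
def expected_cols(n_classes):
    """A YOLO detection row holds 4 box coords + 1 objectness score,
    plus one probability per class."""
    return 5 + n_classes

def best_class(row):
    """Index and probability of the strongest class in one detection row."""
    probs = row[5:]
    idx = max(range(len(probs)), key=lambda i: probs[i])
    return idx, probs[idx]
```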

so you have a custom-trained, 2-class network there?


Yes, I have trained the model for 2 custom classes.

I have finished the implementation for batches; thanks for all the help from you.
Thanks a ton.