The output type of net.forward after blobFromImages

Assume zidane.jpg has size (480, 640, 3), i.e. (height, width, channels).

In case 1, the network output is 1 x 84 x 6300, where the leading 1 is the batch dimension (Batch, Classes, Boxes).
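For reference, here is a hedged breakdown of the two trailing dimensions. It assumes a standard YOLOv8 detection model with the 80 COCO classes and detection strides 8, 16 and 32 (the strides are an assumption about the exported model, not something stated in this thread). Under that assumption each of the 84 rows is one attribute of a proposal (4 box values cx, cy, w, h plus 80 class scores) rather than a class, and 6300 is the number of grid cells summed over the three strides for a 480 x 640 input. The function names are illustrative only.

```cpp
#include <cassert>
#include <initializer_list>

// Assumed YOLOv8 detection layout: [batch, 4 + numClasses, numProposals].
// The first 4 rows are box values (cx, cy, w, h), the rest are class scores.
constexpr int kBoxValues = 4;

int attrsPerProposal(int numClasses) { return kBoxValues + numClasses; }

// One proposal per grid cell on each detection stride (assumed 8, 16, 32).
int numProposals(int height, int width) {
    int total = 0;
    for (int stride : {8, 16, 32}) {
        assert(height % stride == 0 && width % stride == 0);
        total += (height / stride) * (width / stride);
    }
    return total;
}
```

For a 480 x 640 input this gives 60*80 + 30*40 + 15*20 = 6300; for the common 640 x 640 export the same formula gives 8400.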

In case 2, however, the network output is also 1 x 84 x 6300, for outs[0].
How is the index i of outs[i] (std::vector<cv::Mat>) defined?
Is i the batch index? But outs[i] itself already has shape (Batch, Classes, Boxes), so would that mean two batch dimensions?

I ask because the input of cv::dnn::blobFromImages can be a std::vector<cv::Mat>, where the index j of inpMats[j] is the position in the batch, e.g.:
inpMats[0] : (480, 640, 3)
inpMats[1] : (480, 640, 3)
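A minimal sketch of that layout in plain C++ (no OpenCV calls), assuming the blob is a contiguous float32 NCHW array as produced by cv::dnn::blobFromImages: image inpMats[n] ends up as slice n along the first dimension, so with two images blob1.size[0] would be 2. The helper names nchwOffset and imageStride are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Flat offset of element (n, c, h, w) in a contiguous NCHW blob.
// n = image index in the batch (i.e. j in inpMats[j]), c = channel,
// h = row, w = column; C, H, W are the blob's channel/height/width sizes.
std::size_t nchwOffset(int n, int c, int h, int w, int C, int H, int W) {
    assert(c < C && h < H && w < W);
    return ((static_cast<std::size_t>(n) * C + c) * H + h) * W + w;
}

// Number of floats occupied by one whole image inside the blob.
std::size_t imageStride(int C, int H, int W) {
    return static_cast<std::size_t>(C) * H * W;
}
```

So for a 3 x 480 x 640 blob, the second image starts exactly imageStride(3, 480, 640) = 921600 floats after the first one.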

case1

    std::vector<cv::Mat> inpMats;
    // inpMats.push_back(cv::imread("source/data/zidane.jpg"));  // uncomment for a batch of 2
    inpMats.push_back(cv::imread("source/data/zidane.jpg"));
    cv::Mat blob1 = cv::dnn::blobFromImages(inpMats, 1.0 / 255.0, cv::Size(640, 480), cv::Scalar(), true, false);
    // blob1 Dimensions: 1 x 3 x 480 x 640

    net.setInput(blob1);
    cv::Mat single_out = net.forward();
    // single_out Dimensions: 1 x 84 x 6300

case2

    std::vector<cv::Mat> inpMats;
    // inpMats.push_back(cv::imread("source/data/zidane.jpg"));  // uncomment for a batch of 2
    inpMats.push_back(cv::imread("source/data/zidane.jpg"));
    cv::Mat blob1 = cv::dnn::blobFromImages(inpMats, 1.0 / 255.0, cv::Size(640, 480), cv::Scalar(), true, false);
    // blob1 Dimensions: 1 x 3 x 480 x 640

    net.setInput(blob1);
    std::vector<cv::Mat> outs;
    net.forward(outs, net.getUnconnectedOutLayersNames());
    // outs Dimensions: 1 x 84 x 6300 for outs[0]

What kind of network is it?
What are you trying to achieve (what is the purpose of it)?

What does net.getUnconnectedOutLayersNames() return?

Please try to turn your code snippets into an MRE (minimal reproducible example), so people here can actually reproduce it. Thanks.

I am just confused about how the case 2 network output (std::vector<cv::Mat> outs) is defined.

If each element of outs is (Batch, Classes, Boxes), what is the i in outs[i]?
I already know that for std::vector<cv::Mat> inpMats, each inpMats[j] is (480, 640, 3) and j is the batch index.

I referenced this code, and I want to design the input (many images) and the output (batch output):

opencv/modules/dnn/test/test_onnx_importer.cpp at commit 450e741f8d53ff12b4e194c7762adaefb952555a (opencv/opencv on GitHub)

Reproduce

Just follow the code I pasted; I use blobFromImages:

    std::vector<cv::Mat> inpMats;
    inpMats.push_back(cv::imread("source/data/zidane.jpg"));
    cv::Mat blob1 = cv::dnn::blobFromImages(inpMats, 1.0 / 255.0, cv::Size(640, 480), cv::Scalar(), true, false);
    net.setInput(blob1);

    // case1
    cv::Mat single_out = net.forward();
    // single_out Dimensions: 1 x 84 x 6300

    // case2
    std::vector<cv::Mat> outs;
    net.forward(outs);
    // net.forward(outs, net.getUnconnectedOutLayersNames());
    // outs Dimensions: 1 x 84 x 6300 for outs[0]

Sorry, your reproducer code is incomplete (no network loaded).

However, IF it is really YOLOv8, we know that it has only a single output layer, so you only need to check outs[0]
(and that is the same as a simple net.forward()).

(Other YOLO networks have 2 or 3 outputs, one per pyramid scale level, which you have to collect before applying NMS.)
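To illustrate the "collect first, then NMS" step: once the candidate boxes from all output layers are gathered into one list, a single NMS pass runs over that list. The sketch below is a plain greedy IoU-based NMS with hypothetical names (Box, iou, greedyNms); in an OpenCV program you would more likely call cv::dnn::NMSBoxes, which implements the same idea.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Axis-aligned box in pixel coordinates (top-left x, y, width, height) + score.
struct Box { float x, y, w, h, score; };

// Intersection over union of two boxes.
float iou(const Box& a, const Box& b) {
    float x1 = std::max(a.x, b.x), y1 = std::max(a.y, b.y);
    float x2 = std::min(a.x + a.w, b.x + b.w), y2 = std::min(a.y + a.h, b.y + b.h);
    float inter = std::max(0.f, x2 - x1) * std::max(0.f, y2 - y1);
    float uni = a.w * a.h + b.w * b.h - inter;
    return uni > 0.f ? inter / uni : 0.f;
}

// Greedy NMS over candidates already gathered from ALL output layers.
// Returns indices of kept boxes, highest score first.
std::vector<int> greedyNms(const std::vector<Box>& boxes, float iouThr) {
    assert(iouThr >= 0.f && iouThr <= 1.f);
    std::vector<int> order(boxes.size());
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(),
              [&](int a, int b) { return boxes[a].score > boxes[b].score; });
    std::vector<int> keep;
    for (int idx : order) {
        bool suppressed = false;
        for (int k : keep)
            if (iou(boxes[idx], boxes[k]) > iouThr) { suppressed = true; break; }
        if (!suppressed) keep.push_back(idx);
    }
    return keep;
}
```

Because the boxes from every scale level sit in the same list, overlapping detections of one object coming from different levels suppress each other, which is why the outputs are collected before NMS rather than after.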

Thanks for your reply

I think the YOLOv8 code is simple, so I only pointed out the part I want to focus on.

According to your reply, if I use the network output from case 2, I only need to look at outs[0]: the network will always produce only outs[0], and the batch is already included in outs[0]. Is that right?

And an output of 1 x 84 x 6300 in outs[1] is impossible?

    // case2
    std::vector<cv::Mat> outs;
    net.forward(outs);
    // net.forward(outs, net.getUnconnectedOutLayersNames());
    // outs Dimensions: 1 x 84 x 6300 for outs[0]
    // or Dimensions: 10 x 84 x 6300 for outs[0] (batch of 10)
    // wrong / impossible: 1 x 84 x 6300 for outs[1]

OK, so part 2: batches.

A batch of 7 input images will result in an input blob like

[7, 3, 480, 640] // NCHW

and (again, for v8, a single) network output like:

[7, 84, 6300] // batch, 4 box values + 80 class scores, proposal count

So you still have to parse/process the box proposals per image.
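A minimal sketch of that per-image parsing in plain C++, assuming the YOLOv8 layout [batch, 4 + numClasses, count], where rows 0 to 3 are cx, cy, w, h and the remaining rows hold one score per class. For a cv::Mat output you would pass out.ptr<float>() together with out.size[0], out.size[1] and out.size[2] (assuming out.isContinuous()). Detection and decodeBatch are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One decoded proposal (center format, in input-blob pixel coordinates).
struct Detection { int image; float cx, cy, w, h; int classId; float score; };

// Decode a contiguous [batch, attrs, count] buffer, attrs = 4 + numClasses.
// Keeps proposals whose best class score is >= scoreThr, tagged by image.
std::vector<Detection> decodeBatch(const float* data, int batch, int attrs,
                                   int count, float scoreThr) {
    assert(attrs > 4);
    std::vector<Detection> dets;
    for (int b = 0; b < batch; ++b) {
        // Start of image b's [attrs, count] slice.
        const float* img = data + static_cast<std::size_t>(b) * attrs * count;
        for (int i = 0; i < count; ++i) {
            // Attribute r of proposal i lives at img[r * count + i].
            int best = 0;
            float bestScore = img[4 * count + i];
            for (int c = 1; c < attrs - 4; ++c) {
                float s = img[(4 + c) * count + i];
                if (s > bestScore) { bestScore = s; best = c; }
            }
            if (bestScore >= scoreThr)
                dets.push_back({b, img[i], img[count + i], img[2 * count + i],
                                img[3 * count + i], best, bestScore});
        }
    }
    return dets;
}
```

The boxes come out in blob pixel coordinates (640 x 480 here); if blobFromImages resized the images, they still have to be scaled back, and NMS is then run separately for each image.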

Maybe the short answer is:
the index i of outs[i] counts output layers (scale levels), not images; the image/batch index is the first dimension of each output layer's tensor.


Yes, you are right.

Conclusion:

YOLOv8 has a single output layer tensor, so with the case 2 design there is only outs[0].

If a network has three output layer tensors, the case 2 design gives three of them (outs[0], outs[1], outs[2]).
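To make the two indices concrete, here is a sketch with a hypothetical OutShape struct standing in for outs[i].size[0], outs[i].size[1] and outs[i].size[2]: the outer loop walks the output layers (outs[i]), the inner loop walks the images (the first dimension of each layer's tensor). Any multi-layer shapes passed in are made up for illustration and do not describe a specific model.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the shape of one element of outs:
// [batch, attrs, count].
struct OutShape { int batch, attrs, count; };

// outs[i] indexes output LAYERS; the image index is dim 0 inside each layer.
// Returns, for each image, how many proposals it gets across all layers.
std::vector<int> proposalsPerImage(const std::vector<OutShape>& outs) {
    assert(!outs.empty());
    const int batch = outs[0].batch;
    std::vector<int> perImage(batch, 0);
    for (const OutShape& layer : outs) {       // i: output layer / scale level
        assert(layer.batch == batch);          // every layer carries the whole batch
        for (int b = 0; b < layer.batch; ++b)  // b: image in the batch
            perImage[b] += layer.count;
    }
    return perImage;
}
```

With a single YOLOv8 output of shape [7, 84, 6300], each of the 7 images gets 6300 proposals; with three layers, each image's proposals are the sum over all three.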

New question:

But if a network has three output layer tensors, it cannot use the case 1 design, right?

case 1: network output type is cv::Mat

case 2: network output type is std::vector<cv::Mat>

Indeed!

There are a lot of networks with multiple outputs, e.g. pose networks (with keypoint/connection outputs) or even YOLO variants with additional segmentation maps.