In case 1 the network output is 1 x 84 x 6300, where the leading 1 is the batch: (Batch, Classes, Boxes).
In case 2 the network output is also 1 x 84 x 6300, but that is the shape of outs[0].
What does the index i of outs[i] (std::vector<cv::Mat>) mean?
Is i the batch index? The shape of outs[i] is already (Batch, Classes, Boxes).
Are there two batch dimensions?
I ask because the input of cv::dnn::blobFromImages can be a std::vector<cv::Mat>, where the index j of inpMats[j] is the batch index.
For example: inpMats[0] : (480, 640, 3)
inpMats[1] : (480, 640, 3)
Case 1:
std::vector<cv::Mat> inpMats;
// inpMats.push_back(cv::imread("source/data/zidane.jpg"));
inpMats.push_back(cv::imread("source/data/zidane.jpg"));
cv::Mat blob1 = cv::dnn::blobFromImages(inpMats, 1.0 / 255.0, cv::Size(640, 480), cv::Scalar(), true, false);
// blob1 Dimensions: 1 x 3 x 480 x 640
net.setInput(blob1);
cv::Mat single_out = net.forward();
// single_out Dimensions: 1 x 84 x 6300
Case 2:
std::vector<cv::Mat> inpMats;
// inpMats.push_back(cv::imread("source/data/zidane.jpg"));
inpMats.push_back(cv::imread("source/data/zidane.jpg"));
cv::Mat blob1 = cv::dnn::blobFromImages(inpMats, 1.0 / 255.0, cv::Size(640, 480), cv::Scalar(), true, false);
// blob1 Dimensions: 1 x 3 x 480 x 640
net.setInput(blob1);
std::vector<cv::Mat> outs;
net.forward(outs, net.getUnconnectedOutLayersNames());
// outs Dimensions: 1 x 84 x 6300 for outs[0]
I am just confused about how the case 2 network output (std::vector<cv::Mat> outs) is defined.
If outs[i] has the shape (Batch, Classes, Boxes), what is the i of outs[i]?
I already know that for std::vector<cv::Mat> inpMats each inpMats[j] is (480, 640, 3) and j is the batch index.
I am using this code as a reference because I want to design the input (many images) and the output (batched output).
sorry, your reproducer code is incomplete (no network loaded)
however, IF it is really yolov8, we know that it has only a single output layer, so you only need to check outs[0]
(and it’s the same as what a simple net.forward() returns)
(other yolo nn’s have 2 or 3 outputs (pyramid scale levels), which you have to collect before applying NMS)
I think the yolov8 code is simple, so I only pointed out the part I want to focus on.
According to your reply, if I use the net output from case 2, I only need to look at outs[0]: the net will always produce only outs[0], and the batch is already included in outs[0]. Is that right?
And an output of 1 x 84 x 6300 for outs[1] is impossible?
// case2
std::vector<cv::Mat> outs;
net.forward(outs);
//net.forward(outs, net.getUnconnectedOutLayersNames());
// outs Dimensions: 1 x 84 x 6300 for outs[0]
// or Dimensions: 10 x 84 x 6300 for outs[0]
// wrong: 1 x 84 x 6300 for outs[1] (impossible)
a batch of 7 input images will result in an input blob like
[ 7, 3, 480, 640 ] // NCHW
and (again, for v8, a single) nn output like:
[7, 84, 6300] // batch, box proposal, count
so you still have to parse / process box proposals per image.
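To make the per-image parsing concrete, here is a minimal sketch that does not depend on OpenCV. It assumes the output is one contiguous float buffer in [batch, channels, boxes] order (which is how a continuous CV_32F cv::Mat would store it), and it assumes the usual yolov8 channel layout of 4 box coordinates followed by the class scores. `BestClass` and `bestClassFor` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: best class found for one box proposal.
struct BestClass {
    int classId;
    float score;
};

// Assumes `data` is laid out as [batch, channels, boxes], where
// channels = 4 box coordinates + class scores (84 for an 80-class yolov8).
// Returns the best-scoring class for proposal `box` of image `img`.
BestClass bestClassFor(const std::vector<float>& data,
                       std::size_t batch, std::size_t channels, std::size_t boxes,
                       std::size_t img, std::size_t box)
{
    assert(data.size() == batch * channels * boxes);
    assert(img < batch && box < boxes && channels > 4);

    // Start of the [channels, boxes] slice that belongs to image `img`.
    const float* slice = data.data() + img * channels * boxes;

    BestClass best{-1, 0.0f};
    for (std::size_t c = 4; c < channels; ++c) {
        // Element (c, box) of the slice: rows are channels, columns are boxes.
        const float score = slice[c * boxes + box];
        if (best.classId < 0 || score > best.score) {
            best.classId = static_cast<int>(c - 4);
            best.score = score;
        }
    }
    return best;
}
```

For a batch of 7 you would call this with img = 0..6 and run the confidence threshold and NMS separately for each image, so that boxes from different images never suppress each other.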
maybe the short answer is:
the index i of outs[i] selects an output layer (e.g. one per scale level), not an image of the batch; the batch index is the 1st dimension of each output tensor
there are a lot of nn’s with multiple outputs, e.g. pose nn’s (with keypoint / connection outputs) or even yolo variants with additional segmentation maps
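The two indices can be kept apart with a small sketch like the one below. It only models the shapes, not OpenCV itself: `OutputShape` is a hypothetical stand-in for the dimensions of one cv::Mat in outs, and `commonBatchSize` is a made-up helper. The outer vector index is the output layer, and the first dimension of every entry is the batch size.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the dimensions of one output tensor,
// e.g. {7, 84, 6300} for a yolov8 detection head with a batch of 7.
struct OutputShape {
    std::vector<std::size_t> dims;
};

// outs[i] is the i-th output layer; dims[0] of each entry is the batch size.
// Returns the batch size shared by all outputs, or 0 if the list is empty
// or the outputs disagree on it.
std::size_t commonBatchSize(const std::vector<OutputShape>& outs)
{
    if (outs.empty() || outs[0].dims.empty()) return 0;
    const std::size_t batch = outs[0].dims[0];
    for (const OutputShape& o : outs) {
        if (o.dims.empty() || o.dims[0] != batch) return 0;
    }
    return batch;
}
```

With a single-output net the vector has exactly one entry, so outs.size() stays 1 however many images are in the batch; a net with several output layers gives several entries that all share the same first dimension.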