Output from cv::dnn::Net.forward() is multidimensional with yolov5

I have successfully exported a yolov5 model to ONNX and was able to read the model using readNetFromONNX(). I then set input using a test image and ran net.forward() which returned a Mat. I am now working on postprocessing to interpret the data contained in the returned Mat.

Most of the examples that I have found that illustrate calling forward() and interpreting the results assume that the returned Mat is 2D. In contrast, the Mat that I am getting is 3D. More specifically, the rank of the Mat is 3. The sizes of these three dimensions are 1x25200x8.

I have not been able to find any information about how to interpret such a result and was wondering if anyone has any suggestions?

Thank you.


you can simply reshape the output to 2d like:

Mat res = output.reshape(1, 25200); // [25200x8]

yolov5 has 25200 possible boxes, each row in the 2d Mat is:

cx, cy, w, h, box_prob, p1, p2, p3,  ..., pn

where p1 … pn are the N per-class probabilities (N = 3 in your case, since 4 box values + 1 objectness score + 3 class scores = 8 columns)
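To make the layout concrete, here is a minimal NumPy sketch of the same reshape-and-decode step. The array contents are random placeholders standing in for the network output; only the shape (1 x 25200 x 8) and the column layout described above are taken from the thread:

```python
import numpy as np

# Placeholder with the same shape as the network output: 1 x 25200 x 8
# (4 box values + 1 objectness score + 3 class scores per candidate box).
outputs = np.random.rand(1, 25200, 8).astype(np.float32)

# Drop the leading batch dimension -> 25200 x 8, one candidate box per row.
rows = outputs.reshape(25200, 8)

# Decode the first row according to the layout above.
cx, cy, w, h = rows[0, :4]     # box centre and size (in 640x640 input space)
box_prob = rows[0, 4]          # objectness score
class_scores = rows[0, 5:]     # one score per class (3 here)
best_class = int(np.argmax(class_scores))
```

The same decoding applies to every one of the 25200 rows; the full postprocessing loop below simply filters these rows by score before running NMS.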

someone supplied example code here:


Beautiful! Thank you so much!


As I failed to understand berak’s answer, I found a solution here and thought it might prove useful to other users:


    import cv2
    import numpy as np

    # Tunable thresholds.
    CONFIDENCE_THRESHOLD = 0.45
    SCORE_THRESHOLD = 0.5
    NMS_THRESHOLD = 0.45
    BLUE = (255, 178, 50)
    THICKNESS = 1

    # Class names, in the order the model was trained with.
    classes = ['first class', 'second class']

    blob = cv2.dnn.blobFromImage(input_image, 1/255, (640, 640), [0, 0, 0], swapRB=True, crop=False)
    net.setInput(blob)
    output_layers = net.getUnconnectedOutLayersNames()
    outputs = net.forward(output_layers)

    # Lists to hold respective values while unwrapping.
    class_ids = []
    confidences = []
    boxes = []

    # Rows.
    rows = outputs[0].shape[1]

    image_height, image_width = input_image.shape[:2]

    # Resizing factors back to the original image size.
    x_factor = image_width / 640
    y_factor = image_height / 640

    # Iterate through the 25200 detections.
    for r in range(rows):
        row = outputs[0][0][r]
        confidence = row[4]

        # Discard bad detections and continue.
        if confidence >= CONFIDENCE_THRESHOLD:
            classes_scores = row[5:]

            # Get the index of the max class score.
            class_id = np.argmax(classes_scores)

            # Continue only if the class score is above threshold.
            if classes_scores[class_id] > SCORE_THRESHOLD:
                confidences.append(float(confidence))
                class_ids.append(class_id)

                cx, cy, w, h = row[0], row[1], row[2], row[3]

                left = int((cx - w / 2) * x_factor)
                top = int((cy - h / 2) * y_factor)
                width = int(w * x_factor)
                height = int(h * y_factor)

                box = np.array([left, top, width, height])
                boxes.append(box)

    # Perform non-maximum suppression to eliminate redundant overlapping
    # boxes with lower confidences.
    indices = cv2.dnn.NMSBoxes(boxes, confidences, CONFIDENCE_THRESHOLD, NMS_THRESHOLD)
    for i in indices:
        box = boxes[i]
        left = box[0]
        top = box[1]
        width = box[2]
        height = box[3]
        cv2.rectangle(input_image, (left, top), (left + width, top + height), BLUE, 3 * THICKNESS)
        label = "{}:{:.2f}".format(classes[class_ids[i]], confidences[i])
        draw_label(input_image, label, left, top)  # helper that draws the label text (not shown here)
        print(label)  # will contain label and confidence