I'm running OpenCV's DNN face detector with the res10_300x300_ssd_iter_140000_fp16.caffemodel model:

    import cv2

    # Paths to the model files (adjust to wherever you keep them).
    prototxt_path = "deploy.prototxt"
    model_path = "res10_300x300_ssd_iter_140000_fp16.caffemodel"
    net = cv2.dnn.readNetFromCaffe(prototxt_path, model_path)

    cap = cv2.VideoCapture(0)
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        frame_h, frame_w = frame.shape[:2]
        bboxes = []

        # Resize to the network's 300x300 input and subtract the model's BGR channel means.
        blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)), 1.0,
                                     (300, 300), (104.0, 177.0, 123.0))
        net.setInput(blob)
        detections = net.forward()
        # detections.shape is (1, 1, 200, 7),
        # so detections.shape[2] is 200 and the loop walks the candidate detections.
        for i in range(detections.shape[2]):
            confidence = detections[0, 0, i, 2]
            if confidence > 0.5:
                # Coordinates are normalized to [0, 1]; scale back to pixel values.
                x1 = int(detections[0, 0, i, 3] * frame_w)
                y1 = int(detections[0, 0, i, 4] * frame_h)
                x2 = int(detections[0, 0, i, 5] * frame_w)
                y2 = int(detections[0, 0, i, 6] * frame_h)
                bboxes.append([x1, y1, x2, y2])
                bb_line_thickness = max(1, int(round(frame_h / 200)))
                # Draw a bounding box around each detected face.
                cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0),
                              bb_line_thickness, cv2.LINE_8)
I used ChatGPT to figure this out, but is there official documentation?
detections[a, b, c, d]

is a 4D tensor, indexed as follows:

- a selects one image in the batch (the bunch of input images).
  - Usually 0, because you feed in one image and the neural network returns results for that one image.
- b is the index of the output channel.
  - For DNNs in OpenCV this is usually 0.
- c is the index of the detected face.
  - Ranges over however many candidate detections the model returns.
- d is an index from 0 to 6 into the detection row:
  - 0 - image ID within the batch (usually 0 here)
  - 1 - class label ID
  - 2 - confidence score
  - 3 - normalized top-left x of the detected face (number between 0 and 1)
  - 4 - normalized top-left y of the detected face (number between 0 and 1)
  - 5 - normalized bottom-right x of the detected face (number between 0 and 1)
  - 6 - normalized bottom-right y of the detected face (number between 0 and 1)

detections[0, 0, i, 2] is the confidence score and has type <class 'numpy.float32'>.
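To see that layout concretely without a camera or model, here is a minimal sketch that fakes a detections tensor with the same (1, 1, N, 7) shape and indexes it exactly as the loop above does. The frame size and box coordinates are made-up values, not anything the model produced:

```python
import numpy as np

# Fake detections tensor mimicking the DetectionOutput layout: (1, 1, N, 7).
# Each row: [image_id, class_id, confidence, x1, y1, x2, y2], coords in [0, 1].
detections = np.zeros((1, 1, 200, 7), dtype=np.float32)
detections[0, 0, 0] = [0, 1, 0.98, 0.25, 0.25, 0.75, 0.5]  # one fake face

frame_w, frame_h = 640, 480  # example frame size (assumption)
for i in range(detections.shape[2]):
    confidence = detections[0, 0, i, 2]
    if confidence > 0.5:
        # Scale normalized coordinates back to pixels.
        x1 = int(detections[0, 0, i, 3] * frame_w)
        y1 = int(detections[0, 0, i, 4] * frame_h)
        x2 = int(detections[0, 0, i, 5] * frame_w)
        y2 = int(detections[0, 0, i, 6] * frame_h)
        print(i, confidence, (x1, y1, x2, y2))
        # prints: 0 0.98 (160, 120, 480, 240)
```

Only the one row with confidence above 0.5 is printed; the remaining 199 zero rows are skipped, which is why looping over all 200 candidates is safe.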