How do I know what each of these indices means for the output blob?

res10_300x300_ssd_iter_140000_fp16.caffemodel


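import cv2

# prototxt_path and model_path must point to the deploy prototxt and the .caffemodel above
net = cv2.dnn.readNetFromCaffe(prototxt_path, model_path)
cap = cv2.VideoCapture(0)


while True:
	ret, frame = cap.read()
	if not ret:  # stop when no frame could be read
		break
	frame_h, frame_w = frame.shape[:2]  # needed to scale the normalized boxes
	bboxes = []  # pixel-space boxes found in this frame
	
	blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)), 1.0, (300, 300), (104.0, 177.0, 123.0))
	net.setInput(blob)
	detections = net.forward()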

	# detections.shape is (1, 1, 200, 7)
	# so detections.shape[2] is 200
	# so you are looping through the candidate detections; the confidence check below keeps the likely faces
	for i in range(detections.shape[2]):  
		confidence = detections[0, 0, i, 2]  
		if confidence > 0.5:  
			x1 = int(detections[0, 0, i, 3] * frame_w)  
			y1 = int(detections[0, 0, i, 4] * frame_h)  
			x2 = int(detections[0, 0, i, 5] * frame_w)  
			y2 = int(detections[0, 0, i, 6] * frame_h)  
			bboxes.append([x1, y1, x2, y2])  
			bb_line_thickness = max(1, int(round(frame_h / 200)))
			  
			# Draw bounding boxes around detected faces.  
			cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 
				bb_line_thickness, cv2.LINE_8)

	

I used ChatGPT to figure it out, but is there official documentation?
detections[a, b, c, d] is a 4D tensor (a 4-dimensional array)
- a is the index that selects one image in the batch (the bunch of input images)
  - Usually the index is 0, because you input one image and the neural network outputs one result.
- b is the index of the output channel.
  - In the case of DNNs in OpenCV, the index is usually 0
- c is the index of the detected face
  - its range depends on how many faces are detected
- d is an index from 0 to 6
  - 0 - ???
  - 1 - ???
  - 2 - confidence score
  - 3 - normalized top left x of detected face (number between 0 and 1)
  - 4 - normalized top left y of detected face (number between 0 and 1)
  - 5 - normalized bottom right x of detected face (number between 0 and 1)
  - 6 - normalized bottom right y of detected face (number between 0 and 1)
- detections[0, 0, i, 2] is the confidence score and has type <class 'numpy.float32'>
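
For illustration, here is a minimal sketch of that indexing on a synthetic array, so it runs without the model or a camera. The array values and the 640x480 frame size are made up; only the (1, 1, N, 7) layout and the meaning of indices 2 to 6 come from the description above.

```python
import numpy as np

# Synthetic stand-in for net.forward(): shape (1, 1, N, 7), values invented.
detections = np.zeros((1, 1, 3, 7), dtype=np.float32)
detections[0, 0, 0] = [0, 1, 0.9, 0.25, 0.25, 0.5, 0.5]  # confident detection
detections[0, 0, 1] = [0, 1, 0.3, 0.1, 0.1, 0.2, 0.2]    # below threshold
# Row 2 stays all zeros, like an unused slot.

frame_h, frame_w = 480, 640  # hypothetical frame size

rows = detections[0, 0]    # drop the two leading axes -> shape (N, 7)
keep = rows[:, 2] > 0.5    # column 2 is the confidence score

# Columns 3..6 are normalized x1, y1, x2, y2; scale them to pixels.
scale = np.array([frame_w, frame_h, frame_w, frame_h])
boxes = (rows[keep, 3:7] * scale).astype(int)

print(boxes)
```

Slicing `detections[0, 0]` once and filtering with a boolean mask does the same work as the per-row loop in the question, just without repeating the four-index lookups.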

Ok I guess you have to get in contact with Aleksandr Rybnikov, the creator of the model…

you’re most likely to find info on that on the internet already. might not be spelled out in words, but found in (original author’s) source code that uses the model.

there might be a scientific paper coinciding with the release of that model.

no, I would not conclude that. according to the further link, that person contributed to opencv’s dnn module, but I see no indication that he is the author of that model.

I mean that no one officially knows who made res10_300x300_ssd_iter_140000_fp16.caffemodel, despite its popularity.

This is the only official description of the model, but it still leaves many questions unanswered. That’s why so many people seem to have asked which dataset the model was trained on.

I agree there is no irrefutable evidence that Aleksandr Rybnikov is the author, such as his name appearing on GitHub. Adrian, the author of pyimagesearch, remembers him as the author. I know it’s not the most authoritative source, but do I have a choice?

(NOTE: I edited my reply several times)

Adrian of pyimagesearch is definitely not a reliable source for anything. His blog’s purpose is ad clicks.

best you can hope for is to look at docs for related models, i.e. SSD models and resnet10 models.

the code on a different article of his, reached via this other page, reveals that index 1 probably encodes the class label (should be rounded or floored, I suspect rounded, but his code floors it). that only leaves index 0 as unknown.

that article uses the mobilenet SSD, so a different backbone but highly likely same output format. I dug some more… for some reason, index 0 is supposed to be the index of the image in the batch, so that should just increase from 0 to N-1 of the batch, i.e. it’s quite useless. IDK who would make the model do that, or why.
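
Putting the pieces of this thread together, one row can be unpacked into named fields roughly like this. It is only a sketch under the assumptions discussed above: index 0 as the image's position in the batch and index 1 as the class label are inferred from other code, not from official documentation, and rounding the label is a guess. `Detection` and `unpack_detections` are hypothetical names, and the input array is synthetic.

```python
from typing import List, NamedTuple, Tuple

import numpy as np


class Detection(NamedTuple):
    image_id: int      # index 0: position of the image in the batch (assumed)
    class_id: int      # index 1: class label (assumed, rounded)
    confidence: float  # index 2
    box: Tuple[float, float, float, float]  # indices 3..6: normalized x1, y1, x2, y2


def unpack_detections(detections: np.ndarray, min_conf: float = 0.5) -> List[Detection]:
    """Turn a (1, 1, N, 7) output array into a list of named records."""
    result = []
    for row in detections.reshape(-1, 7):
        conf = float(row[2])
        if conf > min_conf:
            box = tuple(float(v) for v in row[3:7])
            result.append(Detection(int(round(row[0])), int(round(row[1])), conf, box))
    return result


# Synthetic output for a batch of two images (values invented).
fake = np.array([[[[0, 1, 0.9, 0.1, 0.1, 0.4, 0.4],
                   [1, 1, 0.8, 0.2, 0.2, 0.6, 0.6],
                   [1, 1, 0.2, 0.0, 0.0, 0.1, 0.1]]]], dtype=np.float32)

found = unpack_detections(fake)
for det in found:
    print(det.image_id, det.class_id, round(det.confidence, 2))
```

With a single input image, `image_id` would be 0 for every row, which matches the point above that the field carries little information in that case.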

Sure you could contact Rybnikov. If you do, please report back. Maybe he doesn’t know, maybe he does, and then we can write that down and maybe even extend some documentation.