YOLO objectness parameter (called p0) vs. class probability parameters, as explained by Joseph Redmon

I have been watching Joseph Redmon's (the developer of YOLOv3) lecture about YOLO, from 45:04, here:

The Ancient Secrets of Computer Vision - 18 - Detection and Instance Segmentation - YouTube

where he explains why he needs both kinds of predictions: the “objectness”, called p0, vs. the class probabilities, called p1, p2, …

I understand that he essentially wants to filter out cells with low objectness before he gets to classifying them. But if I understood this right, then I have a question in 2 parts:

a- when I run an already trained YOLO, it has 85 outputs per box, where the objectness prediction comes out at the same time as the class confidence predictions, so I do not see where he saves time or resources. I would understand it if there were two output stages, e.g. a first stage outputting objectness and then, only for the cells that passed some threshold, another fully connected stage outputting the class probabilities of the chosen cells. But that is not the case in the code.
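(For reference, this is roughly how I read each 85-value output vector in my code; just a simplified sketch with my own variable names, assuming the standard COCO-trained YOLOv3 with 80 classes:)

import numpy as np

# one YOLOv3 output row for COCO: 4 box values + objectness + 80 class scores
detection = np.random.rand(85).astype(np.float32)   # stand-in for one row of the network output

box_xywh = detection[0:4]       # center_x, center_y, width, height (relative to the input blob)
objectness = detection[4]       # p0 - "is there any object in this box at all?"
class_scores = detection[5:]    # p1 .. p80 - one confidence per COCO class
class_id = int(np.argmax(class_scores))
class_confidence = float(class_scores[class_id])
print(class_id, class_confidence, objectness)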

So what am I missing? Maybe p0 plays a role only in the training process?

b- moreover, in the code that runs the already trained YOLO, I do not see any place where p0 is used at all. Only two filtering steps take place:

1- selecting only the predictions where a specific class confidence probability is higher than some minimum probability;

2- applying the NMS function, while p0 is not even passed to this function:

results = cv2.dnn.NMSBoxes(bounding_boxes, confidences, probability_minimum, threshold)

bounding_boxes - a list with the coordinates of the boxes that passed the first filtering by class (p1, p2, …, p80)

confidences - a list of the probabilities of those classes (for example p1 = 0.8, p3 = 0.76, etc.)

probability_minimum - the parameter I already used to filter out classes with small probabilities

threshold - the IoU threshold, which is part of the NMS procedure
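Roughly, the whole postprocessing I am talking about looks like this (a simplified sketch of the common OpenCV DNN YOLOv3 sample flow, using stand-in random data instead of a real network output):

import numpy as np
import cv2

probability_minimum = 0.5    # minimum class confidence
threshold = 0.3              # IoU threshold used inside NMS
h, w = 480, 640              # original image size (example values)
# stand-in for the raw network output: one output layer, rows of 85 floats each
network_output = [np.random.rand(300, 85).astype(np.float32)]

bounding_boxes, confidences, class_ids = [], [], []
for layer_output in network_output:
    for detection in layer_output:
        class_scores = detection[5:]
        class_id = int(np.argmax(class_scores))
        class_confidence = float(class_scores[class_id])
        # note: detection[4] (the objectness p0) is never looked at here
        if class_confidence > probability_minimum:
            cx, cy, bw, bh = detection[0:4] * np.array([w, h, w, h])
            bounding_boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
            confidences.append(class_confidence)
            class_ids.append(class_id)

results = cv2.dnn.NMSBoxes(bounding_boxes, confidences, probability_minimum, threshold)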

Thus I do not see any place where we use the objectness parameter (p0), which perhaps means that my guess in part one of the question is right: p0 is useful only in the training stage? Am I right?

Thank you very much.


you’re right !
(i just stumbled over the same thing a few days ago, when looking at the yolov5 samples, which actually use this: 3 threshold stages, objectness → class prob → nms.)

it’s not in the opencv code, but imo it should be there ! omission ? small bug ?
@dkurt, if you happen to see this, we’d like your opinion !

we don’t need to parse / argmax the class scores (or even do nms) if we can bail out on low objectness already
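e.g. something like this (only a sketch of the idea, all thresholds and names made up, with stand-in data instead of a real net output):

import numpy as np

objectness_minimum = 0.25    # made-up cutoff
probability_minimum = 0.5
layer_output = np.random.rand(300, 85).astype(np.float32)   # stand-in for one yolo output layer

kept = []
for detection in layer_output:
    if float(detection[4]) < objectness_minimum:    # p0 too low ?
        continue                                    # bail out: no argmax, no box math, nothing reaches nms
    class_scores = detection[5:]
    class_id = int(np.argmax(class_scores))
    if float(class_scores[class_id]) > probability_minimum:
        kept.append((class_id, float(class_scores[class_id]), detection[0:4]))
print(len(kept), "of", len(layer_output), "proposals survive both thresholds")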

this seems wrong (to me) as well as this:

@berak, thank you for the reply.
1) Maybe my English is not so fluent, so please tell me: when you say

we don’t need to parse / argmax the class scores (or even do nms) if we can bail out on low objectness already

do you actually agree with me, or disagree?

Maybe I didn't explain myself well, or maybe I missed a key on the keyboard, but I never meant to say that “p1 is used in training only”. I always meant to say that p0 is used in training only. (I see now that p1 did indeed appear in that context in my message, which is strange; anyway, I have already edited it.)

Anyway, if you listen to what he says in his lecture, he says that omitting the irrelevant boxes via the objectness parameter allows him to calculate one less loss function. Correct me if I am mistaken, but I thought that in deep learning the loss function is calculated only during the training process, to adjust the weights, isn't it?

Most important:
Now I will say something that disagrees with what I said above :)
Maybe p0 IS still being used while running the already trained YOLO (v3 in my case), but the use is made not by the USER but by the algorithm automatically. I did a simple test here: I opened the “network output predictions” variable in Excel and discovered that predictions for p1, p2, p3, … are never made when p0 < 0.2 (maybe this only held for the 3-4 images I tried). If I am right about this, then the meaning of p0 is as follows: the network does not waste time predicting each of the 80 COCO classes when p0 is too small, and it does this filtering by itself, without the user's intervention.
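In code terms, the check I did was roughly equivalent to this (just an illustration with stand-in data and made-up names, not my actual spreadsheet):

import numpy as np

# stand-in for the raw "network output predictions" of one YOLOv3 output layer (rows of 85 floats)
layer_output = np.random.rand(300, 85).astype(np.float32)

low_p0 = layer_output[layer_output[:, 4] < 0.2]    # rows where the objectness p0 is below 0.2
if len(low_p0) > 0:
    # what I observed in Excel: for such rows the class scores p1..p80 were (practically) zero
    print("highest class score among low-objectness rows:", float(low_p0[:, 5:].max()))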

entirely correct (the loss is indeed only computed during training).

not sure what “by USER” would mean,

  • the network calculates scores/rects for ~80000 fixed box proposals in the last layer(s), then it stops. no thresholding of any kind so far (and no “savings” inside the net)
  • after that, there is postprocessing code (userland ?).
    not all of the 80k proposals are valid, so we need to filter them out by class prob (and i’d argue: also by objectness). the more we can throw out, the less has to go into the nms part later

no idea what you did there,
the dnn samples do check the class probs, but they do not check p0
