YOLO objectness parameter (called p0) vs. class probability parameters, as explained by Joseph Redmon

I have been watching Joseph Redmon's (developer of YOLOv3) lecture about YOLO from 45:04 here:

The Ancient Secrets of Computer Vision - 18 - Detection and Instance Segmentation - YouTube

where he explains why he needs both predictions: the "objectness", called p0, and the class probabilities, called p1, p2, …

I understand that he essentially wants to filter out cells with small objectness before he gets to classify them. But if I understood this right, then I have a two-part question:

a) When I run an already trained YOLO, each detection has 85 outputs, where the objectness prediction comes out at the same time as the class confidence predictions, so I do not see where he saves time or resources. I would understand if there were two output stages: e.g. a first stage outputting objectness, and only then, for cells that passed some threshold, another fully connected stage outputting the chosen cells' class probabilities. But that is not the case in the code.

So what am I missing? Maybe p0 plays a role only in the training process?

b) Moreover, in the code that runs an already trained YOLO I do not see any place where p0 is used at all. Only two filtering steps take place:

1. Selecting only predictions where a specific class confidence is higher than some minimum probability.

2. Applying the NMS function, while p0 is not even passed to this function:

results = cv2.dnn.NMSBoxes(bounding_boxes, confidences, probability_minimum, threshold)

bounding_boxes - a list with the coordinates of the boxes that passed the first filtering by class (p1, p2, …, p80)

confidences - a list of the probabilities of those classes (for example p1=0.8, p3=0.76, etc.)

probability_minimum - the parameter I have already used to filter out classes with small probabilities

threshold - the threshold for IoU, which is part of the NMS procedure…
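A minimal numpy sketch of the two filtering steps described above, assuming YOLOv3-style output rows of the form [x, y, w, h, p0, p1, …, p80] (the row layout, toy values, and threshold here are my assumptions, not the original tutorial code):

```python
import numpy as np

def filter_detections(output, probability_minimum=0.5):
    """First filtering stage: keep boxes whose best class score exceeds
    probability_minimum; p0 (column 4) is read but unused, exactly as in
    the samples discussed above."""
    bounding_boxes, confidences, class_ids = [], [], []
    for row in output:
        class_scores = row[5:]                 # p1..pN
        class_id = int(np.argmax(class_scores))
        confidence = float(class_scores[class_id])
        if confidence > probability_minimum:
            x, y, w, h = row[:4]
            bounding_boxes.append([x - w / 2, y - h / 2, w, h])
            confidences.append(confidence)
            class_ids.append(class_id)
    return bounding_boxes, confidences, class_ids

# toy output with 3 classes instead of 80; second row fails the filter
output = np.array([
    [0.5, 0.5, 0.2, 0.2, 0.9, 0.05, 0.80, 0.01],
    [0.3, 0.3, 0.1, 0.1, 0.8, 0.10, 0.20, 0.05],
])
boxes, confs, ids = filter_detections(output, probability_minimum=0.5)
# boxes/confs would then go to cv2.dnn.NMSBoxes(boxes, confs, probability_minimum, threshold)
```

Note that p0 never influences which rows survive; that is exactly the question being asked.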

Thus I do not see any place where we use the objectness parameter (p0), which perhaps confirms my guess from part one of the question: p0 is useful only in the training stage? Am I right?

Thank you very much.


you're right!
(i just stumbled over the same thing a few days ago, when looking at yolov5 samples, which actually use 3 threshold stages: objectness → class prob → nms.)

it's not in the opencv code, but imo it should be there! omission? small bug?
@dkurt, if you happen to see this, we'd like your opinion!

we don't need to parse / argmax the class scores (or even do nms) if we can bail out on low objectness already

this seems wrong (to me) as well as this:

@berak, thank you for the reply.
Maybe my English is not so fluent; please tell me, when you say:

we dont need to parse / argmax the class scores (or even do nms), if we can bail out on low objectness already

do you actually agree with me or disagree?

Maybe I didn't explain myself well, or maybe I missed a key on the keyboard, but I never meant to say that "p1 is used in training only". I always meant to say that p0 is used in training only. (I see now that in my message p1 did appear in this context, which is strange. Anyway, I have already edited it.)

Anyway, if you listen to what he says in his lecture, he says that omitting the irrelevant boxes via the objectness parameter allows him to calculate one less loss function. Correct me if I am mistaken, but I thought that in deep learning the loss function is calculated only during the training process, to adjust the weights, isn't it?

Most important:
Now I will say something that disagrees with what I said above :)
Maybe p0 IS still being used while running an already trained YOLO (v3 in my case), but the use is made not by the USER but by the algorithm automatically. I made a simple test here: I opened the "network output predictions" variable in Excel and discovered that predictions for p1, p2, p3, … are never made when p0 < 0.2. (Maybe this was only true for the 3-4 images I checked.) But if I am right, then the meaning of p0 is as follows: the network does not waste time predicting each of the 80 COCO classes if p0 is too small, and it does this filtering by itself, without the user's intervention.
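The Excel check described above can be sketched in a few lines of numpy, assuming the forward pass returns an N×85 array (the 0.2 value is the one observed, and the toy array below is made up for illustration):

```python
import numpy as np

def classes_zero_when_low_objectness(output, obj_thresh=0.2):
    """Return True if every row with p0 < obj_thresh has all class
    scores (columns 5 onward) equal to zero, as observed in Excel."""
    low = output[:, 4] < obj_thresh
    return bool(np.all(output[low, 5:] == 0.0))

# toy N x 85-style array (here N x 7 for brevity)
out = np.array([
    [0.5, 0.5, 0.2, 0.2, 0.90, 0.1, 0.7],  # high p0, class scores present
    [0.1, 0.1, 0.1, 0.1, 0.05, 0.0, 0.0],  # low p0, class scores zeroed
])
print(classes_zero_when_low_objectness(out))  # → True
```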

entirely correct.

not sure what “by USER” would mean,

  • the network calculates scores/rects for ~80000 fixed box proposals in the last layer(s), then it stops. no thresholding of any kind so far (or any "savings" inside the net)
  • after that, there is postprocessing code (userland?).
    not all of the 80k proposals are valid, so we need to filter them out by class prob (and i'd argue: also by objectness). the more we can throw out, the less has to go into the nms part later
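The suggested extra stage could look roughly like this (a sketch, not the actual dnn sample code; the 0.3 objectness threshold is an arbitrary assumption):

```python
import numpy as np

def postprocess_with_objectness(output, obj_thresh=0.3, prob_thresh=0.5):
    """Bail out on low objectness first, so the argmax over the class
    scores (and later NMS) only runs on the surviving rows."""
    kept = []
    for row in output:
        if row[4] < obj_thresh:          # p0 check: cheap early exit
            continue
        class_id = int(np.argmax(row[5:]))
        confidence = float(row[5 + class_id])
        if confidence > prob_thresh:
            kept.append((row[:4], class_id, confidence))
    return kept

# toy rows: [x, y, w, h, p0, c1, c2]
out = np.array([
    [0.5, 0.5, 0.2, 0.2, 0.1, 0.9, 0.0],  # low p0 -> skipped before argmax
    [0.5, 0.5, 0.2, 0.2, 0.8, 0.9, 0.0],  # survives both checks
])
detections = postprocess_with_objectness(out)
```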

no idea what you did here;
the dnn samples do check the class probs but do not check p0


Which inference code are you both referring to?

To my knowledge, most yolo opencv tutorials do this:

  1. use opencv-dnn for the forward pass
  2. use all raw grid-cell & anchor-box outputs (= dnn outputs of the yolo layers) and read their x, y, w, h + class values (the objectness is available but unused)
    2.1 test the highest total score value (one available per class) against some threshold and accept an object of that class
  3. perform NMS

The thing is that, from what I observed, the N class values already are the total score (i.e. objectness * class probability), which was a surprise for me when I tried to use the objectness value.
Filtering low objectness values is still possible, but in my experience not useful in practice.

BUT: If I remember it right, the opencv yolo layer already has a hard-coded 0.2 threshold filter on the objectness and delivers total-score values of 0 otherwise.


that would be a good explanation of why the p0 value is mostly unused.
but how did you observe this?

in region_layer.cpp in lines 327 to 331 (OpenCV 4.5.5):

int class_index = index_sample_offset + index * cell_size + 5;
for (int j = 0; j < classes; ++j) {
    float prob = scale * srcData[class_index + j];  // prob = IoU(box, object) = t0 * class-probability
    dstData[class_index + j] = (prob > thresh) ? prob : 0;  // if (IoU < threshold) IoU = 0;

where scale is the objectness

int p_index = index_sample_offset + index * cell_size + 4;
float scale = dstData[p_index];

and there you’ll also find the objectness-thresholding.
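In Python terms, that C++ loop does roughly the following (a sketch of the same logic for a single cell's raw row, using the hard-coded 0.2 threshold mentioned above):

```python
import numpy as np

def region_layer_class_scores(row, thresh=0.2):
    """row = [x, y, w, h, p0, c1..cN] raw per-cell output.
    Returns the 'class values' as delivered by the region layer:
    p0 * class-probability, zeroed when not above the threshold."""
    scale = row[4]                        # objectness, p0
    probs = scale * np.asarray(row[5:])   # total score = p0 * classProb
    probs[probs <= thresh] = 0.0          # (prob > thresh) ? prob : 0
    return probs

row = [0, 0, 1, 1, 0.9, 0.8, 0.1]
scores = region_layer_class_scores(row)
```

This is why the user-side code never needs to touch p0 explicitly: it is already baked into the class values.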

I had a look at that code because something was strange when I tried to use the objectness manually. The value combinations didn't make much sense, but I can't remember what exactly it was.


I remember what was suspicious. I observed that the objectness value and the highest of all the class values were often identical (and that's because the highest class confidence in my trained 2-class and 5-class networks was often close to 1.0, so p0 * classConf is close to p0).
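A quick numeric illustration of that observation (made-up values):

```python
p0 = 0.92                # objectness
class_conf = 0.99        # near-1.0 class confidence, as observed above
total = p0 * class_conf  # the 'class value' delivered by the layer
# total is 0.9108, within 0.01 of p0 itself, so the two look identical
```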

Then I first played with the values and later had a look at the source code.
