Confidence level of detection differs between the Ultralytics tool and OpenCV code

Hi,

For the same image and model (.onnx), I ran inference with the Ultralytics tool and also with the OpenCV code from the tutorial Object Detection using YOLOv5 OpenCV DNN in C++ and Python.

With the Ultralytics tool I got 86% confidence, while with the OpenCV code I got 25%.

Without my showing the modified code referenced above (it belongs to the company I work for), can you give me hints as to why this is happening? I'm running inference with the yolov5m6 model and a 1280x1280 input size.

Thank you

Hi,

Does anyone have any idea? Has anyone faced the same problem? Please, this is urgent.

Thanks

I have an update: it turns out the Ultralytics tool also gives 25% confidence if the image is first resized to 1280x1280. Basically the same thing happens in cv2.dnn.blobFromImage(frame, 1 / 255.0, (1280, 1280), swapRB=True, crop=False), where the image is resized.
Is there any way to change the blobFromImage flags to get the desired confidence?
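
As far as resizing goes, the only blobFromImage flag that changes the behaviour is crop. A minimal sketch of the two variants, assuming frame is the decoded 4K image as in the code further down; note that crop=True preserves the aspect ratio but center-crops away the sides of a 16:9 frame, so it is probably not the fix you want:

import cv2

# Stretch-resize (what the code below does): 16:9 -> 1:1, objects get distorted
blob_stretch = cv2.dnn.blobFromImage(frame, 1 / 255.0, (1280, 1280), swapRB=True, crop=False)

# Center-crop variant: aspect ratio is preserved, but the sides of the frame are cut off
blob_crop = cv2.dnn.blobFromImage(frame, 1 / 255.0, (1280, 1280), swapRB=True, crop=True)

print(blob_stretch.shape, blob_crop.shape)  # both (1, 3, 1280, 1280)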

Thanks

To help you, we would need information: code, preprocessing, data (for both C++ & ONNX).

As long as you don't give us anything, expect no help.

Ok, here is the code:

I get an image from a camera via requests, decode it with OpenCV, and then follow the same process as in the LearnOpenCV tutorial mentioned above. Just to clarify: the loop "for net in range(len(self.net)):" exists because I'm processing the same image with several models, one class per model.

import cv2
import numpy as np
import requests
from requests.auth import HTTPDigestAuth


def processa(self, url, username, password, index, sectorXY):
    # Grab a frame from the camera over HTTP and decode it with OpenCV
    imagem = requests.get(url, auth=HTTPDigestAuth(username, password))
    frame = cv2.imdecode(np.frombuffer(imagem.content, np.uint8), cv2.IMREAD_UNCHANGED)

    altura = frame.shape[0]       # frame height
    comprimento = frame.shape[1]  # frame width

    # The frame is stretched to the 1280x1280 network input size here
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (1280, 1280), swapRB=True, crop=False)

    objetos_capturados_frame = []
    smoke_detections = False

    # One model per class: run the same blob through every selected network
    for net in range(len(self.net)):
        print(net)
        if str(net) not in self.filtros[index]:
            continue

        layer_names = self.net[net].getLayerNames()
        outputlayers = [layer_names[i - 1] for i in self.net[net].getUnconnectedOutLayers()]
        self.net[net].setInput(blob)
        outputs = self.net[net].forward(outputlayers)

        class_ids = []
        confidences = []  # detection confidences for this frame
        caixas = []       # bounding boxes

        rows = outputs[0].shape[1]
        # Scale factors to map boxes from the 1280x1280 input back to the 4K frame
        x_factor = 3840 / 1280
        y_factor = 2160 / 1280

        for r in range(rows):
            row = outputs[0][0][r]
            confidence = row[4]  # objectness score

            # Discard bad detections and continue
            if confidence >= (self.confidence / 100):
                classes_scores = row[5:]
                if classes_scores[0] > 0.5:
                    cx, cy, w, h = row[0], row[1], row[2], row[3]
                    left = int((cx - w / 2) * x_factor)
                    top = int((cy - h / 2) * y_factor)
                    width = int(w * x_factor)
                    height = int(h * y_factor)

                    caixas.append([left, top, width, height])
                    confidences.append(float(confidence))
                    class_ids.append(net)

        # Non-maximum suppression on the surviving boxes
        indexes = cv2.dnn.NMSBoxes(caixas, confidences, 0.5, 0.4)
        if len(indexes) > 0 and net == 0:
            smoke_detections = True

        for i in indexes:
            caixa = caixas[i]
            x, y, w, h = caixa
            objeto_no_frame = {}
            objeto_no_frame["object_id"] = int(class_ids[i])
            objeto_no_frame["confianca"] = round(confidences[i], 2)
            objeto_no_frame["topLeft"] = [x, y]
            objeto_no_frame["bottomRight"] = [w, h]  # note: stores [width, height]
            objetos_capturados_frame.append(objeto_no_frame)

Looks like you're only processing the 1st (most coarse pyramid level)
of the 3 (?) YOLO output layers.

That sounds redundant.
Wouldn't you rather train a single model on multiple desired classes?

This code won't run from 4.6.0 on (you must be using something outdated!).
Please use this instead:

outputlayers = self.net[net].getUnconnectedOutLayersNames()
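
For illustration, a minimal sketch of what both replies are pointing at, assuming the exported ONNX model really does expose several output layers (self.net[net] and blob are the objects from the code above; everything else is illustrative): ask the network for all unconnected output names and loop over every returned blob instead of only outputs[0].

# Sketch only: assumes self.net[net] is a cv2.dnn.Net loaded from the ONNX file
outputlayers = self.net[net].getUnconnectedOutLayersNames()   # names of all output layers
self.net[net].setInput(blob)
outputs = self.net[net].forward(outputlayers)

for out in outputs:        # loop over every output blob, not just outputs[0]
    for row in out[0]:     # each row: cx, cy, w, h, objectness, class scores...
        confidence = row[4]
        if confidence >= (self.confidence / 100):
            ...            # same box decoding as in the original code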

I will try to check the YOLO output layers thing. But as I mentioned in my third post, I also got 25% confidence with the Ultralytics tool. I did the following:

  • In one test I ran python detect.py --weights runs/train/exp/weights/best.onnx --source image.jpg --imgsz 1280 1280 --device 0, where image.jpg is a 4K image, and got 84% confidence

  • In another test I ran python detect.py --weights runs/train/exp/weights/best.onnx --source image_resized.jpg --imgsz 1280 1280 --device 0, where image_resized.jpg is a 1280x1280 image, and got 25% confidence

My conclusion is that cv2.dnn.blobFromImage is resizing the image and inference is run on the resized image, not the original. The problem is that I exported the ONNX model with a 1280x1280 input size, so the size parameter in blobFromImage can only be (1280, 1280).
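
For what it's worth, the per-axis scale factors show the distortion directly: a 3840x2160 frame stretched to 1280x1280 is scaled by 3.0 horizontally but only about 1.69 vertically, so every object ends up squashed. A minimal check, assuming frame is the decoded 4K image:

import cv2

h, w = frame.shape[:2]        # 2160, 3840 for a 4K frame
print(w / 1280, h / 1280)     # 3.0 vs 1.6875: a different scale per axis
# blobFromImage with crop=False performs exactly this anisotropic (stretching) resize
blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (1280, 1280), swapRB=True, crop=False)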

Please update to the current version (4.8.0),
else all measuring is outdated/irrelevant.

I have version 4.6.0. But I don't think this is a version issue, as YOLOv5 is from 2020/2021 and my OpenCV version is from 2022. If there were a problem with a previous version of OpenCV, YOLOv5 would never have worked until 2023, when OpenCV 4.8.0 was released. Something else is causing my issue, and I have a feeling it is related to image resizing, but I don't know how to get around it.

Maybe you can export several sizes & check.

it should be done per class, like here:

I found the issue. If I resize the image with blobFromImage, the data is distorted, but if I use the strategy from the article Detecting objects with YOLOv5, OpenCV, Python and C++ | by Luiz doleron | MLearning.ai | Medium, the confidence goes from 25% to 79%. The strategy is:

  • Create a square numpy array whose side is the maximum dimension of the image (height or width)

  • Copy the image exactly as it was captured (16:9) into that numpy array

  • This way the numpy array is a square in which the image sits at the top

  • Since the images I'm working with are 16:9 at 4K, the height is only 2160, so the rest of the square stays black (np.zeros)

  • Now that I have the square image (original image + black padding), I can resize it to 1280x1280

The code that helped is the following:

import numpy as np

rows, cols, _ = source.shape                   # shape is (height, width, channels)
_max = max(rows, cols)
resized = np.zeros((_max, _max, 3), np.uint8)  # black square canvas
resized[0:rows, 0:cols] = source               # original image in the top-left corner
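
Putting the pieces together, here is a minimal sketch of the whole preprocessing step under the assumptions above (4K 16:9 frames, a 1280x1280 ONNX model; the helper name pad_to_square_blob is illustrative, not from the original code). A useful side effect of the square padding is that a single factor _max / 1280 maps boxes back to original pixels, instead of the separate 3840/1280 and 2160/1280 factors used earlier:

import cv2
import numpy as np

def pad_to_square_blob(frame, input_size=1280):
    # Pad the frame to a square (image in the top-left corner, rest black)
    rows, cols, _ = frame.shape
    _max = max(rows, cols)
    square = np.zeros((_max, _max, 3), np.uint8)
    square[0:rows, 0:cols] = frame
    # Now the resize to 1280x1280 keeps the original aspect ratio
    blob = cv2.dnn.blobFromImage(square, 1 / 255.0, (input_size, input_size),
                                 swapRB=True, crop=False)
    # A single factor maps network coordinates back to original pixels
    scale = _max / input_size
    return blob, scale

# Usage sketch:
# blob, scale = pad_to_square_blob(frame)
# net.setInput(blob)
# outputs = net.forward(net.getUnconnectedOutLayersNames())
# ... decode rows as before, but multiply cx, cy, w, h by the single `scale`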

I still don't know why there is a remaining discrepancy of about 5% confidence, but I'm on the right track.
Anyway, thank you for your help. Believe it or not, the word "outdated" that you wrote helped me a lot, as it pushed me to look for a more recent tutorial of the code.

Regards
