Confidence level of detection differs between the Ultralytics tool and OpenCV code

Hi,

For the same image and model (.onnx), I ran inference with the Ultralytics tool and also with the OpenCV code from the tutorial Object Detection using YOLOv5 OpenCV DNN in C++ and Python.

With the Ultralytics tool I got 86% confidence, while with the OpenCV code I got 25%.

Without my showing the modified code referenced above (it belongs to the company I work for), can you give me hints as to why this is happening? I'm running inference with the yolov5m6 model and a 1280x1280 input size.

Thank you

Hi,

Does anyone have any idea? Has anyone faced the same problem? Please, this is urgent.

Thanks

I have an update: it turns out the Ultralytics tool also gives 25% confidence if the image is first resized to 1280x1280. Basically the same thing happens in cv2.dnn.blobFromImage(frame, 1 / 255.0, (1280, 1280), swapRB=True, crop=False), where the image is resized.
Is there any way to change the blobFromImage flags to get the desired confidence?
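
As far as resizing goes, the only blobFromImage flag that changes the behaviour is crop. A minimal sketch of the two variants, assuming frame is the decoded 4K image as in the code further down; note that crop=True preserves the aspect ratio but center-crops away the sides of a 16:9 frame, so it is probably not the fix you want:

import cv2

# Stretch-resize (what the code below does): 16:9 -> 1:1, objects get distorted
blob_stretch = cv2.dnn.blobFromImage(frame, 1 / 255.0, (1280, 1280), swapRB=True, crop=False)

# Center-crop variant: aspect ratio is preserved, but the sides of the frame are cut off
blob_crop = cv2.dnn.blobFromImage(frame, 1 / 255.0, (1280, 1280), swapRB=True, crop=True)

print(blob_stretch.shape, blob_crop.shape)  # both (1, 3, 1280, 1280)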

Thanks

To help you, we would need information: code, preprocessing, data (for both C++ & ONNX).

As long as you don't give us anything, expect no help.

Ok, here is the code:

I get an image from a camera via requests, decode it with OpenCV, and then follow the same process as in the LearnOpenCV tutorial mentioned above. Just to clarify: the loop "for net in range(len(self.net)):" exists because I'm processing the same image with several models, one class per model.

import cv2
import numpy as np
import requests
from requests.auth import HTTPDigestAuth


def processa(self, url, username, password, index, sectorXY):
    # Grab a frame from the camera over HTTP and decode it with OpenCV
    imagem = requests.get(url, auth=HTTPDigestAuth(username, password))
    frame = cv2.imdecode(np.frombuffer(imagem.content, np.uint8), cv2.IMREAD_UNCHANGED)

    altura = frame.shape[0]       # frame height
    comprimento = frame.shape[1]  # frame width

    # The frame is stretched to the 1280x1280 network input size here
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (1280, 1280), swapRB=True, crop=False)

    objetos_capturados_frame = []
    smoke_detections = False

    # One model per class: run the same blob through every selected network
    for net in range(len(self.net)):
        print(net)
        if str(net) not in self.filtros[index]:
            continue

        layer_names = self.net[net].getLayerNames()
        outputlayers = [layer_names[i - 1] for i in self.net[net].getUnconnectedOutLayers()]
        self.net[net].setInput(blob)
        outputs = self.net[net].forward(outputlayers)

        class_ids = []
        confidences = []  # detection confidences for this frame
        caixas = []       # bounding boxes

        rows = outputs[0].shape[1]
        # Scale factors to map boxes from the 1280x1280 input back to the 4K frame
        x_factor = 3840 / 1280
        y_factor = 2160 / 1280

        for r in range(rows):
            row = outputs[0][0][r]
            confidence = row[4]  # objectness score

            # Discard bad detections and continue
            if confidence >= (self.confidence / 100):
                classes_scores = row[5:]
                if classes_scores[0] > 0.5:
                    cx, cy, w, h = row[0], row[1], row[2], row[3]
                    left = int((cx - w / 2) * x_factor)
                    top = int((cy - h / 2) * y_factor)
                    width = int(w * x_factor)
                    height = int(h * y_factor)

                    caixas.append([left, top, width, height])
                    confidences.append(float(confidence))
                    class_ids.append(net)

        # Non-maximum suppression on the surviving boxes
        indexes = cv2.dnn.NMSBoxes(caixas, confidences, 0.5, 0.4)
        if len(indexes) > 0 and net == 0:
            smoke_detections = True

        for i in indexes:
            caixa = caixas[i]
            x, y, w, h = caixa
            objeto_no_frame = {}
            objeto_no_frame["object_id"] = int(class_ids[i])
            objeto_no_frame["confianca"] = round(confidences[i], 2)
            objeto_no_frame["topLeft"] = [x, y]
            objeto_no_frame["bottomRight"] = [w, h]  # note: stores [width, height]
            objetos_capturados_frame.append(objeto_no_frame)

Looks like you're only processing the 1st (most coarse pyramid level)
of the 3 (?) YOLO output layers.

That sounds redundant.
Wouldn't you rather train a single model on multiple desired classes?

This code won't run from 4.6.0 on (you must be using something outdated!).
Please use this instead:

outputlayers = self.net[net].getUnconnectedOutLayersNames()
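
For illustration, a minimal sketch of what both replies are pointing at, assuming the exported ONNX model really does expose several output layers (self.net[net] and blob are the objects from the code above; everything else is illustrative): ask the network for all unconnected output names and loop over every returned blob instead of only outputs[0].

# Sketch only: assumes self.net[net] is a cv2.dnn.Net loaded from the ONNX file
outputlayers = self.net[net].getUnconnectedOutLayersNames()   # names of all output layers
self.net[net].setInput(blob)
outputs = self.net[net].forward(outputlayers)

for out in outputs:        # loop over every output blob, not just outputs[0]
    for row in out[0]:     # each row: cx, cy, w, h, objectness, class scores...
        confidence = row[4]
        if confidence >= (self.confidence / 100):
            ...            # same box decoding as in the original code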

I will try to check the YOLO output layers thing. But as I mentioned in my third post, I also got 25% confidence with the Ultralytics tool. I did the following:

  • In one test I ran python detect.py --weights runs/train/exp/weights/best.onnx --source image.jpg --imgsz 1280 1280 --device 0, where image.jpg is a 4K image, and got 84% confidence

  • In another test I ran python detect.py --weights runs/train/exp/weights/best.onnx --source image_resized.jpg --imgsz 1280 1280 --device 0, where image_resized.jpg is a 1280x1280 image, and got 25% confidence

My conclusion is that cv2.dnn.blobFromImage is resizing the image and inference is run on the resized image, not the original. The problem is that I exported the ONNX model with a 1280x1280 input size, so the size parameter in blobFromImage can only be (1280, 1280).
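
For what it's worth, the per-axis scale factors show the distortion directly: a 3840x2160 frame stretched to 1280x1280 is scaled by 3.0 horizontally but only about 1.69 vertically, so every object ends up squashed. A minimal check, assuming frame is the decoded 4K image:

import cv2

h, w = frame.shape[:2]        # 2160, 3840 for a 4K frame
print(w / 1280, h / 1280)     # 3.0 vs 1.6875: a different scale per axis
# blobFromImage with crop=False performs exactly this anisotropic (stretching) resize
blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (1280, 1280), swapRB=True, crop=False)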

Please update to the current version (4.8.0),
else all measuring is outdated/irrelevant.

I have version 4.6.0. But I don't think this is a version issue, as YOLOv5 is from 2020/2021 and my OpenCV version is from 2022. If there were a problem with a previous version of OpenCV, YOLOv5 would never have worked until 2023, when OpenCV 4.8.0 was released. Something else is causing my issue, and I have a feeling it is related to image resizing, but I don't know how to get around it.

Maybe you can export several sizes & check.

it should be done per class, like here:

I found the issue. If I resize the image with blobFromImage, the data is distorted, but if I use the strategy from the article Detecting objects with YOLOv5, OpenCV, Python and C++ | by Luiz doleron | MLearning.ai | Medium, the confidence goes from 25% to 79%. The strategy is:

  • Create a square numpy array whose side is the maximum dimension of the image (height or width)

  • Copy the image exactly as it was captured (16:9) into that numpy array

  • This way the numpy array is a square in which the image sits at the top

  • Since the images I'm working with are 16:9 at 4K, the height is only 2160, so the rest of the square stays black (np.zeros)

  • Now that I have the square image (original image + black padding), I can resize it to 1280x1280

The code that helped is the following:

import numpy as np

rows, cols, _ = source.shape                   # shape is (height, width, channels)
_max = max(rows, cols)
resized = np.zeros((_max, _max, 3), np.uint8)  # black square canvas
resized[0:rows, 0:cols] = source               # original image in the top-left corner
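
Putting the pieces together, here is a minimal sketch of the whole preprocessing step under the assumptions above (4K 16:9 frames, a 1280x1280 ONNX model; the helper name pad_to_square_blob is illustrative, not from the original code). A useful side effect of the square padding is that a single factor _max / 1280 maps boxes back to original pixels, instead of the separate 3840/1280 and 2160/1280 factors used earlier:

import cv2
import numpy as np

def pad_to_square_blob(frame, input_size=1280):
    # Pad the frame to a square (image in the top-left corner, rest black)
    rows, cols, _ = frame.shape
    _max = max(rows, cols)
    square = np.zeros((_max, _max, 3), np.uint8)
    square[0:rows, 0:cols] = frame
    # Now the resize to 1280x1280 keeps the original aspect ratio
    blob = cv2.dnn.blobFromImage(square, 1 / 255.0, (input_size, input_size),
                                 swapRB=True, crop=False)
    # A single factor maps network coordinates back to original pixels
    scale = _max / input_size
    return blob, scale

# Usage sketch:
# blob, scale = pad_to_square_blob(frame)
# net.setInput(blob)
# outputs = net.forward(net.getUnconnectedOutLayersNames())
# ... decode rows as before, but multiply cx, cy, w, h by the single `scale`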

I still don't know why there is a remaining discrepancy of about 5% confidence, but I'm on the right track.
Anyway, thank you for your help. Believe it or not, the word "outdated" that you wrote helped me a lot, as it pushed me to look for a more recent tutorial of the code.

Regards
