Help needed on understanding why video from webcam is lagging when using thread and queue

I am doing a learning experiment: a worker thread captures frames from the webcam and sends them through a queue to the main thread, which runs each frame through a face detection model (YuNet).

The code is as follows:

from queue import Queue
from threading import Thread

import cv2 as cv
import numpy as np

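class Camera(Thread):
    def __init__(self, cap, frame_queue):
        super().__init__()
        self.frame_queue = frame_queue
        self.cap = cap
        self.running = True

    def run(self):
        # Read frames as fast as the camera delivers them and queue each one.
        while self.running:
            has_frame, frame = self.cap.read()
            # if not has_frame:
            #     self.running = False
            #     break
            self.frame_queue.put(frame)

    def stop(self):
        self.running = False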

def visualize(image, faces, print_flag=False, fps=None):
    output = image.copy()

    if fps is not None:
        cv.putText(output, 'FPS: {:.2f}'.format(fps), (30, 30), cv.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255))

    for idx, face in enumerate(faces):
        if print_flag:
            print('Face {}, top-left coordinates: ({:.0f}, {:.0f}), box width: {:.0f}, box height {:.0f}, score: {:.2f}'.format(idx, face[0], face[1], face[2], face[3], face[-1]))

        coords = face[:-1].astype(np.int32)
        # Draw face bounding box
        cv.rectangle(output, (coords[0], coords[1]), (coords[0]+coords[2], coords[1]+coords[3]), (0, 255, 0), 2)
        # Draw landmarks
        cv.circle(output, (coords[4], coords[5]), 2, (0, 0, 0), 2)
        cv.circle(output, (coords[6], coords[7]), 2, (0, 0, 0), 2)
        cv.circle(output, (coords[8], coords[9]), 2, (0, 255, 0), 2)
        cv.circle(output, (coords[10], coords[11]), 2, (255, 0, 255), 2)
        cv.circle(output, (coords[12], coords[13]), 2, (0, 255, 255), 2)
        # check tilt
        # angle = int(check_tilt(coords[4],coords[5],coords[6],coords[7])[0])
        # print(angle)
        # Put score
        cv.putText(output, '{:.4f}'.format(face[-1]), (coords[0], coords[1]+15), cv.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0))
        # cv.putText(output, str(angle), (coords[0], coords[1]+40), cv.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0))

    return output

def main():

    yunet = cv.FaceDetectorYN.create(
        model="face_detection_yunet_2022mar.onnx",
        config="",
        input_size=(320, 320),
    )

    device_id = 0
    cap = cv.VideoCapture(device_id)
    frame_w = int(cap.get(cv.CAP_PROP_FRAME_WIDTH))
    frame_h = int(cap.get(cv.CAP_PROP_FRAME_HEIGHT))
    yunet.setInputSize([frame_w, frame_h])

    frame_queue = Queue()

    camera_thread = Camera(cap, frame_queue)
    camera_thread.start()
    # processing_thread = Detector(frame_queue, yunet)
    # processing_thread.start()

    tm = cv.TickMeter()
    counter = 0
    while cv.waitKey(1) < 0:
        if frame_queue.empty():
            continue  # nothing captured yet; poll again

        frame = frame_queue.get_nowait()

        if frame is None:
            continue  # a failed read puts None on the queue; skip it

        # has_frame, frame =
        # if not has_frame:
        #     print('No frames grabbed!')

        tm.start()
        _, faces = yunet.detect(frame)  # faces: None, or nx15 np.array
        tm.stop()
        counter += 1
        # print(faces)
        fps = tm.getFPS()


        if faces is not None:
            frame = visualize(frame, faces, fps=fps)
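        cv.imshow('libfacedetection demo', frame)

    # A key was pressed: tell the capture thread to stop and close the window.
    camera_thread.stop()
    cv.destroyAllWindows()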

if __name__ == '__main__':
    main()

The problem is that the displayed video lags behind what is happening in real time, even though the fps is around 22ms. I think the Camera thread is slower than the main thread. Can someone point out what mistake I have made? Thank you.

opencv 4.7.0
python 3.10.9
mac M1

I didn’t examine the code closely, but a few comments:

  1. When I’ve used Python (not recently), I have found threading to be lacking. Maybe it has gotten better, but for high-performance multi-threading you might want to consider a different language.
  2. You say your fps is around 22ms. Do you mean you get a new frame every 22ms (about 45 FPS) or that it is 22 FPS?
  3. In either case, is your processing thread able to process frames at the same rate you can generate them? If not, you might be building up a queue of unprocessed images, which could lead to a lot of latency. If you are constantly trying to read and queue frames, and your processing loop takes longer than it takes for the camera to provide an image, you will end up with a queue which continues to grow.
  4. I haven’t had good performance using Python with cv.imshow in the past. It could be related to the way Python does threading, but I wasn’t able to get it to work well for me.
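
To make point 3 concrete, here is a rough back-of-the-envelope sketch of how an unbounded queue turns a rate mismatch into display lag. The numbers are assumptions for illustration only: a camera delivering 30 FPS (not stated in the question) and a processing loop running at 22 FPS (one reading of the figure in the question). The function names are made up for this example.

```python
def backlog_after(seconds, capture_fps, process_fps):
    """Frames left waiting in an unbounded queue after `seconds` of running."""
    produced = int(seconds * capture_fps)
    # The consumer can never take more frames than were produced.
    consumed = min(produced, int(seconds * process_fps))
    return produced - consumed


def display_delay(backlog, process_fps):
    """Seconds a newly queued frame waits while the backlog ahead of it drains."""
    return backlog / process_fps


if __name__ == "__main__":
    # Assumed rates: 30 FPS capture, 22 FPS processing, one minute of running.
    backlog = backlog_after(60, capture_fps=30, process_fps=22)
    print("frames waiting:", backlog)
    print("approx. lag (s):", round(display_delay(backlog, 22), 1))
```

If the two rates are equal, or processing is faster, the backlog stays at zero. In the posted program, printing `frame_queue.qsize()` inside the main loop should show whether the queue is actually growing.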

I would suggest a few things:

  1. Time your processing thread. Does it take more time to process an image than it does for the camera to generate one? If so, your image queue will grow forever, with latency increasing the longer your program runs.

  2. When you grab a new image, check the queue: if there is an image in the queue waiting to be processed, remove and discard it, then put the newly acquired image in the queue. This way, when your processing thread fetches the next image from the queue, it will be a fresh image. This would take some thread synchronization to work properly (protect the queue with a mutex).

  3. Why do you need a separate thread to read the image in the first place? Just have your processing thread read the image directly from the camera. That way you don’t have to mess with a queue, and whatever you retrieve from the camera will be a fresh image.
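
Suggestion 2 could be sketched roughly as below, assuming the queue is created with `Queue(maxsize=1)` and that a single capture thread is the only producer. `put_latest` is a hypothetical helper, not part of OpenCV or the standard library. Python's `queue.Queue` already locks internally, so individual `put`/`get` calls need no extra mutex; the retry loop covers the case where the consumer takes the stale frame between the two calls.

```python
from queue import Empty, Full, Queue


def put_latest(frame_queue, frame):
    """Put `frame` on a maxsize=1 queue, discarding any older frame still waiting."""
    while True:
        try:
            frame_queue.put_nowait(frame)
            return
        except Full:
            try:
                frame_queue.get_nowait()  # drop the stale frame
            except Empty:
                pass  # the consumer took it first; just retry the put


if __name__ == "__main__":
    q = Queue(maxsize=1)
    for i in range(5):  # integers stand in for camera frames
        put_latest(q, i)
    print(q.get_nowait())  # only the newest item is left
```

In the posted code, the capture thread would call `put_latest(self.frame_queue, frame)` in place of a plain `put`, so the main loop always receives the most recent frame and the lag cannot accumulate.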

