A lot of delay with my RTSP cam with OpenCV in Python

Good evening everyone.

I have some concerns regarding a project that I am setting up.

Indeed, when I display a simple RTSP video stream via OpenCV, I have no problems; everything is fluid. However, when I use a Haar cascade face detection code, I get a lot of latency and frame loss. I am looking for avenues to explore, because I can’t find solutions despite my research on the net. I tried changing my camera but the problem is the same.

Below is the code:

import numpy as np
import cv2
from threading import Thread

class Algo(Thread):
    def __init__(self, frame):
        Thread.__init__(self)
        self.frame = frame

    def run(self):
        # note: this uses the module-level gray/frame, not self.frame
        faces = face_cascade.detectMultiScale(gray, 1.3, 5)

        for (x, y, w, h) in faces:
            cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)
            roi_gray = gray[y:y + h, x:x + w]
            roi_color = frame[y:y + h, x:x + w]

cap = cv2.VideoCapture('rtsp://[username]:[password]@[IP]:554/Streaming/Channels/1/')

face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_alt.xml')

while True:
    # Capture frame-by-frame
    cap.grab()
    ret, frame = cap.retrieve()

    # Our operations on the frame come here
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    thread_1 = Algo(frame)
    thread_1.start()
    thread_1.join()

    # Display the resulting frame
    cv2.imshow('frame', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# When everything is done, release the capture
cap.release()
cv2.destroyAllWindows()

Regards, Pinigseu

I’m not sure exactly what your code is doing (I’m assuming the indentation is as intended). Why are you launching a separate thread on each loop iteration?

Either way, if it works without

faces = face_cascade.detectMultiScale(gray, 1.3, 5)

then that is the problem. I would assume that the cascade classifier is slowing down your pipeline, meaning that VideoCapture() cannot read frames as fast as they are being produced over the network. Because you are streaming live video over RTSP, the only option is to drop frames, as they cannot be re-requested indefinitely.

You should be able to check this by increasing the wait time, i.e.

cap = cv2.VideoCapture('rtsp://[username]:[password]@[IP]:554/Streaming/Channels/1/')
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    cv2.imshow('frame', frame)
    # artificially long wait: if the delay keeps growing, frames are queueing up
    if cv2.waitKey(400) & 0xFF == ord('q'):
        break

the camera will produce frames at its own constant rate. if you don’t consume them promptly, they queue up. that is the delay you see.

use a reader thread that always gives you the latest frame (but never the same frame twice unless you ask for that), and drops frames when you aren’t consuming quickly enough.
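a minimal sketch of that idea (the FreshestFrame name and details here are my own illustration, not an OpenCV class, and it omits the bookkeeping that avoids handing out the same frame twice):

import threading
import cv2

class FreshestFrame(threading.Thread):
    # reader thread that keeps only the most recent frame from the capture
    def __init__(self, capture):
        super().__init__(daemon=True)
        self.capture = capture
        self.cond = threading.Condition()
        self.frame = None
        self.running = True
        self.start()

    def run(self):
        while self.running:
            ok, frame = self.capture.read()
            if not ok:
                break
            with self.cond:
                # overwrite the previous frame; anything not consumed is dropped
                self.frame = frame
                self.cond.notify_all()

    def read(self):
        # block until a frame exists, then hand out the newest one
        with self.cond:
            self.cond.wait_for(lambda: self.frame is not None)
            return self.frame

    def release(self):
        self.running = False
        self.join()
        self.capture.release()

construct it once with FreshestFrame(cap) and call read() in your main loop: decoding keeps pace with the camera while the slow detector only ever sees the newest frame.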


some ideas / options:

  • how large is the image? running a cascade classifier on a 4k image must be slow; fewer pixels, faster processing – try to resize the image to something smaller (see the sketch after this list).
  • if you absolutely have to use cascades, at least use proper minSize, maxSize arguments, so it will drop a couple of (unneeded) image pyramid levels.
  • don’t use cascades. there are a lot of faster alternatives for face detection, like pico, or opencv’s dnn-based object detection (yes, there’s a network for that).
  • throw out the Thread. as it is now, it’s not doing anything useful. naive attempts at multithreading can only do harm.
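for example, a rough sketch combining the first two points (the 640-pixel target width and the size bounds are guesses – tune them for your stream):

import cv2

face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_alt.xml')

def detect_faces_small(frame):
    # downscale first: fewer pixels, much cheaper cascade pass
    scale = 640.0 / frame.shape[1]
    small = cv2.resize(frame, None, fx=scale, fy=scale)
    gray = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)
    # minSize / maxSize let the cascade skip unneeded pyramid levels
    faces = face_cascade.detectMultiScale(
        gray, scaleFactor=1.3, minNeighbors=5,
        minSize=(30, 30), maxSize=(300, 300))
    # map the boxes back to the original resolution
    return [(int(x / scale), int(y / scale), int(w / scale), int(h / scale))
            for (x, y, w, h) in faces]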

Thanks a lot for your answer.

@cudawarped I changed my code but I don’t see any improvement.

@crackwitz The problem is that I need to process all of my frames, because the camera will be mounted on a car.
Basically I have a code using a dnn to anonymize faces, but I have the same problem with it. I thought it was a hardware problem, so I decided to use a basic Haar cascade face detection to see if the problem was the same.
I should specify that when I use a webcam, as @cudawarped said, it works well: all frames are used and I get a perfect result.

@berak

  • I tried to resize my image, but it’s the same issue.
  • The problem is that when I’m using a dnn, it’s the same issue.
  • I tried with a thread because I’ve seen that threading could help in this type of case, so I did it and had the same problem.

Finally, my project needs to use cameras that will anonymize faces on the street.

I know that I am a newbie… I’m a system administrator and I know that I have a lot to learn.

The problem is that when I’m using a dnn, it’s the same issue.

proof, please (i have a hard time believing anything you say)

I can give you the links to the GitHub files where I found the dnn:

caffemodel

prototxt

Below is the code:

import cv2
import numpy as np
import time
import imutils

prototxt_path = "weights/deploy.prototxt.txt"

model_path = "weights/res10_300x300_ssd_iter_140000_fp16.caffemodel"

# load Caffe model
model = cv2.dnn.readNetFromCaffe(prototxt_path, model_path)

cap = cv2.VideoCapture('rtsp://[user]:[password]@[ip]:554/Streaming/Channels/1/')
cap.set(cv2.CAP_PROP_FPS, 5.0)  # note: many RTSP sources ignore this setting
#cap.set(cv2.CAP_PROP_FPS, 15.0)

while True:
    start = time.time()
    ret, image = cap.read()
    if not ret:
        break
    # get width and height of the image
    h, w = image.shape[:2]
    # "| 1" forces the kernel sizes to be odd, as GaussianBlur requires
    kernel_width = (w // 7) | 1
    kernel_height = (h // 7) | 1
    # preprocess the image: resize and perform mean subtraction
    blob = cv2.dnn.blobFromImage(image, 1.0, (300, 300), (104.0, 177.0, 123.0))
    # set the image as the input of the neural network
    model.setInput(blob)
    # perform inference and get the result
    output = np.squeeze(model.forward())

    for i in range(0, output.shape[0]):
        # get the confidence
        confidence = output[i, 2]
        # if confidence is above 40%, then blur the bounding box (face)
        if confidence > 0.4:
            # get the surrounding box coordinates and upscale them to the original image
            box = output[i, 3:7] * np.array([w, h, w, h])
            # convert to integers (np.int is deprecated; plain int works)
            start_x, start_y, end_x, end_y = box.astype(int)
            # get the face image; skip degenerate boxes
            face = image[start_y: end_y, start_x: end_x]
            if face.size == 0:
                continue
            # apply gaussian blur to this face
            face = cv2.GaussianBlur(face, (kernel_width, kernel_height), 0)
            # put the blurred face back into the original image
            image[start_y: end_y, start_x: end_x] = face
    cv2.imshow("image", image)
    if cv2.waitKey(1) == ord("q"):
        break

    time_elapsed = time.time() - start
    fps = 1 / time_elapsed
    print("FPS:", fps)

cv2.destroyAllWindows()
cap.release()

As you can see in the following picture, I took a screenshot at the same moment of the camera’s own web stream and of my code’s output:

When I wait a bit, I can see the delay increase.

Sometimes I get the following error too:

[screenshot of the error]

@Pinigseu Change this:

class Algo(Thread):
    def __init__(self, frame):
        Thread.__init__(self)
        self.frame = frame

to:

class Algo(Thread):
    def __init__(self):
        Thread.__init__(self)

Then apply @cudawarped’s example.

Then call this:

thread_1 = Algo()
thread_1.start()

Or:

class Algo(Thread):
    def run(self):
        ...

Algo().start()

In the second code, put this outside of the while loop:

    # get width and height of the image
    h, w = image.shape[:2]
    kernel_width = (w // 7) | 1
    kernel_height = (h // 7) | 1
    #preprocess the image: resize and performs mean subtraction
    blob = cv2.dnn.blobFromImage(image, 1.0, (300, 300), (104.0, 177.0, 123.0))
    # set the image into the input of the neural network
    model.setInput(blob)
    # perform inference and get the result
    output = np.squeeze(model.forward())

I’m sorry if I don’t understand, but if I do that, don’t I need to define ‘image’ before?
And if I do it, won’t the program process just one frame?
Because I tested it, and my capture window shows just one picture.

Let’s talk about RTSP. Did you get it working?

Yeah, sure. Without the anonymizing program it works fine.
But when I put the program in my code, I can see a lot of delay and the fps is really low too.

the processing of your neural network takes too much time. you can’t process all the frames in real time.

the only way to fix this is to use a cheaper neural network, speed up the computation some other way, reduce the frame rate, or discard frames.
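discarding frames can be as simple as running the network only on every n-th frame and re-using the last detections in between. a sketch – detect_faces() here stands in for your dnn code, and n=5 is arbitrary:

SKIP = 5          # arbitrary: run the dnn on every 5th frame only
count = 0
last_boxes = []   # detections from the last processed frame

while True:
    ok, image = cap.read()
    if not ok:
        break
    count += 1
    if count % SKIP == 0:
        last_boxes = detect_faces(image)   # the expensive dnn call
    # blur using the cached boxes so every displayed frame stays anonymized
    for (x1, y1, x2, y2) in last_boxes:
        image[y1:y2, x1:x2] = cv2.GaussianBlur(image[y1:y2, x1:x2], (51, 51), 0)
    cv2.imshow('image', image)
    if cv2.waitKey(1) == ord('q'):
        break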

Oh… I see… But why do I have no problem with a basic webcam? Is it about the encoding adding load to the program?

Are you running the webcam footage through the dnn/cascade classifier?

If so, how do you know you are not missing any frames? I suspect you are, and it’s not as obvious because the webcam won’t be streaming inter-frame-coded video (where each frame depends on information from previous frames); it will most likely be MJPEG or similar, where each frame is encoded separately.
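An easy way to check is to compare the frame rate the capture reports with the rate your processing loop actually achieves, e.g. this rough sketch:

import time
import cv2

cap = cv2.VideoCapture(0)             # or your RTSP URL
reported = cap.get(cv2.CAP_PROP_FPS)  # what the source claims to deliver

n, t0 = 0, time.time()
while n < 100:
    ok, frame = cap.read()
    if not ok:
        break
    # ... run your detector here ...
    n += 1

measured = n / (time.time() - t0)
print("reported:", reported, "fps, measured:", round(measured, 1), "fps")

If measured is well below reported, frames are being dropped or queued somewhere.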

Because I don’t have any problems with my webcam. The capture is fluid and it anonymizes correctly.
With exactly the same program I have no problem with a webcam.
So when my program does exactly what I want, I can tell that I have no problem with it, I guess.

please state the resolution and frame rate of your webcam.

please state the resolution and frame rate of your RTSP stream.

In addition to the source fps, can you also state the fps your intuition tells you you actually get, on both your webcam and IP cam?