I have some concerns regarding a project that I am setting up.
When I display a simple RTSP video stream via OpenCV, I have no problems: everything is fluid. However, when I add Haar cascade face detection to my code, I get a lot of latency and frame loss. I am looking for avenues to explore, because I can't find a solution despite my research on the net. I tried changing my camera, but the problem is the same.
Below is the code:
import numpy as np
import cv2
from threading import Thread


class Algo(Thread):
    def __init__(self, frame):
        Thread.__init__(self)
        self.frame = frame

    def run(self):
        # Detect faces on the grayscale image prepared by the main loop
        faces = face_cascade.detectMultiScale(gray, 1.3, 5)
        for (x, y, w, h) in faces:
            # The bottom-right corner is (x+w, y+h)
            cv2.rectangle(self.frame, (x, y), (x+w, y+h), (255, 0, 0), 2)
            roi_gray = gray[y:y+h, x:x+w]
            roi_color = self.frame[y:y+h, x:x+w]


cap = cv2.VideoCapture('rtsp://[username]:[password]@[IP]:554/Streaming/Channels/1/')
face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_alt.xml')

while True:
    # Capture frame-by-frame
    cap.grab()
    ret, frame = cap.retrieve()
    if not ret:
        break
    # Our operations on the frame come here
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    thread_1 = Algo(frame)
    thread_1.start()
    thread_1.join()
    # Display the resulting frame
    cv2.imshow('frame', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()
then that is the problem. I would assume that the cascade classifier is slowing down your pipeline, meaning that VideoCapture() cannot read frames as fast as they are being produced over the network. Because you are streaming live video over RTSP, the only option is to drop frames, as they cannot be re-requested indefinitely.
You should be able to check this by increasing the wait time, i.e.
import cv2

cap = cv2.VideoCapture('rtsp://[username]:[password]@[IP]:554/Streaming/Channels/1/')
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    cv2.imshow('frame', frame)
    # A long wait simulates slow per-frame processing
    if cv2.waitKey(400) & 0xFF == ord('q'):
        break
the camera will produce frames at its own constant rate. if you don’t consume them promptly, they queue up. that is the delay you see.
use this. it will always give you the latest frame (but never twice unless you ask for that), and it will drop frames when you aren’t consuming quickly enough.
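for illustration, here is a minimal sketch of the "always give the latest frame" idea. it is not the code linked above, just one way it could look. the class name LatestFrameReader and the sequence-number scheme are my own assumptions; `source` is anything with a read() method that returns (ok, frame), such as a cv2.VideoCapture.

```python
import threading
import time


class LatestFrameReader:
    """Keep only the newest frame from a capture-like source (hypothetical helper).

    `source` is any object whose read() returns (ok, frame),
    for example a cv2.VideoCapture.
    """

    def __init__(self, source):
        self.source = source
        self.cond = threading.Condition()
        self.frame = None
        self.seq = 0            # number of frames received so far
        self.running = True
        self.thread = threading.Thread(target=self._loop, daemon=True)
        self.thread.start()

    def _loop(self):
        # Read as fast as the source delivers, so nothing queues up behind us
        while self.running:
            ok, frame = self.source.read()
            with self.cond:
                if not ok:
                    self.running = False
                else:
                    self.frame = frame   # overwrite: older frames are dropped
                    self.seq += 1
                self.cond.notify_all()

    def read(self, last_seq=0, timeout=None):
        """Wait for a frame newer than last_seq; return (seq, frame).

        If the returned seq equals last_seq, no new frame arrived
        (the stream ended or the timeout expired).
        """
        with self.cond:
            self.cond.wait_for(
                lambda: self.seq > last_seq or not self.running, timeout)
            return self.seq, self.frame

    def stop(self):
        self.running = False
        self.thread.join(timeout=1.0)
```

in the processing loop you would keep the last sequence number, call reader.read(seq), and stop when the number does not advance. the slow detection step then only ever sees the freshest frame instead of a stale backlog.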
how large is the image? running a cascade classifier on a 4k image must be slow. fewer pixels, faster processing. try to resize the image to something smaller.
if you absolutely have to use cascades, at least use proper minSize, maxSize arguments, so it will skip a couple of (unneeded) image pyramid levels
don’t use cascades. there are a lot of faster alternatives for face detection, like pico, or opencv’s dnn based object detection (yes, there’s a network for that).
throw out the Thread. as it is now, it’s not doing anything useful. naive attempts at multithreading can only do harm
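the resizing and minSize/maxSize suggestions above could be sketched like this. the function names, the 0.5 scale factor and the 40/400 pixel face sizes are assumptions for illustration only, tune them for your camera. detection runs on a shrunken grayscale copy and the boxes are mapped back to full resolution.

```python
def scale_boxes(boxes, factor):
    """Map (x, y, w, h) boxes found on a downscaled image back to full size."""
    return [tuple(int(round(v / factor)) for v in box) for box in boxes]


def detect_faces_downscaled(frame, cascade, factor=0.5,
                            min_face=40, max_face=400):
    """Run a Haar cascade on a shrunken grayscale copy of `frame`.

    min_face and max_face are face sizes in full-resolution pixels
    (assumed values, not taken from the thread).
    """
    import cv2  # imported here so scale_boxes stays usable without OpenCV

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    small = cv2.resize(gray, None, fx=factor, fy=factor,
                       interpolation=cv2.INTER_AREA)
    lo = int(min_face * factor)
    hi = int(max_face * factor)
    # minSize/maxSize let the detector skip pyramid levels that cannot match
    faces = cascade.detectMultiScale(small, scaleFactor=1.3, minNeighbors=5,
                                     minSize=(lo, lo), maxSize=(hi, hi))
    return scale_boxes(faces, factor)
```

with a cascade loaded via cv2.CascadeClassifier, calling detect_faces_downscaled(frame, cascade) returns boxes you can draw directly on the original frame.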
@cudawarped I changed my code but I don't see any difference.
@crackwitz The problem is that I need to process every frame, because the camera is going to be mounted on a car.
Basically, I have code that uses a DNN to anonymize faces, but I have the same problem with it. I thought it was a hardware problem, so I decided to use a basic Haar cascade face detector to see whether the problem was the same.
I should add that when I use a webcam, as @cudawarped said, it works well: all frames are used and I get a perfect result.
import cv2
import numpy as np
import time

prototxt_path = "weights/deploy.prototxt.txt"
model_path = "weights/res10_300x300_ssd_iter_140000_fp16.caffemodel"
# load Caffe model
model = cv2.dnn.readNetFromCaffe(prototxt_path, model_path)
cap = cv2.VideoCapture('rtsp://[user]:[password]@[ip]:554/Streaming/Channels/1/')
cap.set(cv2.CAP_PROP_FPS, 5.0)
#cap[1].set(cv2.CAP_PROP_FPS, 15.0)
while True:
    start = time.time()
    ok, image = cap.read()
    if not ok:
        # stop when no frame could be read
        break
    # get width and height of the image
    h, w = image.shape[:2]
    kernel_width = (w // 7) | 1
    kernel_height = (h // 7) | 1
    # preprocess the image: resize and perform mean subtraction
    blob = cv2.dnn.blobFromImage(image, 1.0, (300, 300), (104.0, 177.0, 123.0))
    # set the image into the input of the neural network
    model.setInput(blob)
    # perform inference and get the result
    output = np.squeeze(model.forward())
    for i in range(0, output.shape[0]):
        # get the confidence
        confidence = output[i, 2]
        # if confidence is above 40%, then blur the bounding box (face)
        if confidence > 0.4:
            # get the surrounding box coordinates and upscale them to the original image
            box = output[i, 3:7] * np.array([w, h, w, h])
            # convert to integers (np.int was removed from recent NumPy versions)
            start_x, start_y, end_x, end_y = box.astype(int)
            # get the face image
            face = image[start_y:end_y, start_x:end_x]
            if face.size == 0:
                # skip boxes that fall outside the image
                continue
            # apply gaussian blur to this face
            face = cv2.GaussianBlur(face, (kernel_width, kernel_height), 0)
            # put the blurred face into the original image
            image[start_y:end_y, start_x:end_x] = face
    cv2.imshow("image", image)
    if cv2.waitKey(1) == ord("q"):
        break
    time_elapsed = time.time() - start
    fps = 1 / time_elapsed
    print("FPS:", fps)
cv2.destroyAllWindows()
cap.release()
As you can see in the following picture, I took a screenshot showing the camera's own URL and the output of my code at the same moment:
Change this:
class Algo(Thread):
    def __init__(self, frame):
        Thread.__init__(self)
        self.frame = frame
to:
class Algo(Thread):
    def __init__(self):
        Thread.__init__(self)
Then apply @cudawarped's example.
Then call this:
thread_1 = Algo()
thread_1.start()
In the second code, put this outside of the while loop:
# get width and height of the image
h, w = image.shape[:2]
kernel_width = (w // 7) | 1
kernel_height = (h // 7) | 1
#preprocess the image: resize and performs mean subtraction
blob = cv2.dnn.blobFromImage(image, 1.0, (300, 300), (104.0, 177.0, 123.0))
# set the image into the input of the neural network
model.setInput(blob)
# perform inference and get the result
output = np.squeeze(model.forward())
I'm sorry if I have misunderstood, but if I do that, don't I need to define 'image' first?
And if I do that, won't the program process just one frame?
I ask because I tested it and my window shows just one picture.
Are you running the webcam footage through the dnn/cascade classifier?
If so, how do you know you are not missing any frames? I suspect you are, and it's just not as obvious because the webcam won't be streaming video in which each frame depends on information from previous frames; it will most likely be MJPEG or similar, where each frame is encoded separately.
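One rough way to check whether frames are being missed is to count how many frames your loop actually processed over a measured time span and compare that with what the camera should have produced. A minimal sketch follows. The function name estimate_dropped is hypothetical, and nominal_fps is an assumption you would take from the camera's own settings, since the value reported by the capture backend may not be reliable.

```python
def estimate_dropped(processed, elapsed_s, nominal_fps):
    """Estimate frames the source produced but the loop never processed.

    processed   -- frames the loop handled during the measurement
    elapsed_s   -- wall-clock duration of the measurement in seconds
    nominal_fps -- frame rate the camera is configured for (assumed known)
    """
    expected = int(elapsed_s * nominal_fps)
    return max(0, expected - processed)


# Example: a 25 fps camera, but the loop only handled 50 frames in 10 s
missing = estimate_dropped(processed=50, elapsed_s=10.0, nominal_fps=25.0)
print("estimated dropped frames:", missing)
```

If the estimate stays near zero for the webcam but grows steadily for the RTSP stream, the processing loop is slower than the stream.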
Because I don't have any problems with my webcam. The capture is fluid and the faces are anonymized correctly.
With exactly the same program, I have no problem with a webcam.
So when my program does exactly what I want, I think I can say that I have no problem with it.