Hi there,
I’m building a small project around OpenCV in Python. The goal is to explore computer vision through a custom-made Python library. I have also built a small GUI that uses that library.
I’m trying to implement the YuNet face detector. It works fine, but it appears to run on the CPU rather than the GPU.
I’m using the OpenCV transparent API (T-API): I receive a UMat frame and do most of the processing on UMat objects, which is supposed to run on the GPU through the OpenCL bindings.
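To make the setup concrete, here is a minimal sketch of the transparent API pattern described above, assuming an OpenCV build with OpenCL support. The helper names opencl_report and gray_via_tapi are hypothetical and not part of my library; the cv2.ocl and cv2.UMat calls are the standard ones.

```python
def opencl_report():
    """Report whether OpenCV sees an OpenCL device and is using it."""
    import cv2  # imported inside the function so the module loads without OpenCV

    available = cv2.ocl.haveOpenCL()
    if available:
        cv2.ocl.setUseOpenCL(True)  # ask the transparent API to use OpenCL
    return {"available": available, "enabled": cv2.ocl.useOpenCL()}


def gray_via_tapi(frame_bgr):
    """Convert a BGR ndarray to grayscale through a UMat round trip."""
    import cv2

    device_frame = cv2.UMat(frame_bgr)  # wrap the ndarray in a UMat
    gray = cv2.cvtColor(device_frame, cv2.COLOR_BGR2GRAY)  # may run via OpenCL
    return gray.get()  # copy the result back into an ndarray
```

If opencl_report() shows "enabled": False, UMat operations silently fall back to the CPU, so it is worth checking before drawing conclusions from the usage graphs.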
However, the examples I found online do not seem to use UMat; they use the regular ndarray (MatLike) object. As a result, in my code self.detector.detect returns a UMat that has to be converted back to a MatLike with faces.get(), since we cannot loop over a UMat object.
Here’s the code for information:
class YUNetFaceDetectionFilter(ImageProcessingDecorator):
    """A class representing a YUnet DNN face detection filter for image processing."""

    def __init__(self, wrapped: ImageProcessingStrategy) -> None:
        """Initialize the YUNetFaceDetectionFilter.

        Args:
            wrapped (ImageProcessingStrategy): The wrapped image processing strategy.
        """
        super().__init__(wrapped)
        self.detector = cv2.FaceDetectorYN.create(
            "data/face_detection_yunet_2023mar.onnx", "", (0, 0)
        )

    def process(self, _frame: Image) -> UMat:
        """Process an image.

        Args:
            frame (UMat): The image to process.

        Returns:
            UMat: The processed image.
        """
        # face_detection_yunet_2023mar.onnx
        frame = super().process(_frame)
        frame_mat = frame.get()
        heigh, width, _ = frame_mat.shape
        self.detector.setInputSize((width, heigh))
        _, faces = self.detector.detect(frame)
        if faces is None:  # type: ignore
            return frame
        try:
            for face in faces.get():
                # bounding box
                box = list(map(int, face[:4]))
                color = (0, 255, 0)
                cv2.rectangle(frame, box, color, 2)
                # confidence
                confidence = face[-1]
                confidence = "{:.2f}".format(confidence)
                position = (box[0], box[1] - 10)
                cv2.putText(
                    frame,
                    confidence,
                    position,
                    cv2.FONT_HERSHEY_SIMPLEX,
                    0.5,
                    color,
                    1,
                    cv2.LINE_AA,
                )
        except TypeError:
            pass
        return frame
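For readers unfamiliar with the detector output: the array obtained from faces.get() has one row per face. The sketch below assumes the usual YuNet layout of 15 values per row (x, y, width, height, five landmark points, then the score), which matches the face[:4] and face[-1] indexing above. The Face tuple and parse_face helper are hypothetical names used only for illustration.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Face(NamedTuple):
    box: Tuple[int, int, int, int]    # x, y, width, height in pixels
    landmarks: List[Tuple[int, int]]  # five (x, y) facial landmark points
    score: float                      # detection confidence


def parse_face(row: Sequence[float]) -> Face:
    """Split one 15-value detection row into box, landmarks and score."""
    x, y, w, h = (int(v) for v in row[:4])
    # Values 4..13 are five (x, y) landmark pairs.
    points = [(int(row[i]), int(row[i + 1])) for i in range(4, 14, 2)]
    return Face((x, y, w, h), points, float(row[-1]))
```

With this helper the drawing loop could read `for face in map(parse_face, faces.get()):` and use face.box and face.score instead of raw slices.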
The class is used in a layered system where multiple filters can be applied to a frame. Here, however, YUNetFaceDetectionFilter is used alone, without any other filters.
And the actual CPU usage: [screenshot not preserved]
This gives me an actual frame rate of about 15 FPS.
For comparison, I implemented the Haar Cascade Detection:
class HaarCascadeFaceDetectionFilter(ImageProcessingDecorator):
    """A class representing a Haar cascade face detection filter for image processing."""

    def __init__(self, wrapped: ImageProcessingStrategy) -> None:
        """Initialize the HaarCascadeFaceDetectionFilter.

        Args:
            wrapped (ImageProcessingStrategy): The wrapped image processing strategy.
        """
        super().__init__(wrapped)
        self.face_cascade = cv2.CascadeClassifier(
            "data/lbpcascade_frontalface.xml"  # type: ignore
        )

    def process(self, _frame: Image) -> UMat:
        """Process an image.

        Args:
            frame (UMat): The image to process.

        Returns:
            UMat: The processed image.
        """
        frame = super().process(_frame)
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = self.face_cascade.detectMultiScale(gray, 1.3, 5)
        for x, y, w, h in faces:
            cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)
        return frame
With this filter, GPU usage is much higher while CPU usage remains low, and the actual frame rate is 30 FPS, which is the maximum for my webcam.
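One thing that may be worth checking here: as far as I can tell, FaceDetectorYN runs its network through OpenCV's DNN module, whose default target is the CPU, and passing a UMat as input does not by itself change that target. The sketch below requests the OpenCL target when the detector is created, using the optional backend_id and target_id parameters of cv2.FaceDetectorYN.create. The helper name create_yunet and its use_opencl flag are hypothetical, and whether this actually improves the frame rate on a given machine is an assumption that has not been verified.

```python
def create_yunet(model_path, input_size, use_opencl=True):
    """Create a YuNet detector, optionally requesting the OpenCL DNN target."""
    import cv2  # imported inside the function so the module loads without OpenCV

    # DNN_TARGET_CPU is the default; DNN_TARGET_OPENCL asks the DNN module
    # to run the network through OpenCL instead.
    target = cv2.dnn.DNN_TARGET_OPENCL if use_opencl else cv2.dnn.DNN_TARGET_CPU
    return cv2.FaceDetectorYN.create(
        model_path,
        "",  # no separate config file for the ONNX model
        input_size,
        backend_id=cv2.dnn.DNN_BACKEND_OPENCV,
        target_id=target,
    )
```

In the filter above, this would replace the direct cv2.FaceDetectorYN.create call in __init__, for example `self.detector = create_yunet("data/face_detection_yunet_2023mar.onnx", (0, 0))`.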
I suspect the bottleneck is somewhere in the following code:
frame_mat = frame.get()
heigh, width, _ = frame_mat.shape
self.detector.setInputSize((width, heigh))
_, faces = self.detector.detect(frame)
if faces is None:  # type: ignore
    return frame
try:
    for face in faces.get():
However, most of the examples I found do not seem to use the transparent API.
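To narrow down which of these steps is actually slow, a small timing helper could be wrapped around each call. This is a minimal sketch using only the standard library; StepTimer is a hypothetical name. One caveat: OpenCL work on a UMat may be queued asynchronously, so the time can show up at the next .get() rather than at the call that issued the work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StepTimer:
    """Accumulate wall-clock time per named step across many frames."""

    def __init__(self):
        self.totals = defaultdict(float)  # seconds spent per step
        self.counts = defaultdict(int)    # number of measurements per step

    @contextmanager
    def step(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def mean_ms(self, name):
        """Average duration of a step in milliseconds."""
        return 1000.0 * self.totals[name] / self.counts[name]
```

Inside process, the suspect lines would then be wrapped like `with timer.step("download"): frame_mat = frame.get()` and `with timer.step("detect"): _, faces = self.detector.detect(frame)`, and timer.mean_ms("detect") printed after a few hundred frames.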
If anybody has an idea, I would be glad to hear it.
Thank you very much.