net.forward() takes too long when detecting faces

Hi, I am trying to detect human faces in real time using a config file (yolov3-face.cfg) and a weights file (yolov3-wider_16000.weights). I get approximately 5 FPS on the CPU, which is not sufficient for my application. A single frame takes about 200 ms to process, and nearly 170 ms of that is spent in the net.forward() call. How can I reduce the time taken by this line? The accuracy is good. I have attached part of the code below. Any help would be appreciated.

while True:
    ret, frame =
    [R, C, D] = frame.shape
    blob = cv2.dnn.blobFromImage(frame, 1 / 255, (IMG_WIDTH, IMG_HEIGHT), [0, 0, 0], 1, crop = False)
    #start = timer()
    outs = net.forward(layer_names) #this is taking around 170ms
    #print("loop time: %.5f" % (timer() - start))
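
When timing a single call like this, it can help to average over several runs and to skip the first few, since the first forward pass may include one-time initialization. The following is only a minimal sketch of that idea; the helper name average_ms and its default repeat counts are hypothetical and not part of the original code.

```python
import time


def average_ms(fn, repeats=20, warmup=3):
    """Return the average wall-clock time of fn() in milliseconds.

    The first `warmup` calls are executed but not timed, so that
    one-time setup cost does not distort the average.
    """
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / repeats


# Hypothetical usage with the objects from the post above
# (assumes net, blob and layer_names already exist):
#   net.setInput(blob)
#   print("forward: %.1f ms" % average_ms(lambda: net.forward(layer_names)))
```

Timing blobFromImage and the post-processing the same way would show whether net.forward() really dominates the 200 ms per frame.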

Hi.
You should build OpenCV with CUDA support; your application will then gain a large performance improvement. You can refer to this link: Build OpenCV with CUDA
Another option is to use OpenVINO as your inference backend. OpenVINO also gives good performance, but not as much as CUDA.
You can refer to this link for configuring OpenCV with OpenVINO: OpenCV with OpenVINO
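
Once OpenCV has been built with one of these backends, the network still has to be told to use it. The sketch below shows one possible way to do that, assuming OpenCV 4.2 or newer (where the CUDA DNN backend is available). The helper names pick_backend and apply_backend are hypothetical; the constants and the setPreferableBackend / setPreferableTarget calls are from the cv2.dnn module.

```python
def pick_backend(cuda_devices, openvino_available):
    """Return the names of the cv2.dnn backend and target constants to use.

    Preference order (an assumption of this sketch): CUDA, then OpenVINO,
    then OpenCV's default CPU implementation.
    """
    if cuda_devices > 0:
        return "DNN_BACKEND_CUDA", "DNN_TARGET_CUDA"
    if openvino_available:
        return "DNN_BACKEND_INFERENCE_ENGINE", "DNN_TARGET_CPU"
    return "DNN_BACKEND_OPENCV", "DNN_TARGET_CPU"


def apply_backend(net, backend_name, target_name):
    """Set the chosen backend and target on an existing cv2.dnn network."""
    import cv2  # imported here so pick_backend works without OpenCV installed

    net.setPreferableBackend(getattr(cv2.dnn, backend_name))
    net.setPreferableTarget(getattr(cv2.dnn, target_name))


# Hypothetical usage, before the first net.forward() call:
#   import cv2
#   n_gpus = cv2.cuda.getCudaEnabledDeviceCount()  # 0 if built without CUDA
#   apply_backend(net, *pick_backend(n_gpus, openvino_available=False))
```

If the requested backend is not compiled into the installed OpenCV, selecting it will not give the speed-up, so this only helps after a successful CUDA or OpenVINO build.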

Thanks for the reply, @naser_piltan. I am using Windows 10 on a machine with an NVIDIA GPU. Is it possible to compile OpenCV with CUDA?

I already tried installing opencv with cuda. But I cant… My machine has CUDA 10.2 and cuDNN 7.6.05

@Maheswari.R Yes, sure. You can do it easily. I haven’t tried it on Windows, but there are many topics on the internet that you can use.
For example: link1, link2 and link3

Thanks, @naser_piltan. I will try it.

Do you compile your code with the debug profile or the release profile?
Both OpenCV and other libraries (like Dlib) have awful performance in debug builds.
If this is the case, try building with the release profile and check the performance again.

I compiled in release mode. The problem still exists.

Did you compile with CUDA? If not, what do you mean by:

I already tried installing opencv with cuda. But I cant…

Do you have a GPU? If not then your CPU times look consistent with those reported by the author.
Which OS are you using?
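
One way to check whether the installed OpenCV was actually built with CUDA is to look at its build information. The sketch below assumes the text returned by cv2.getBuildInformation() contains a line of the form "NVIDIA CUDA: YES (...)" when CUDA is enabled; the helper name cuda_enabled and the sample strings are hypothetical.

```python
def cuda_enabled(build_info):
    """Return True if the build-information text reports CUDA as enabled.

    Assumes a line whose key is exactly "NVIDIA CUDA" and whose value
    starts with "YES"; anything else is treated as "not enabled".
    """
    for line in build_info.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "NVIDIA CUDA":
            return value.strip().upper().startswith("YES")
    return False


# Hypothetical usage:
#   import cv2
#   print(cuda_enabled(cv2.getBuildInformation()))
```

If this reports False, net.forward() runs on the CPU regardless of which GPU is installed.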

I tried compiling OpenCV with CUDA, but the build did not succeed because of errors. I am using Windows 10, and my GPU is a GeForce GTX 1650.

CUDA Version: 10.2


What is the problem? Can you give us more information?

If you downgrade your version of cuDNN you could try using a pre-compiled version from here.