How to detect objects with C++, DNN and CUDA

What I want to do:

I would like to write a program that detects objects in real time (there will be more features in the future, so I really hope to write something that I can modify and extend). I'm just starting with computer vision. I'm a C++ developer and have some experience with OpenCV, which is why I would prefer to use OpenCV and C++ for this. I'm aiming for performance and speed with a stable video input, and I read that DNN models perform best in that regard.

I looked around the internet and tried to install everything (CUDA, cuDNN, OpenCV, YOLO). I successfully ran some of the YOLO examples, but I have no idea how to integrate that into my C++ code. I looked for a step-by-step guide, still without success: link errors in OpenCV, Visual Studio doesn't see YOLO, code examples don't run, my IDE doesn't see the OpenCV defines, builds are extremely slow, and there were even occasional blue screens! I'm sure I did something wrong, but I don't know what, because to get anything started I had to pull from different sources that were not necessarily compatible.

As my first milestone, I just want to run some code examples that detect faces on my webcam. I hope that once I achieve that, everything else will be easier.


I’m working on Windows + Visual Studio

What I’m looking for here:

So I'm asking for some help with my research. Does anyone know what steps I should take, or what I should search for, to get this working, or at least to reach my first milestone?

one problem at a time, please.

which OpenCV version? what did you install? did you try to build it locally?

OpenCV's dnn module can read YOLO (Darknet) networks directly, so your first attempt might be to get the sample code running with a YOLO network

(no you don’t need to build darknet for this at all)
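For orientation, this is roughly what the object_detection sample does with each row of a Darknet YOLO output after `cv::dnn` has run the network: the first four values are the box center and size relative to the image, index 4 is the objectness score, and the class scores start at index 5. The sketch below is plain C++ with no OpenCV dependency; `Detection` and `decodeRow` are hypothetical names, and the row layout is assumed to match what the sample reads.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <optional>
#include <vector>

// Hypothetical result type: a pixel-space box plus the winning class.
struct Detection {
    int left, top, width, height;
    int classId;
    float confidence;
};

// Decodes one YOLO output row assumed to be laid out as
// [cx, cy, w, h, objectness, score_0, score_1, ...],
// where cx, cy, w, h are relative to the input image (0..1).
// Returns nothing if the best class score does not exceed the threshold.
std::optional<Detection> decodeRow(const std::vector<float>& row,
                                   int frameWidth, int frameHeight,
                                   float confThreshold) {
    assert(row.size() > 5 && "row must contain at least one class score");
    // The best class score doubles as the detection confidence.
    auto best = std::max_element(row.begin() + 5, row.end());
    float confidence = *best;
    if (confidence <= confThreshold) return std::nullopt;

    // Scale relative coordinates to pixels.
    int centerX = static_cast<int>(row[0] * frameWidth);
    int centerY = static_cast<int>(row[1] * frameHeight);
    int width = static_cast<int>(row[2] * frameWidth);
    int height = static_cast<int>(row[3] * frameHeight);
    int classId = static_cast<int>(std::distance(row.begin() + 5, best));

    // Convert center-based box to top-left corner plus size.
    return Detection{centerX - width / 2, centerY - height / 2,
                     width, height, classId, confidence};
}
```

In the real pipeline, overlapping boxes that survive the threshold are then filtered with non-maximum suppression (`cv::dnn::NMSBoxes` in the sample).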

once you have that, you could try to rebuild the opencv libs with contrib modules / cuda


I use OpenCV 4.5.2. It seems I had built something wrong; after some tweaks I finally succeeded (at least I think so, because all the examples I tried run somehow).

I also finally succeeded in running the OpenCV dnn example object_detection, using:

$ example_dnn_object_detection --config=yolov3.cfg --model=yolov3.weights --width=416 --height=416 --scale=0.00392 --target=1
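The flags in that command describe the preprocessing step: `--width`/`--height` set the 416×416 network input size, and `--scale=0.00392` is roughly 1/255, which brings 8-bit pixel values into [0, 1]. Inside OpenCV this is done by `cv::dnn::blobFromImage`; the plain C++ sketch below only illustrates the layout change it performs (interleaved BGR bytes to planar floats) and assumes the image is already resized, with no mean subtraction. `toNchwBlob` is a hypothetical name, not an OpenCV function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts an interleaved 8-bit BGR image (HWC, already resized to the
// network input size, e.g. 416x416) into a planar float blob (CHW, N = 1),
// multiplying every value by `scale`. With scale = 1/255 (~0.00392)
// pixel values end up in [0, 1]. If swapRB is true the output channel
// order becomes R, G, B.
std::vector<float> toNchwBlob(const std::vector<std::uint8_t>& bgr,
                              int width, int height,
                              float scale, bool swapRB) {
    const std::size_t plane = static_cast<std::size_t>(width) * height;
    assert(bgr.size() == plane * 3);
    std::vector<float> blob(plane * 3);
    for (std::size_t i = 0; i < plane; ++i) {
        for (int c = 0; c < 3; ++c) {
            // Source channel c goes to plane c, or to plane 2 - c when
            // swapping blue and red (green stays in the middle).
            int dst = swapRB ? 2 - c : c;
            blob[dst * plane + i] = bgr[i * 3 + c] * scale;
        }
    }
    return blob;
}
```

The planar layout is why a 416×416 input becomes a 1×3×416×416 tensor rather than an image-shaped buffer.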

It does recognize objects, but it runs terribly slowly, around 1.5 FPS. How can I improve that?
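When comparing backends, targets or models, it helps to measure the same thing every time. For scale: 1.5 FPS is about 667 ms per frame, while 30 FPS video leaves about 33 ms. A minimal sketch, assuming you call `tick()` once per processed frame; `FpsMeter` is a hypothetical helper, not part of OpenCV (OpenCV's own `cv::TickMeter` serves a similar purpose).

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>

// Hypothetical helper: averages the frame rate over the last `window` frames.
class FpsMeter {
public:
    using Clock = std::chrono::steady_clock;

    explicit FpsMeter(std::size_t window = 30) : window_(window) {
        assert(window >= 1);
    }

    // Records that a frame finished at `now` (defaults to the current time).
    void tick(Clock::time_point now = Clock::now()) {
        stamps_.push_back(now);
        // Keep window + 1 timestamps, i.e. `window` frame intervals.
        while (stamps_.size() > window_ + 1) stamps_.pop_front();
    }

    // Average frames per second over the stored intervals, 0 if unknown.
    double fps() const {
        if (stamps_.size() < 2) return 0.0;
        std::chrono::duration<double> span = stamps_.back() - stamps_.front();
        if (span.count() <= 0.0) return 0.0;
        return static_cast<double>(stamps_.size() - 1) / span.count();
    }

    // Average time per frame in milliseconds (1000 / fps).
    double msPerFrame() const {
        double f = fps();
        return f > 0.0 ? 1000.0 / f : 0.0;
    }

private:
    std::size_t window_;
    std::deque<Clock::time_point> stamps_;
};
```

Calling `tick()` right after `net.forward()` isolates inference speed; calling it after drawing and display measures the whole loop.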

I already built it with CUDA.
I tried to build modules/cudaobjectdetect.sln, but while building I get:

gl_core_3_1.obj : error LNK2019: unresolved external symbol __imp_wglGetProcAddress referenced in function "void * __cdecl IntGetProcAddress(char const *)" (?IntGetProcAddress@@YAPEAXPEBD@Z)
1>opengl.obj : error LNK2019: unresolved external symbol __imp_wglGetCurrentContext referenced in function "class cv::ocl::Context & __cdecl cv::ogl::ocl::initializeContextFromGL(void)" (?initializeContextFromGL@ocl@ogl@cv@@YAAEAVContext@13@XZ)
1>opengl.obj : error LNK2019: unresolved external symbol __imp_wglGetCurrentDC referenced in function "class cv::ocl::Context & __cdecl cv::ogl::ocl::initializeContextFromGL(void)" (?initializeContextFromGL@ocl@ogl@cv@@YAAEAVContext@13@XZ)
1>C:\Program Files\OpenCV\build\bin\Release\opencv_core452.dll : fatal error LNK1120: 3 unresolved externals
1>Done building project "opencv_core.vcxproj" -- FAILED.

How can I fix it?
I built with OpenGL as well.

Exact hardware specs please.

OpenCV can use various backends and devices. Check that it actually executes the DNN on the GPU, not the CPU.


Here:
CPU - i5-4460 3.20GHz
GPU - Nvidia GeForce GTX 970
Windows 10 Pro 64-bit

The DNN example uses about 35% of the GPU and CPU; with other --target values it either uses only the CPU or does not start at all.

there are also faster and smaller yolo networks

a GTX 970 has ~4 TFLOP/s of conventional FP32, which is about half of what an RTX 2070 can do, but that ignores tensor cores. the 20 series has tensor cores, the 9 series does not. tensor cores are the performance factor that accelerates convolutional layers by at least an order of magnitude.
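The ~4 TFLOP/s figure follows from the usual peak formula: CUDA cores × 2 FLOPs per fused multiply-add × clock. A small sketch, assuming NVIDIA's reference boost clocks (1664 cores at 1178 MHz for the GTX 970, 2304 cores at 1620 MHz for the RTX 2070); factory-overclocked cards differ, and tensor cores are not counted at all.

```cpp
#include <cassert>

// Theoretical peak FP32 throughput in TFLOP/s.
// Each CUDA core can retire one fused multiply-add (= 2 FLOPs) per cycle,
// so peak = cores * 2 * clock. cores * 2 * GHz gives GFLOP/s; / 1000 -> TFLOP/s.
constexpr double peakFp32Tflops(int cudaCores, double boostClockGHz) {
    return cudaCores * 2.0 * boostClockGHz / 1000.0;
}
```

With the assumed specs this gives about 3.92 TFLOP/s for the GTX 970 and about 7.46 TFLOP/s for the RTX 2070, a ratio of roughly 0.53. Real networks reach only a fraction of peak.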

do follow that link on mobile/tiny variants of these networks.

make sure to pick the CUDA backend, not the generic OpenCL one.
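The sample's numeric `--backend` and `--target` flags are parsed into OpenCV's `cv::dnn::Backend` and `cv::dnn::Target` enums, so `--target=1` in the command earlier in the thread selects OpenCL, not CUDA. The plain C++ sketch below mirrors the first values of those enums as they appear in OpenCV 4.5.x (an assumption worth checking against your own `dnn.hpp`); `describeTarget` and `usesCuda` are hypothetical helpers.

```cpp
#include <cassert>
#include <string>

// Mirrors the numeric values of cv::dnn::Backend and cv::dnn::Target
// (assumed from OpenCV 4.5.x; newer versions append more values).
enum class Backend { Default = 0, Halide, InferenceEngine, OpenCV, VkCom, Cuda };
enum class Target { Cpu = 0, OpenCL, OpenCLFp16, Myriad, Vulkan, Fpga, Cuda, CudaFp16 };

// Turns a numeric --target value into a readable name.
std::string describeTarget(int value) {
    switch (static_cast<Target>(value)) {
        case Target::Cpu:        return "CPU";
        case Target::OpenCL:     return "OpenCL";
        case Target::OpenCLFp16: return "OpenCL FP16";
        case Target::Myriad:     return "Myriad VPU";
        case Target::Vulkan:     return "Vulkan";
        case Target::Fpga:       return "FPGA";
        case Target::Cuda:       return "CUDA";
        case Target::CudaFp16:   return "CUDA FP16";
    }
    return "unknown";
}

// CUDA targets are only supported by the CUDA backend,
// so both flags have to be set for the GPU path to be used.
bool usesCuda(int backend, int target) {
    const bool cudaTarget = target == static_cast<int>(Target::Cuda) ||
                            target == static_cast<int>(Target::CudaFp16);
    return backend == static_cast<int>(Backend::Cuda) && cudaTarget;
}
```

For the sample this means passing `--backend=5 --target=6` instead of `--target=1`; in your own code the equivalent calls are `net.setPreferableBackend(cv::dnn::DNN_BACKEND_CUDA);` and `net.setPreferableTarget(cv::dnn::DNN_TARGET_CUDA);`.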

someone named Yashas Samaga implemented the CUDA backend for the dnn module. he can also be seen battling with Adrian Rosebrock over proper benchmarking methodology on Adrian’s notorious code blog. I won’t link to that but suffice it to say, I chose my words carefully.