I have a WSL2 Ubuntu 20.04 setup with OpenCV 4.6.0 and CUDA 11.8. My inference performance on that machine is 25 frames per second.
Now, I have a Docker image that has the same libraries, and OpenCV has been compiled the same way; however, performance is 15 frames per second.
I checked all the libraries my application is linked against, and they are the same versions. I also checked the OpenCV compile options; they are the same. What can I do to track down the issue here? Any advice?
I know for sure this is not a Docker issue, because I've observed a similar situation on my Windows machine. I compiled OpenCV with GPU support and Python bindings twice, and in one Python venv I get 25 frames per second, while in the other I get 15. I must have done something differently, but what?
The only difference is performance; the detected objects and probabilities are exactly the same on each platform.
Hi @berak, thank you for your interest. I was looking for a way to profile my application. First I profiled it on both systems using GNU gprof. The results did not show anything unusual, but gprof only let me see my own functions, as OpenCV was not built with the option to emit profiling information. It could probably be rebuilt that way, but further googling led me to Profiling OpenCV Applications · opencv/opencv Wiki · GitHub. Following those instructions, I was able to collect profiling data.
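For anyone trying to do the same, this is roughly how it works as I understand the wiki: OpenCV's built-in trace framework is switched on at runtime with the OPENCV_TRACE=1 environment variable, and the generated OpenCVTrace.txt is summarized with modules/ts/misc/trace_profiler.py from the OpenCV source tree. Your own functions can be tagged with the trace macros so they show up next to the OpenCV entries; a minimal sketch (the helper function below is just a placeholder, not my actual code):

```cpp
// Sketch only: tag an application function so it appears in the OpenCV trace.
// Run the program with OPENCV_TRACE=1 and feed the resulting OpenCVTrace.txt
// to modules/ts/misc/trace_profiler.py (as I understand the wiki's workflow).
#include <opencv2/core.hpp>
#include <opencv2/core/utils/trace.hpp>
#include <opencv2/imgproc.hpp>

void preprocessFrame(cv::Mat& frame)          // placeholder application function
{
    CV_TRACE_FUNCTION();                      // records this function in the trace
    CV_TRACE_REGION("resize");                // names the region that follows
    cv::resize(frame, frame, cv::Size(416, 416));
}
```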
The profiler report shows the 10 most time-consuming functions executed inside OpenCV, sorted in descending order, so the first one is the most time-consuming.
As you can see, the application executed in exactly the same way on both systems (you can tell by looking at the count column, which shows how many times each function was called).
However, there is a difference in how long cv::dnn::dnn4_v20220524::Net::forward took to execute. On System 1 the average time per execution was 43 milliseconds; on System 2 it was 67 milliseconds.
So now I know that the problem lies within cv::dnn::dnn4_v20220524::Net::forward; somehow it behaves differently on the two systems. I also know that this only happens with the CUDA backend (cv::dnn::DNN_BACKEND_CUDA).
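As a cross-check of the profiler numbers, forward() can also be timed in isolation with cv::TickMeter. A minimal sketch (the model file and input size below are placeholders, not my actual network), with one warm-up call so the CUDA backend's one-time initialization is not counted:

```cpp
// Sketch: time Net::forward alone on the CUDA backend.
// "model.onnx" and the 416x416 input are placeholders.
#include <opencv2/core.hpp>
#include <opencv2/core/utility.hpp>
#include <opencv2/dnn.hpp>
#include <iostream>

int main()
{
    cv::dnn::Net net = cv::dnn::readNet("model.onnx");
    net.setPreferableBackend(cv::dnn::DNN_BACKEND_CUDA);
    net.setPreferableTarget(cv::dnn::DNN_TARGET_CUDA);

    cv::Mat dummy(416, 416, CV_8UC3, cv::Scalar::all(127));   // dummy frame
    cv::Mat blob = cv::dnn::blobFromImage(dummy, 1.0 / 255.0);
    net.setInput(blob);
    net.forward();                                             // warm-up: kernel/cuDNN initialization

    cv::TickMeter tm;
    for (int i = 0; i < 100; ++i)
    {
        tm.start();
        net.forward();
        tm.stop();
    }
    std::cout << "average forward(): "
              << tm.getTimeMilli() / tm.getCounter() << " ms" << std::endl;
    return 0;
}
```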
My next idea is to profile the application using NVIDIA Nsight Compute; maybe that will give me a hint about what's going on.
Similarities:
Ubuntu 20.04
CUDA 11.8, cuDNN 8.7
OpenCV 4.6.0 built with the same options (I diffed cv::getBuildInformation(); see the sketch further below)
Application linked with the same libraries (I diffed the ldd output).
Differences:
System 1 is a WSL2 image; System 2 is a Docker image.
System 1 has many more libraries pre-installed.
The built OpenCV libraries differ at the binary level (different sizes).
I suspect the CMake script picked up something different without making it explicit.
I can reproduce the issue every time: I've built multiple Docker images from scratch and used multiple WSL2 instances, and I get the same results each time.
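For anyone who wants to repeat the comparison, a tiny program along these lines (just a sketch, not my exact code) dumps the build information I diffed, plus, as an extra check, the CUDA device visible at runtime; running it on both systems and diffing the output is enough:

```cpp
// Sketch: dump version, build configuration and the visible CUDA device,
// so the output can be diffed between System 1 and System 2.
#include <opencv2/core.hpp>
#include <opencv2/core/utility.hpp>
#include <opencv2/core/cuda.hpp>
#include <iostream>

int main()
{
    std::cout << "OpenCV " << cv::getVersionString() << std::endl;
    std::cout << cv::getBuildInformation() << std::endl;        // includes CUDA/cuDNN build settings
    cv::cuda::printCudaDeviceInfo(cv::cuda::getDevice());       // runtime view of the GPU
    return 0;
}
```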
Docker on Windows starts a virtual machine, just like on macOS; hence the delays.
WSL is not virtualization, and it runs faster.
Now, I know you claim it worked on Windows, but this is my humble opinion.