Hi.
I have been experimenting with the OpenCV dnn module on both the CPU and the GPU of a Jetson Nano. I made a very similar post on the NVIDIA forum ("Poor performance of CUDA GPU, using OpenCV DNN module", Jetson Nano, NVIDIA Developer Forums), but I think the topic is more related to OpenCV than to CUDA.
I ran some tests using different super-resolution models. The results are as follows:
- EDSR x2: CPU: timeout, GPU: timeout.
- ESPCN x4: CPU: 0.17469215393066406 s, GPU: 10.169917821884155 s
- FSRCNN x4: CPU: 0.12776947021484375 s, GPU: 5.2502007484436035 s
- LapSRN x4: CPU: 8.098081111907959 s, GPU: 6.410776138305664 s
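For reference, here is a minimal sketch of how timings like these could be collected. It is not the exact script behind the numbers above. It assumes the models are run through `cv2.dnn_superres` from opencv-contrib, and the helper names `time_call` and `make_upsampler` are made up for illustration. The warm-up call is run but not timed, so that any one-time setup cost on the first inference does not end up in the reported figure.

```python
import time
from statistics import mean


def time_call(run, warmup=1, repeats=5):
    """Return the mean wall-clock time of run() in seconds, after warm-up calls."""
    for _ in range(warmup):
        run()  # warm-up calls are executed but not timed
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return mean(samples)


def make_upsampler(model_path, name, scale, use_cuda):
    """Build an upsampling callable (assumes opencv-contrib with dnn_superres)."""
    import cv2  # imported here so time_call stays usable without OpenCV

    sr = cv2.dnn_superres.DnnSuperResImpl_create()
    sr.readModel(model_path)   # path to a .pb model file (hypothetical)
    sr.setModel(name, scale)   # e.g. "espcn", 4
    if use_cuda:
        sr.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
        sr.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
    else:
        sr.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
        sr.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)
    return sr.upsample
```

With an image `img` loaded via `cv2.imread`, something like `time_call(lambda: make_upsampler_result(img))`, where `make_upsampler_result = make_upsampler("ESPCN_x4.pb", "espcn", 4, use_cuda=True)` and the model path is a placeholder, would give one averaged GPU figure; repeating it with `use_cuda=False` gives the CPU figure.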
For ESPCN and FSRCNN, the execution time on the CPU is much lower than on the GPU, which is the opposite of what I expected (only LapSRN ran somewhat faster on the GPU). I checked the resource load while running both versions: with the CPU version the CPU load was 100%, but with the GPU version the load was only about 20%. It looks like the dnn module doesn't use the full power of the GPU.
Why is the performance on the GPU so much worse than on the CPU?
Is there a way to decrease the execution time of the GPU version?
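One thing that may be worth ruling out is whether the OpenCV build in use has CUDA and cuDNN enabled at all. The sketch below parses the text returned by `cv2.getBuildInformation()`. The helper name `cuda_flags` is made up, and the exact layout of the build summary (lines starting with `NVIDIA CUDA:` and `cuDNN:`) is an assumption that may differ between OpenCV versions.

```python
def cuda_flags(build_info):
    """Report whether the CUDA and cuDNN lines of a build summary say YES."""
    # A key stays False if its line is missing from the summary.
    flags = {"NVIDIA CUDA": False, "cuDNN": False}
    for line in build_info.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key in flags:
            flags[key] = value.strip().startswith("YES")
    return flags
```

A possible use is `cuda_flags(cv2.getBuildInformation())`, together with `cv2.cuda.getCudaEnabledDeviceCount()`, which should report at least one device when OpenCV was built with CUDA and a GPU is visible.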