OpenCV's T-API facedetect.cpp sample is hella slow and CPU intensive with OpenCL "on"

Hello,

The “ufacedetect.cpp” sample from the T-API folder under samples runs very slowly after compilation. The running program reports OpenCL as “ON”, but there is no difference in performance and CPU usage stays high; OpenCL does not appear to offload anything from the CPU.
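
To put numbers on this rather than judging by eye, the detection call can be timed with OpenCL switched on and off. Below is a rough C++ sketch of a timing helper that uses only the standard library; medianMillis is a made-up name, not part of OpenCV. It makes one untimed warm-up call first, on the assumption that the first OpenCL call also pays for kernel compilation.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an OpenCV function).
// Calls `work` once untimed as a warm-up, then times `runs` further calls
// and returns the median wall-clock duration in milliseconds.
template <typename Fn>
double medianMillis(Fn&& work, std::size_t runs) {
    if (runs == 0) return 0.0;  // nothing to measure

    work();  // warm-up call, deliberately not timed

    std::vector<double> samples;
    samples.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }

    // Median is less sensitive to a single slow outlier than the mean.
    std::sort(samples.begin(), samples.end());
    const std::size_t mid = samples.size() / 2;
    return (samples.size() % 2 != 0)
               ? samples[mid]
               : 0.5 * (samples[mid - 1] + samples[mid]);
}
```

Wrapping the sample's detectMultiScale call in a lambda and measuring once after cv::ocl::setUseOpenCL(true) and once after cv::ocl::setUseOpenCL(false) should show whether the OpenCL path changes anything on this setup.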

System information (version)
opencv-4.5.5_9
OS: FreeBSD 13.1-RELEASE-p1 amd64
Resolution: 3840x2160
DE: Plasma 5.24.6
WM: KWin
Theme: [Plasma], Breeze [GTK2/3]
Icons: [Plasma], breeze-dark [GTK2/3]
Terminal: konsole
CPU: AMD FX-8350 (8) @ 3.991GHz
GPU: Ellesmere [Radeon RX 580]
Memory: 11708MiB / 32684MiB
Compiler => FreeBSD clang version 13.0.0 (git@github.com:llvm/llvm-project.git llvmorg-13.0.0-0-gd7b669b3a303)

Here is the output for clinfo:

Here is the output for opencv_version --opencl:

 4.5.5
OpenCL Platforms: 
    Clover
        dGPU: AMD Radeon RX 580 Series (POLARIS10, DRM 3.35.0, 13.1-RELEASE-p1, LLVM 13.0.1) (OpenCL 1.1 Mesa 21.3.8)
    Portable Computing Language
        CPU: AMD FX(tm)-8350 Eight-Core Processor            (OpenCL 1.2 pocl HSTR: pthread-x86_64-portbld-freebsd13.1-bdver2)
Current OpenCL device: 
    Type = dGPU
    Name = AMD Radeon RX 580 Series (POLARIS10, DRM 3.35.0, 13.1-RELEASE-p1, LLVM 13.0.1)
    Version = OpenCL 1.1 Mesa 21.3.8
    Driver version = 21.3.8
    Address bits = 64
    Compute units = 36
    Max work group size = 256
    Local memory size = 32 KB
    Max memory allocation size = 3 GB 204 MB 819 KB 204 B
    Double support = Yes
    Half support = Yes
    Host unified memory = No
    Device extensions:
        cl_khr_byte_addressable_store
        cl_khr_global_int32_base_atomics
        cl_khr_global_int32_extended_atomics
        cl_khr_local_int32_base_atomics
        cl_khr_local_int32_extended_atomics
        cl_khr_int64_base_atomics
        cl_khr_int64_extended_atomics
        cl_khr_fp64
        cl_khr_extended_versioning
    Has AMD Blas = No
    Has AMD Fft = No
    Preferred vector width char = 16
    Preferred vector width short = 8
    Preferred vector width int = 4
    Preferred vector width long = 2
    Preferred vector width float = 4
    Preferred vector width double = 2
    Preferred vector width half = 0

Thanks for any help.

Your GPU supports OpenCL 1.1, but AFAIK OpenCV requires OpenCL 1.2.
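
The OpenCL specification fixes the layout of the device version string as "OpenCL", a space, major.minor, a space, then vendor text, so the comparison can be scripted. A minimal C++ sketch follows, assuming the 1.2 minimum mentioned above is right (not confirmed against the OpenCV documentation); parseClVersion and atLeast are made-up helper names, not OpenCV or OpenCL API.

```cpp
#include <cstdio>
#include <string>

// Major/minor pair parsed from an OpenCL version string.
struct ClVersion {
    int majorVer = 0;
    int minorVer = 0;
    bool valid = false;  // false when the string did not match the layout
};

// Hypothetical helper: parses "OpenCL <major>.<minor> <vendor info>",
// the layout the OpenCL specification defines for CL_DEVICE_VERSION.
ClVersion parseClVersion(const std::string& text) {
    ClVersion v;
    if (std::sscanf(text.c_str(), "OpenCL %d.%d", &v.majorVer, &v.minorVer) == 2) {
        v.valid = true;
    }
    return v;
}

// Hypothetical helper: true when `v` is at least reqMajor.reqMinor.
// The required version is supplied by the caller; the 1.2 figure used in
// this thread is an unverified assumption.
bool atLeast(const ClVersion& v, int reqMajor, int reqMinor) {
    if (!v.valid) return false;
    if (v.majorVer != reqMajor) return v.majorVer > reqMajor;
    return v.minorVer >= reqMinor;
}
```

With the strings from the opencv_version output, atLeast(parseClVersion("OpenCL 1.1 Mesa 21.3.8"), 1, 2) is false for the Clover device, while the version string of the pocl CPU device passes the same check.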

Hello, thanks for the reply.

OpenCL versions are backwards compatible, so it should only be a matter of adding a header that supports it. I haven’t tried this, but it could work.

As for face detection, I’m using a different, much better algorithm: “libfacedetect”. I use YOLO 4 to detect a human and a face first, then feed the image to libfacedetect for additional face detection.

YOLO 4 is running on the AMD RX 580 GPU via OpenCL 1.1 (Mesa Clover), using a GitHub project for that:

AMD has an OpenCL runtime through its ROCm software stack.

ROCm’s OpenCL

You can try that, but I’m not sure whether it will affect your work.

I ran into a similar problem with OpenCL on a Radeon GPU with MXNet,
but it was solved after I switched to the ROCm-compiled OpenCL.

Thanks for the reply.

I’m well aware of AMD’s new ROCm stack, which provides its official OpenCL implementation. ROCm would be my first choice, but it only works on Linux.

I use Unix (FreeBSD), and there is no port of or support for ROCm on FreeBSD at all.

So I have no idea how to get OpenCL working for OpenCV on FreeBSD, since there is no ROCm implementation for FreeBSD.

I know YOLO 7 can work without ROCm using “ncnn” from GitHub. I asked them whether I could run it on AMD GPUs, and they said yes, it could work, since they use the Vulkan API for hardware acceleration.

As for OpenCV using OpenCL on AMD GPUs under a Unix OS, I’m not sure how it could be done. One possibility is to use OpenCL header files that are backwards compatible from OpenCL 1.2 to 1.1.

Thanks.