Are there any patches to improve the speed of net.forward() on an ARM GPU (Mali-T864)?

■problem description
In my test code for OpenCV (dnn), I select the ARM GPU (Mali-T864) with the following settings:
net.setPreferableBackend(DNN_BACKEND_OPENCV);
net.setPreferableTarget(DNN_TARGET_OPENCL);

However, according to my test results, net.forward() is not fast on the ARM GPU (Mali-T864).
Its processing speed on the GPU is about the same as on a single CPU core (Cortex-A53).
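
To make the GPU vs. CPU comparison above reproducible, a timing helper along the following lines could be used. This is only a sketch: `medianMillis` is a hypothetical name, not an OpenCV function, and the warm-up and run counts are arbitrary assumptions. The warm-up calls are there because the first OpenCL runs typically include one-time costs (such as kernel compilation) that would otherwise distort the result.

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <vector>

// Hypothetical helper (not part of OpenCV): calls `fn` `warmup` times untimed,
// then returns the median wall-clock time of `runs` timed calls, in milliseconds.
double medianMillis(const std::function<void()>& fn, int warmup = 3, int runs = 20)
{
    if (runs < 1) runs = 1;  // need at least one timed sample

    // Untimed warm-up: absorbs one-time setup work on the first calls.
    for (int i = 0; i < warmup; ++i) fn();

    std::vector<double> samples;
    samples.reserve(runs);
    for (int i = 0; i < runs; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        fn();
        auto t1 = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
    }

    // The median is less sensitive to occasional slow runs than the mean.
    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];  // upper median when `runs` is even
}
```

Assuming `net` already has its input set as in the test code, `medianMillis([&]{ net.forward(); })` could be called once with DNN_TARGET_OPENCL and once with DNN_TARGET_CPU, so that both numbers are measured the same way.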

I suspect that OpenCV does not follow Arm's "Optimizing OpenCL for Mali GPUs" guidelines (linked below) very well.
Could you please tell me whether there are any patches to improve the speed of net.forward() on the ARM GPU (Mali-T864)?
For example, enabling ACL (Arm Compute Library) in OpenCV's Tengine integration.

https://developer.arm.com/documentation/100614/0313/optimizing-opencl-for-mali-gpus?lang=en

Thanks in advance.

■system information
・OpenCV
4.5.4
・Platform
Orange Pi 4
・CPU
Rockchip RK3399 (ARM): dual-core Cortex-A72 + quad-core Cortex-A53
・GPU
Mali-T864(ARM)
・system
Ubuntu 18.04
・uname -a
Linux orangepi4 4.4.179-rk3399 #4 SMP Tue Nov 9 11:24:14 JST 2021 aarch64 aarch64 aarch64 GNU/Linux