Compiled with -O3, using either -neon or -vfpv3; there seems to be no big difference. Running 20 frames took less than 20 s. Each 352*352 frame takes around 1.8 s for Yolo-FastestV2, model size 300 KB.
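For anyone who wants to reproduce the timing, here is a rough sketch of how the per-frame figure could be measured. The helper names are made up for illustration and are not from the build above. It also assumes std::chrono::steady_clock is backed by a working timer, which on bare metal may need a platform-specific clock source instead.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper (not part of OpenCV or the original build):
// plain arithmetic, total run time divided by the number of frames.
inline double seconds_per_frame(double total_seconds, std::size_t frames) {
    assert(frames > 0);
    return total_seconds / static_cast<double>(frames);
}

// Hypothetical helper: calls process_frame(i) for i = 0..frames-1 and
// returns the mean wall-clock seconds spent per call.
// Assumption: steady_clock works on the target; swap in a hardware
// timer read if the bare-metal toolchain does not provide one.
template <typename Fn>
double mean_seconds_per_frame(std::size_t frames, Fn process_frame) {
    assert(frames > 0);
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (std::size_t i = 0; i < frames; ++i) {
        process_frame(i);
    }
    const std::chrono::duration<double> elapsed = clock::now() - start;
    return seconds_per_frame(elapsed.count(), frames);
}
```

Timing only the inference call in one run and the whole loop (image load, preprocessing, inference) in another should show where the time actually goes.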
With the opencv macro switches use_intrinsic and compile_neon enabled, it looks like cv_try_neon gets turned on, and some functions are not implemented, so compilation fails.
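A quick way to confirm what the compiler itself was targeting, independent of the OpenCV switches, is to check the predefined ARM feature macros. This is only a sketch: compiled_fpu_target is a made-up name, and it relies on the ACLE macros __ARM_NEON and __ARM_FP (plus the older GCC spelling __ARM_NEON__) rather than on anything from OpenCV.

```cpp
#include <string>

// Hypothetical helper (not an OpenCV function): reports which FPU/SIMD
// target the compiler flags selected, based on predefined ACLE macros.
inline std::string compiled_fpu_target() {
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return "neon";   // NEON (Advanced SIMD) code generation is enabled
#elif defined(__ARM_FP)
    return "vfp";    // hardware floating point without NEON
#else
    return "none";   // no ARM FPU macros seen (e.g. soft-float or non-ARM host)
#endif
}
```

Printing this once at startup for each build makes it easy to tell whether the two builds really differ in their FPU target.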
The planned next step is to optimise some parts by using the FPGA.
Anyway, OpenCV 4.11.0 is fine to use in an ARM bare-metal environment.