given these numbers, maybe not an “interesting” direction (although interesting work), but the execution times are valuable information to any engineer. I also believe using SIMD/NEON would help a lot. ST have demoed some simple neural networks running on audio input in real time and, IIRC, even some video. I’ve heard of OpenVX, which may assume less than 128-bit SIMD intrinsics.
Yes, I also think that SIMD can improve the performance. We tried it on an STM32H7, and it reduced execution time by 25%. It is of course noticeably slower than on big ARMs. However, I still think that OpenCV on low-power hardware can be great for some tasks, since it can suit them better than neural networks.
Thanks for the links, especially about OpenVX.