Use universal SIMD intrinsics in my application and set AVX2 or AVX512

I am a fan of OpenCV’s universal intrinsics and would like to use them in my own application, which uses OpenCV. Unfortunately, OpenCV restricts its SIMD types, such as cv::v_float32, to the register width that was requested when OpenCV itself was built. For instance, a default build of OpenCV (such as the one from vcpkg) gives you SSE3 with 128-bit registers. I want to write my own code using cv::v_float32 with 512-bit registers (AVX512), and I don’t really care what the OpenCV library was using when it was compiled.

Previously I set the CMake option CPU_BASELINE=AVX2 when compiling OpenCV to choose the width of the SIMD registers. In my current environment, however, I don’t have much control over how OpenCV is built, but I do control how my own application that uses OpenCV is compiled. Is there a way to tell the universal intrinsics to just give me 256-bit or 512-bit registers? I experimented with defining CV__SIMD_FORCE_WIDTH as 512 before including intrin.hpp, but that is just going to get me into trouble.

Scott

Would the dispatching framework be useful for you?

It isn’t clear to me what problem dispatching is meant to solve, but I gave it a try. I built OpenCV4 with CPU_DISPATCH=AVX2 to start. I cannot build it with CPU_BASELINE=AVX2 because of bugs in Visual Studio 2022: it generates bad code in one place and fails with an internal compiler error in another. (I submitted a bug report to Microsoft for the latter.)
With the CPU_DISPATCH=AVX2 build, the intrinsics I need to write anything wider than SSE still do not exist in my application. The header cv_cpu_dispatch.h defines all kinds of interesting macros, but only if __OPENCV_BUILD is defined, which it is not in my application. So what are the limitations of dispatching? Does it work in applications, or only inside OpenCV?