Build OpenCV with AVX2 / AVX512

I want to rebuild OpenCV from source on my machine with AVX2 or AVX512 support. I first compiled OpenCV with the command below:

cmake -DBUILD_PERF_TESTS:BOOL=OFF -DBUILD_TESTS:BOOL=OFF -DBUILD_DOCS:BOOL=OFF  -DWITH_CUDA:BOOL=OFF -DBUILD_EXAMPLES:BOOL=OFF -DINSTALL_CREATE_DISTRIB=ON -DWITH_OPENMP=ON   -DBUILD_opencv_world=ON .. 

However, when I run the code below, I get the following result:

#include <iostream>
#include <opencv2/core.hpp>
#include <opencv2/core/simd_intrinsics.hpp>

void PrintOpenCVInfo()
{
   std::cout << "--------------------------OpenCV informaintion--------------------------"
             << std::endl;
   std::cout << "OpenCV version:" << cv::getVersionString() << std::endl;
   std::cout << "Simd info: " << std::endl;
#ifdef CV_SIMD
   std::cout << "CV_SIMD : " << CVAUX_STR(CV_SIMD) << std::endl;
   std::cout << "CV_SIMD_WIDTH : " << CVAUX_STR(CV_SIMD_WIDTH) << std::endl;
   std::cout << "CV_SIMD128 : " << CVAUX_STR(CV_SIMD128) << std::endl;
   std::cout << "CV_SIMD256: " << CVAUX_STR(CV_SIMD256) << std::endl;
   std::cout << "CV_SIMD512 : " << CVAUX_STR(CV_SIMD512) << std::endl;
#else
   std::cout << "CV_SIMD is NOT defined." << std::endl;
#endif

#ifdef CV_SIMD
   std::cout << "sizeof(v_uint8) = " << sizeof(cv::v_uint8) << std::endl;
   std::cout << "sizeof(v_int32) = " << sizeof(cv::v_int32) << std::endl;
   std::cout << "sizeof(v_float32) = " << sizeof(cv::v_float32) << std::endl;
#endif
}

Result:

--------------------------OpenCV informaintion--------------------------
OpenCV version:4.6.0
Simd info:
CV_SIMD : 1
CV_SIMD_WIDTH : 16
CV_SIMD128 : 1
CV_SIMD256: 0
CV_SIMD512 : 0
sizeof(v_uint8) = 16
sizeof(v_int32) = 16
sizeof(v_float32) = 16

My question is: what should I do to make OpenCV support AVX2 or AVX512?

If you are on Linux, can you run one of these to confirm that your CPU has these instruction sets?

  • lscpu
  • cat /proc/cpuinfo

Can you build and run this code to show all the information about your OpenCV build:

std::cout << cv::getBuildInformation().c_str() << std::endl;

Or build and run this sample with verbose and hw options: opencv/opencv_version.cpp at 281b79061867df990573ecd5269cbba5efc54777 · opencv/opencv · GitHub

Thanks for the reply. I have posted the information below (my platform is Windows).

Here is the build information:

General configuration for OpenCV 4.6.0 =====================================
  Version control:               unknown

  Platform:
    Timestamp:                   2022-12-05T09:07:52Z
    Host:                        Windows 10.0.19044 AMD64
    CMake:                       3.25.1
    CMake generator:             Visual Studio 16 2019
    CMake build tool:            C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/MSBuild/Current/Bin/MSBuild.exe
    MSVC:                        1929
    Configuration:               Debug Release

  CPU/HW features:
    Baseline:                    SSE SSE2 SSE3
      requested:                 SSE3
    Dispatched code generation:  SSE4_1 SSE4_2 FP16 AVX AVX2 AVX512_SKX
      requested:                 SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
      SSE4_1 (16 files):         + SSSE3 SSE4_1
      SSE4_2 (1 files):          + SSSE3 SSE4_1 POPCNT SSE4_2
      FP16 (0 files):            + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 AVX
      AVX (4 files):             + SSSE3 SSE4_1 POPCNT SSE4_2 AVX
      AVX2 (31 files):           + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2
      AVX512_SKX (5 files):      + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2 AVX_512F AVX512_COMMON AVX512_SKX

  C/C++:
    Built as dynamic libs?:      YES
    C++ standard:                11
    C++ Compiler:                C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe  (ver 19.29.30147.0)       
    C++ flags (Release):         /DWIN32 /D_WINDOWS /W4 /GR  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise     /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /MP  /MD /O2 /Ob2 /DNDEBUG
    C++ flags (Debug):           /DWIN32 /D_WINDOWS /W4 /GR  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise     /EHa /wd4127 /wd4251 /wd4324 /wd4275 /wd4512 /wd4589 /MP  /MDd /Zi /Ob0 /Od /RTC1
    C Compiler:                  C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe
    C flags (Release):           /DWIN32 /D_WINDOWS /W3  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise     /MP   /MD /O2 /Ob2 /DNDEBUG
    C flags (Debug):             /DWIN32 /D_WINDOWS /W3  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /fp:precise     /MP /MDd /Zi /Ob0 /Od /RTC1
    Linker flags (Release):      /machine:x64  /INCREMENTAL:NO
    Linker flags (Debug):        /machine:x64  /debug /INCREMENTAL
    ccache:                      NO
    Precompiled headers:         NO
    Extra dependencies:
    3rdparty dependencies:

  OpenCV modules:
    To be built:                 calib3d core dnn features2d flann gapi highgui imgcodecs imgproc ml objdetect photo stitching video videoio world
    Disabled:                    -
    Disabled by dependency:      -
    Unavailable:                 java python2 python3 ts
    Applications:                apps
    Documentation:               NO
    Non-free algorithms:         NO

  Windows RT support:            NO

  GUI:
    Win32 UI:                    YES
    VTK support:                 NO

  Media I/O:
    ZLib:                        build (ver 1.2.12)
    JPEG:                        build-libjpeg-turbo (ver 2.1.2-62)
    WEBP:                        build (ver encoder: 0x020f)
    PNG:                         build (ver 1.6.37)
    TIFF:                        build (ver 42 - 4.2.0)
    JPEG 2000:                   build (ver 2.4.0)
    OpenEXR:                     build (ver 2.3.0)
    HDR:                         YES
    SUNRASTER:                   YES
    PXM:                         YES
    PFM:                         YES

  Video I/O:
    DC1394:                      NO
    FFMPEG:                      YES (prebuilt binaries)
      avcodec:                   YES (58.134.100)
      avformat:                  YES (58.76.100)
      avutil:                    YES (56.70.100)
      swscale:                   YES (5.9.100)
      avresample:                YES (4.0.0)
    GStreamer:                   NO
    DirectShow:                  YES
    Media Foundation:            YES
      DXVA:                      YES

  Parallel framework:            Concurrency

  Trace:                         YES (with Intel ITT)

  Other third-party libraries:
    Intel IPP:                   2020.0.0 Gold [2020.0.0]
           at:                   E:/github_repos/opencv-4.6.0/build-concurrency/3rdparty/ippicv/ippicv_win/icv
    Intel IPP IW:                sources (2020.0.0)
              at:                E:/github_repos/opencv-4.6.0/build-concurrency/3rdparty/ippicv/ippicv_win/iw
    Lapack:                      NO
    Eigen:                       NO
    Custom HAL:                  NO
    Protobuf:                    build (3.19.1)

  OpenCL:                        YES (NVD3D11)
    Include path:                E:/github_repos/opencv-4.6.0/3rdparty/include/opencl/1.2
    Link libraries:              Dynamic load

  Python (for build):            C:/Users/niu/AppData/Local/Microsoft/WindowsApps/python3.exe

  Java:
    ant:                         NO
    JNI:                         NO
    Java wrappers:               NO
    Java tests:                  NO

  Install to:                    E:/github_repos/opencv-4.6.0/build-concurrency/install
-----------------------------------------------------------------

And here is the output of PrintOpenCVInfo:

OpenCV version:4.6.0
Simd info:
CV_SIMD : 1
CV_SIMD_WIDTH : 16
CV_SIMD128 : 1
CV_SIMD256: 0
CV_SIMD512 : 0
sizeof(v_uint8) = 16
sizeof(v_int32) = 16
sizeof(v_float32) = 16
v_float32: 4
v_int32: 4
v_int8: 16

My platform is Windows. Here is the CPU-Z result:

[image: CPU-Z screenshot]

Can you try to build and run this sample:

It includes these two headers:

#include "opencv2/core.hpp"
#include "opencv2/core/simd_intrinsics.hpp"

OK, I will try it.

The result is:

CV_SIMD is defined: 1
CV_SIMD_WIDTH is defined: 16
CV_SIMD128 is defined: 1
CV_SIMD256 is defined: 0
CV_SIMD512 is defined: 0
CV_SIMD_64F is defined: 1
CV_SIMD_FP16 is defined: 0
=================  sizeof checks  =================
sizeof(v_uint8) = 16
sizeof(v_int32) = 16
sizeof(v_float32) = 16
==================  arithm check  =================
(vx_setall_u8(10) + vx_setall_u8(45)).get0() => 55
=====================  done  ======================

First of all, CPU optimization is not my field.


OpenCV uses CPU dynamic dispatching: CPU optimizations build options · opencv/opencv Wiki · GitHub

For instance, if you configure and build OpenCV like this (should be the config by default on a modern CPU):

  CPU/HW features:
    Baseline:                    SSE SSE2 SSE3
      requested:                 SSE3
    Dispatched code generation:  SSE4_1 SSE4_2 FP16 AVX AVX2 AVX512_SKX
      requested:                 SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX

Running example_cpp_simd_basic.exe should give you something like this:

==================  macro dump  ===================
CV_SIMD is defined: 1
CV_SIMD_WIDTH is defined: 16
CV_SIMD128 is defined: 1
CV_SIMD256 is defined: 0
CV_SIMD512 is defined: 0
CV_SIMD_64F is defined: 1
CV_SIMD_FP16 is defined: 0
=================  sizeof checks  =================
sizeof(v_uint8) = 16
sizeof(v_int32) = 16
sizeof(v_float32) = 16
==================  arithm check  =================
(vx_setall_u8(10) + vx_setall_u8(45)).get0() => 55

That is, the baseline is SSE3 and the dispatched code paths are the ones listed above.

Dispatched code means:

  • CPU features are detected at runtime, so if your CPU does not have AVX2, that code path will not be taken,
  • at the cost of a bigger binary size for the OpenCV library

This way, you can easily distribute your program with OpenCV on different CPU versions without fearing failure.
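To make the idea of dispatched code concrete, here is a minimal sketch of runtime dispatch in plain C++. It is not how OpenCV implements it internally; the names `AddFn`, `add_baseline`, `add_avx2_placeholder`, `cpu_has_avx2`, `Dispatch` and `select_add` are hypothetical, and the "AVX2" path is just a scalar stand-in so the sketch compiles without special flags. The CPU check assumes GCC or Clang on x86; inside an OpenCV program the equivalent query would be `cv::checkHardwareSupport(CV_CPU_AVX2)`.

```cpp
#include <cstddef>
#include <string>

// Hypothetical kernel signature, used only for this illustration.
using AddFn = void (*)(const float* a, const float* b, float* out, std::size_t n);

// Baseline path: plain code that runs on every CPU the binary targets.
void add_baseline(const float* a, const float* b, float* out, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

// Stand-in for an optimized path. A real library would compile this in a
// separate source file with AVX2 enabled; here it is the same scalar loop.
void add_avx2_placeholder(const float* a, const float* b, float* out, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

// Runtime CPU feature check (assumption: GCC/Clang builtin on x86).
bool cpu_has_avx2()
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("avx2") != 0;
#else
    return false;  // unknown compiler or architecture: stay on the baseline
#endif
}

struct Dispatch
{
    AddFn fn;
    std::string name;
};

// Pick the best available path once, at run time.
Dispatch select_add()
{
    if (cpu_has_avx2()) return {add_avx2_placeholder, "avx2"};
    return {add_baseline, "baseline"};
}
```

Calling `select_add()` once and then invoking `fn` on the data gives the same result on every machine; only the chosen path differs. Both paths are present in the binary, which is why the dispatched build is larger.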

In this baseline file, CV_SIMD256 is not available. But, if you use:

you will be able to use the different CPU instruction sets.


If you want to test, configure OpenCV with a different set of requested CPU features (for example, by passing -DCPU_BASELINE=AVX2 to CMake):

CPU/HW features:
    Baseline:                    SSE SSE2 SSE3 SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2
      requested:                 AVX2
    Dispatched code generation:  AVX512_SKX

This time, running example_cpp_simd_basic.exe should give you something like this:

==================  macro dump  ===================
CV_SIMD is defined: 1
CV_SIMD_WIDTH is defined: 32
CV_SIMD128 is defined: 1
CV_SIMD256 is defined: 1
CV_SIMD512 is defined: 0
CV_SIMD_64F is defined: 1
CV_SIMD_FP16 is defined: 0
=================  sizeof checks  =================
sizeof(v_uint8) = 32
sizeof(v_int32) = 32
sizeof(v_float32) = 32
==================  arithm check  =================
(vx_setall_u8(10) + vx_setall_u8(45)).get0() => 55
=====================  done  ======================

Thank you so much, I will check the link you posted.

Based on your reply, I have come to these conclusions:

  1. If I build OpenCV with the default config, the baseline is SSE3, so my executable will not run if my CPU does not support SSE3; AVX and AVX2 are in the dispatched set. If I build OpenCV with AVX2 as the baseline, it will run only if my CPU supports AVX2.
  2. If I build OpenCV with the default config and my CPU supports AVX and AVX2, those code paths will be taken at run time.
    Am I correct?
  1. Yes. Also, a CPU having AVX2 will have AVX, SSE3, etc… See also the wiki page:

Minimal is required set of processor features. Executable will not run if some of these options are not available on target processor.

  2. Yes, and the executable will be bigger since it has to have code with SSE3 optimization, AVX, … See:

Dispatched optimizations are additional code paths compiled into executable. They will be executed on supported processors only.

And:

By default, OpenCV on x86_64 uses SSE3 as basic instruction set and enables dispatched optimizations for SSE4.2, AVX, AVX2 instruction sets. This configuration provides the best effort on wide range of users platforms.

Got it, thanks a lot.

According to your build configuration, your OpenCV library has been built with AVX2 and AVX512 support. Those optimized functions are dispatched at runtime.

But for your own code, i.e. your source files and makefiles, you have to add the compiler option for the instruction set you want (-mavx2 for g++, or /arch:AVX2 for MSVC) to make those SIMD predefined macros work. This is not necessary unless you want to write your own SIMD code.
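A quick way to confirm which instruction set your own source file is actually being compiled for is to look at the compiler's predefined macros. The sketch below is independent of OpenCV; the function name `compiled_simd_level` is hypothetical. It relies on `__AVX__`, `__AVX2__` and `__AVX512F__`, which g++/clang define with the matching `-m` flags and MSVC defines with the matching `/arch:` option.

```cpp
#include <string>

// Reports the widest x86 SIMD level enabled for THIS translation unit,
// based only on macros the compiler predefines from its command-line flags.
std::string compiled_simd_level()
{
#if defined(__AVX512F__)
    return "AVX512F";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    return "SSE2";
#else
    return "none detected";
#endif
}
```

Printing the returned string from your program before and after adding -mavx2 (or /arch:AVX2) shows whether the flag reached the compiler. It says nothing about what the OpenCV library itself was built with; that is what cv::getBuildInformation() reports.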