Fastest CPU build configuration for simple functions?

Hi, I’m looking for ways to improve the CPU performance of basic image processing operations such as cut, copy, split channels, type conversion and others. I have already removed as many of these operations as I can. For some reason, implementing the algorithms in plain C runs faster than my OpenCV VS19 build, so I think I’m missing something. I tried building the libs with different CPU_BASELINE optimizations, but performance didn’t change compared to the prebuilt binaries (I have checked that all processors I use for tests support AVX2, SSE4.1 and SSE4.2).
Some of my time measurements, for example: I’m resizing an image from ~8000x300 to ~4000x300 and reversing the channel order with cvtColor (RGB to BGR). This costs me ~4-5 ms for the resize plus 3 ms for cvtColor. A handwritten function by my colleague does both operations in 2.3 ms by just iterating over the raw data with loops.
For now I can’t use CUDA or other GPU optimizations. I tried to do everything with UMat, but I have to call some functions that only have a CPU implementation, so the transfers to and from the GPU cancel out any performance advantage.
I assume that OpenCV should be faster, or at least not this much slower, than a primitive loop-based function, so I’m looking for what I’m missing in the build options, or any other advice on how to improve performance.

your own code probably assumes an exact 2:1 reduction, where every two neighboring pixels are merged into one.

opencv’s resize might or might not try to detect that situation. it’s a general function for arbitrary resizing. it’ll probably run the general algorithm, which does more calculation due to its generality.
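to illustrate the difference, a hand-written loop for this one special case could look roughly like the sketch below. it is not the colleague's actual code and not what opencv does internally. the function name `halve_width_swap_rb` is made up, and it assumes an exact 2:1 horizontal reduction, an even source width, 8-bit interleaved 3-channel data and tightly packed rows (no padding).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: halve the width of an interleaved 8-bit, 3-channel
// image and swap channels 0 and 2 (RGB <-> BGR) in the same pass.
// Assumptions: src_width is even, rows are tightly packed, and dst has room
// for rows * (src_width / 2) * 3 bytes.
void halve_width_swap_rb(const std::uint8_t* src, std::uint8_t* dst,
                         std::size_t rows, std::size_t src_width) {
    const std::size_t dst_width = src_width / 2;
    for (std::size_t y = 0; y < rows; ++y) {
        const std::uint8_t* s = src + y * src_width * 3;  // start of source row
        std::uint8_t* d = dst + y * dst_width * 3;        // start of output row
        for (std::size_t x = 0; x < dst_width; ++x, s += 6, d += 3) {
            // Rounded average of two horizontal neighbours, written with
            // the first and third channel exchanged.
            d[0] = static_cast<std::uint8_t>((s[2] + s[5] + 1) / 2);
            d[1] = static_cast<std::uint8_t>((s[1] + s[4] + 1) / 2);
            d[2] = static_cast<std::uint8_t>((s[0] + s[3] + 1) / 2);
        }
    }
}
```

this touches every source byte once and needs no per-pixel coordinate or weight computation, which a general resize followed by a separate cvtColor pass cannot avoid so easily. it also stops being correct as soon as the scale factor is not exactly 2.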

Thanks for your answer.

As far as I understand, resize uses linear interpolation by default to calculate the new pixel values (which is stated to be the fastest one). It looks like a possible reason for the custom resize being faster is that my colleague’s code omits the vertical pixel blending.

Should the performance of resize, threshold, sum and other relatively simple functions differ when built with different CPU_BASELINE settings? I don’t notice any performance difference between CPU_BASELINE options, so maybe I need to change something else in the build configuration? I’m going to try different compilers for now, maybe that will give some improvement.

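no, because that’s only the baseline, i.e. the minimum instruction set the library is compiled for. on top of that there is dynamic dispatch on the actual cpu capabilities (SSE*, AVX, AVX2, AVX512, …), so functions that have dispatched implementations pick the optimized code path at runtime.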
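for anyone unfamiliar with the idea, here is a minimal sketch of what runtime dispatch means in general. it is not opencv's implementation. all names (`sum_baseline`, `sum_wide`, `cpu_has_avx2`, `sum_bytes`) are hypothetical, `sum_wide` is only a stand-in for a kernel compiled for a wider instruction set, and the cpu check relies on the GCC/Clang builtin `__builtin_cpu_supports`, falling back to the baseline path on other compilers or architectures.

```cpp
#include <cstddef>
#include <cstdint>

// Portable kernel: plays the role of the "baseline" code path.
static std::uint64_t sum_baseline(const std::uint8_t* data, std::size_t n) {
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}

// Stand-in for a kernel built for a wider instruction set. A real library
// would compile this file with e.g. AVX2 enabled; here it is only unrolled.
static std::uint64_t sum_wide(const std::uint8_t* data, std::size_t n) {
    std::uint64_t a = 0, b = 0, c = 0, d = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a += data[i];
        b += data[i + 1];
        c += data[i + 2];
        d += data[i + 3];
    }
    for (; i < n; ++i) a += data[i];  // leftover elements
    return a + b + c + d;
}

// Ask the CPU at runtime what it supports (GCC/Clang on x86 only).
static bool cpu_has_avx2() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("avx2") != 0;
#else
    return false;  // unknown platform: stay on the baseline path
#endif
}

using SumFn = std::uint64_t (*)(const std::uint8_t*, std::size_t);

// The kernel is chosen once, on first use, from the actual CPU features.
std::uint64_t sum_bytes(const std::uint8_t* data, std::size_t n) {
    static const SumFn fn = cpu_has_avx2() ? sum_wide : sum_baseline;
    return fn(data, n);
}
```

the point is that the caller always uses `sum_bytes`, and the choice of kernel depends on the machine the binary runs on rather than on the flags the rest of the program was built with. that is why raising the baseline alone often changes little for functions that already have dispatched variants.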