Weird multiprocessing behaviour

So I had the idea to speed up the very slow StereoSGBM by splitting the image and running the pieces in multiple processes. I have tried pool.apply_async and ProcessPoolExecutor.submit, and both made performance worse.
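A minimal sketch of how the split could look, assuming the image pair is rectified so that matching runs along rows: cut the image into horizontal strips that each keep the full width, and give every strip a few extra rows of overlap so the matching window near a cut still has context. The helper name strip_bounds and its margin parameter are hypothetical. For StereoSGBM the margin should be at least blockSize // 2, and because SGBM also aggregates costs along non-horizontal paths, strip results can still differ from a full-image run, especially near the seams.

```python
def strip_bounds(height, n_strips, margin):
    """Split rows [0, height) into n_strips horizontal strips with overlap.

    Returns a list of (read_start, read_end, keep_start, keep_end) tuples:
    rows [read_start, read_end) are fed to the matcher, and rows
    [keep_start, keep_end) of that result (relative to read_start) are kept.
    """
    if n_strips < 1 or height < n_strips:
        raise ValueError("need 1 <= n_strips <= height")
    bounds = []
    for i in range(n_strips):
        # Core rows this strip is responsible for (spread evenly).
        core_start = i * height // n_strips
        core_end = (i + 1) * height // n_strips
        # Pad by `margin` rows on each side, clamped to the image.
        read_start = max(0, core_start - margin)
        read_end = min(height, core_end + margin)
        # Where the core rows sit inside the padded strip.
        keep_start = core_start - read_start
        keep_end = keep_start + (core_end - core_start)
        bounds.append((read_start, read_end, keep_start, keep_end))
    return bounds
```

Each worker would then run the matcher on left[rs:re] and right[rs:re] (full width, since cutting columns would also cut into the disparity search range) and return disp[ks:ke]; stacking the returned pieces in order, for example with numpy.vstack, gives the full-height map.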

  • The problem should not be process start-up time: if I run a single process on one fraction of the image at a time, it runs fast.
    - When all cores/processes run at the same time, even the time for a single fraction goes up.
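One way to confirm that each fraction really gets slower under parallel load, rather than the pool adding overhead around it, is to time every chunk inside the worker and compare a sequential run with a pool run. A minimal standard-library sketch; busy_work is a hypothetical CPU-bound stand-in for the per-strip matcher call, and timed_call / timed_map are illustrative helpers, not part of any library. Because the clock starts inside the worker, process start-up is excluded from the numbers.

```python
import time
from concurrent.futures import ProcessPoolExecutor


def busy_work(n):
    """Hypothetical CPU-bound stand-in for the per-strip StereoSGBM call."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed_call(fn, arg):
    """Call fn(arg) and return (result, seconds spent inside the call)."""
    start = time.perf_counter()
    result = fn(arg)
    return result, time.perf_counter() - start


def timed_map(fn, args, executor=None):
    """Per-item durations, measured where the work actually runs.

    With executor=None the items run one after another in this process;
    otherwise they are dispatched through executor.map.
    """
    args = list(args)
    if executor is None:
        pairs = [timed_call(fn, a) for a in args]
    else:
        pairs = list(executor.map(timed_call, [fn] * len(args), args))
    return [seconds for _, seconds in pairs]


if __name__ == "__main__":
    chunks = [1_000_000] * 8
    sequential = timed_map(busy_work, chunks)
    with ProcessPoolExecutor() as pool:
        parallel = timed_map(busy_work, chunks, pool)
    print(f"sequential, mean per chunk: {sum(sequential) / len(sequential):.3f} s")
    print(f"parallel,   mean per chunk: {sum(parallel) / len(parallel):.3f} s")
```

Running it once with the stand-in and once with the real per-strip computation helps separate general effects of loading every core (for example hyper-threaded logical cores sharing one physical core, or lower boost clocks) from something specific to the matcher.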

I’m wondering whether the OpenCV API underneath blocks somehow. Does each process have its own instance of OpenCV?
Is OpenCV already saturating all cores with some parallelism of its own? In Task Manager, at least, a single-threaded run doesn't seem to use the CPU's full potential. Some SIMD register sorcery?
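One explanation worth ruling out is oversubscription: each worker process has its own copy of the cv2 module state, and if OpenCV parallelizes a call internally (whether StereoSGBM does depends on the mode and the OpenCV build), N processes can end up running far more threads than there are cores. A minimal sketch, assuming the cv2 module is installed: cv2.setNumThreads, cv2.getNumThreads and cv2.useOptimized are real OpenCV functions, and ProcessPoolExecutor's initializer argument runs a function once in every worker. limit_opencv_threads and describe_opencv are hypothetical helpers. cv2.useOptimized() reports whether OpenCV's optimized (SIMD-dispatched) code paths are enabled, which covers the SIMD part of the question.

```python
import os

try:
    import cv2  # assumption: the opencv-python package is installed
except ImportError:  # lets the sketch run (as a no-op) without OpenCV
    cv2 = None


def limit_opencv_threads(n_threads=1):
    """Pool initializer: cap OpenCV's internal threads in this process.

    Returns the thread count OpenCV reports afterwards, or None when cv2
    is not importable.
    """
    if cv2 is None:
        return None
    cv2.setNumThreads(n_threads)
    return cv2.getNumThreads()


def describe_opencv():
    """Report this process's PID and its OpenCV threading/SIMD settings."""
    if cv2 is None:
        return {"pid": os.getpid(), "opencv": None}
    return {
        "pid": os.getpid(),
        "opencv": cv2.__version__,
        "threads": cv2.getNumThreads(),
        "simd_enabled": cv2.useOptimized(),
    }


if __name__ == "__main__":
    from concurrent.futures import ProcessPoolExecutor

    print("parent:", describe_opencv())
    # The initializer runs once in each worker before it takes tasks.
    with ProcessPoolExecutor(max_workers=4, initializer=limit_opencv_threads) as pool:
        # The same worker may pick up several of these submissions.
        futures = [pool.submit(describe_opencv) for _ in range(4)]
        for fut in futures:
            print("worker:", fut.result())
```

Comparing the per-strip timings with and without the initializer would show whether oversubscription is part of the slowdown; multiprocessing.Pool accepts the same initializer argument for the apply_async route.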