Self-built parallel code is way slower than the sample code

I have OpenCV 4.2 and built the code from the OpenCV parallel_for_ tutorial.

Then I compared the running time with the sample code. The sample code outputs:

Parallel Mandelbrot: 1.65217 s
Sequential Mandelbrot: 11.2684 s
Speed-up: 6.82038 X

However, my output is:

Parallel Mandelbrot: 32.9983 s
Sequential Mandelbrot: 174.418 s
Speed-up: 5.28567 X

It is just super weird. My CMakeLists.txt is here:

cmake_minimum_required(VERSION 3.16.3)


## the output location for bin

SET(OpenCV_DIR /home/lin/develop/3rd/opencv/install/opencv-4.2.0-test1/lib/cmake/opencv4/)
find_package(OpenCV 4.2.0 REQUIRED)
message(STATUS "OpenCV library status:")
message(STATUS "    config: ${OpenCV_DIR}")
message(STATUS "    version: ${OpenCV_VERSION}")
message(STATUS "    libraries: ${OpenCV_LIBS}")
message(STATUS "    include path: ${OpenCV_INCLUDE_DIRS}")

add_executable(opencv_tutorial_parallel src/tutorial_parallel.cpp)
target_link_libraries(opencv_tutorial_parallel ${OpenCV_LIBS})

Any suggestions would be great! Thanks all!

speedup is similar. parallelization seems to have worked.

your only issue is that the compiler is not applying optimizations: both of your timings are roughly 15 to 20 times slower than the sample's.

look at the CMake file of the example and compare it to yours.

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -O3 -fsee -fomit-frame-pointer -fno-signed-zeros -fno-math-errno -funroll-loops")

Adding this line solved the problem. Thanks for the suggestions!
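
For later readers: instead of hard-coding compiler flags, another common way to get an optimized build is to select CMake's Release build type. The following is only a minimal sketch, not the sample's actual configuration. It assumes a single-configuration generator such as Unix Makefiles or Ninja (multi-configuration generators such as Visual Studio ignore CMAKE_BUILD_TYPE) and would go before the add_executable line:

```cmake
# Default to an optimized build when no build type was chosen.
# Assumption: single-configuration generator (Unix Makefiles, Ninja).
if(NOT CMAKE_BUILD_TYPE)
  set(CMAKE_BUILD_TYPE Release CACHE STRING "Build type" FORCE)
endif()

# Print the build type next to the other status messages as a sanity check.
message(STATUS "    build type: ${CMAKE_BUILD_TYPE}")
```

With GCC or Clang, the Release build type normally adds -O3 -DNDEBUG. The same effect can be had without editing CMakeLists.txt by configuring with cmake -DCMAKE_BUILD_TYPE=Release.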