The same code runs slower on Windows than on Mac

I have a C++ program that uses OpenCV v4.5.5. When running it, I found that the same code runs faster on a MacBook Pro (MacBook Pro14,1, Core i5 2.3GHz) than on a Windows computer (i5 6500 3.2GHz).

For example, the following code takes 1ms on the Mac and 33ms on Windows.

for(int j = 0; j < img.rows; j++)
{
    auto arow = img.row(j);

    vector<short> v = (vector<short>)arow;  // <<<< 1ms on Mac, but 33ms on Windows, why???

    ……
}

Does anyone know the reason for this and how to fix it?

It might be that if you do nothing with your local variable, Mac’s compiler optimizes the whole code out…
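One way to check for that is to make the copied data observable, so the result at least has to be computed. A minimal sketch in plain C++ (no OpenCV; `copyRowAndUse` and `g_sink` are made-up names, and the raw `short` buffer stands in for one image row). The compiler may still simplify the copy itself, so treat this as a sanity check rather than a guarantee:

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical sink: a write to a volatile object is observable behaviour,
// so the value stored here has to be produced.
volatile long long g_sink = 0;

// Copies one row (a plain buffer standing in for an image row) into a vector
// and folds every element into a checksum, so the copy has a visible effect.
long long copyRowAndUse(const short* row, std::size_t cols)
{
    assert(row != nullptr || cols == 0);
    std::vector<short> v(row, row + cols);                    // the copy being timed
    long long sum = std::accumulate(v.begin(), v.end(), 0LL); // touch every element
    g_sink = g_sink + sum;                                    // consume the result
    return sum;
}
```

If the Mac timing stays near 1ms with the data consumed like this, dead-code elimination is probably not the explanation.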

However, OpenCV actually uses memcpy here, and the compiler also has special optimizations for this statement. Could it be some “special” parallel computation on the Mac?

Can you share your complete code including how you derive your timings?

Are you comparing Release or Debug builds?

I am comparing the performance of Release builds on Mac and Windows.
The following is my code. The input image size is 2600 * 3400, so row_elapsedTime holds the sum over 2600 iterations.


    for(int j = 0; j < img.rows; j++)
    {
        auto arow = img.row(j);
#if Performance
        auto row_STime = std::chrono::high_resolution_clock::now();
#endif
        vector<short> v = (vector<short>)arow;

#if Performance
        auto row_ETime = std::chrono::high_resolution_clock::now();
        row_elapsedTime += (double)std::chrono::duration_cast<std::chrono::microseconds>(row_ETime - row_STime).count();
#endif

        ......
    }
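A side note on the measurement: if the 1ms total is spread over 2600 rows, a single row takes well under a microsecond, and `duration_cast<std::chrono::microseconds>` truncates each interval toward zero before it is added. A sketch of an alternative, assuming it is acceptable to sum the raw clock durations and convert once at the end (`RowTimer` is a made-up helper, not part of OpenCV):

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: accumulates intervals in the clock's native resolution
// and converts to microseconds only when the total is read.
class RowTimer
{
public:
    using Clock = std::chrono::steady_clock;   // monotonic, suited to intervals

    void start() { begin_ = Clock::now(); }
    void stop()  { total_ += Clock::now() - begin_; }

    // Adds an already measured interval (handy for checking the arithmetic).
    void add(Clock::duration d)
    {
        assert(d >= Clock::duration::zero());
        total_ += d;
    }

    // Total elapsed time in microseconds, as a floating-point value.
    double microseconds() const
    {
        return std::chrono::duration<double, std::micro>(total_).count();
    }

private:
    Clock::time_point begin_{};
    Clock::duration total_{Clock::duration::zero()};
};
```

In the loop, `timer.start()` would go before the conversion and `timer.stop()` after it, with `timer.microseconds()` read once after the loop.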

It looks like the conversion initialization of the vector is slow for certain data types. On Windows, 16U and 32F take about 20x longer than 8U, and I’m guessing this isn’t the case on a Mac.
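If the conversion operator is the slow part, one thing to try is building the vector straight from the row's memory. A sketch, assuming the Mat is single-channel and its element type really is `short` (e.g. CV_16S), so that `img.ptr<short>(j)` points at `img.cols` contiguous values. `rowToVector` is a made-up helper, and it takes a raw pointer so that it compiles without OpenCV:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copies `cols` elements starting at `rowPtr` into a vector.
// With OpenCV the call would look like rowToVector(img.ptr<short>(j), img.cols),
// assuming a single-channel Mat whose element type matches T.
template <typename T>
std::vector<T> rowToVector(const T* rowPtr, std::size_t cols)
{
    assert(rowPtr != nullptr || cols == 0);
    // Range constructor: one allocation plus a straight copy of the elements.
    return std::vector<T>(rowPtr, rowPtr + cols);
}
```

Whether this closes the gap on Windows would need to be measured; it only sidesteps the Mat-to-vector conversion path.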