I’m working on a project using OpenCV in which I apply a filter to an image and measure how long it takes. I then keep the fastest 1% of the measurements for each algorithm. However, the execution times of the sequential and parallel versions do not behave as I expected.
Both versions run inside the cvtBGRtoGray function in the color_rgb.simd.hpp file (line 1233). To measure the time, I used std::chrono::high_resolution_clock.
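For reference, a minimal sketch of how such a measurement could look. The helper name measure_ns is hypothetical and not part of OpenCV; it only illustrates timing a single call with high_resolution_clock, not the exact harness used for the numbers in this post.

```cpp
#include <chrono>
#include <cstdint>
#include <utility>

// Hypothetical helper (not part of OpenCV): runs `work` once and returns
// the elapsed time in nanoseconds as reported by high_resolution_clock.
template <typename F>
std::int64_t measure_ns(F&& work) {
    const auto start = std::chrono::high_resolution_clock::now();
    std::forward<F>(work)();  // the code under test
    const auto stop = std::chrono::high_resolution_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Each conversion would be wrapped in a lambda and passed to measure_ns many times, with the fastest 1% of the returned values kept.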
This is the sequential code:
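    uchar media;
    // Total number of bytes in the interleaved 3-channel source image
    long last_byte = (width * height) * 3;
    // i walks the source three bytes at a time, j walks the gray output
    for (long i = 0, j = 0; i < last_byte; i += 3, j++) {
        media = (src_data[i] + src_data[i+1] + src_data[i+2]) / 3;
        dst_data[j] = media;
    }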
And this is the parallel code:
    // View the source bytes as a sequence of 3-byte pixels (ponto)
    const ponto *inicio = reinterpret_cast<const ponto*>(src_data);
    const ponto *fim = reinterpret_cast<const ponto*>(src_data + ((width * height) * 3));
    uchar *destino = dst_data;
    std::transform(std::execution::par_unseq, inicio, fim, destino, [](const ponto &p) {
        uchar media = (p.r + p.g + p.b) / 3;
        return media;
    });
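The definition of ponto is not shown above. The sketch below is an assumption about its layout (three consecutive bytes in OpenCV's BGR order), and average_to_gray is a hypothetical wrapper used only for illustration. It omits the execution policy so that it builds without a parallel backend.

```cpp
#include <algorithm>

using uchar = unsigned char;  // same alias OpenCV uses

// Assumed layout of `ponto` (its definition is not shown in this post):
// three consecutive bytes in BGR channel order.
struct ponto {
    uchar b, g, r;
};
static_assert(sizeof(ponto) == 3, "ponto must cover exactly one packed pixel");

// Hypothetical wrapper: averages each pixel's channels into one gray byte.
// Passing std::execution::par_unseq as the first argument of std::transform
// (with <execution> included) gives the parallel form shown above.
inline void average_to_gray(const ponto* first, const ponto* last, uchar* out) {
    std::transform(first, last, out, [](const ponto& p) {
        return static_cast<uchar>((p.r + p.g + p.b) / 3);
    });
}
```

The static_assert guards the assumption that the struct has no padding, which is what lets the reinterpret_cast over src_data step one pixel at a time.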
I ran the tests with images of different sizes. Here are the results:
256x256 pixels: Sequential Algorithm: 10,388 ns Parallel Algorithm: 10,834 ns
625x615 pixels: Sequential Algorithm: 63,999 ns Parallel Algorithm: 62,299 ns
1920x1080 pixels: Sequential Algorithm: 323,428 ns Parallel Algorithm: 327,229 ns
7680x4320 pixels: Sequential Algorithm: 8,641,884 ns Parallel Algorithm: 8,782,140 ns
15360x8640 pixels: Sequential Algorithm: 35,218,707 ns Parallel Algorithm: 35,070,783 ns
These results surprise me: the parallel algorithm is slower than the sequential one in three of the five cases (256x256, 1920x1080 and 7680x4320), and where it is faster (625x615 and 15360x8640) the gain is under 3%. I would like to understand why this is happening and whether there is anything I can do to improve the performance of the parallel algorithm.
Any ideas about what might be causing this? Thanks in advance for any help or guidance.