I am drawing a couple of convex polygons (they overlap one another, which could be important) and I want to draw them in parallel to speed things up. Here is the code:
//here I fill other faces
//and finally I draw them
for (int i = 0; i < faces.size(); i++)
    fillConvexPoly(img, faces[i], CV_RGB(255, 0, 0));
I thought that adding the following line just before the drawing loop would speed things up. Alas, it actually makes it slower!
#pragma omp parallel for
OpenMP is working in general, and I set the number of threads to 12. Could the problem be access to the img data, since the polygons overlap one another? Or am I making some basic mistake? How do I speed it up?
I don’t know anything about OpenMP, but I think it is likely that the whole image will be locked for each thread in turn.
You could test the effect of overlapping by trying
that is the problem.
what result do you expect when multiple draw calls work on the same data concurrently? which thread wins when multiple write to the same byte/word/cache line?
I hate to say it but this is fundamental stuff in parallel programming. you’ll have to find a book or course or tutorial that covers these aspects.
perhaps some computer graphics introduction would be in order too. OpenGL/Vulkan/Direct3D if it has to be a specific API, but they share the principles.
Yes, I am aware of the concurrent access problem, however I am not sure if it makes any difference if the polygons DO NOT overlap (I have some that do and some that don’t).
ok, let’s ignore the issue of hazards and only look at slowdowns.
if multiple cores write to the same “cache line”, they’ll fight over it, which costs synchronization (cache coherence traffic). a typical cache line is 64 bytes, which spans about 21 pixels of a 3-channel 8-bit image.
and that’s the absolute minimum of issues you’ll face.
as mentioned, if these threads happen to lock the whole picture for access, you’re back to serial execution.
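To make the cache-line point above concrete, here is a minimal sketch of the usual countermeasure: give each thread data that sits on its own cache line. It is not OpenCV code. `PaddedCounter` and `countInParallel` are hypothetical names, and the 64-byte line size is an assumption that holds on most current x86-64 CPUs but not everywhere.

```cpp
#include <cstddef>
#include <vector>

// Assumption: 64-byte cache lines (typical on x86-64, not guaranteed).
constexpr std::size_t kCacheLine = 64;

// alignas pads each counter out to a full cache line, so two threads
// never write to the same line. Over-aligned vector elements need C++17.
struct alignas(kCacheLine) PaddedCounter {
    long value = 0;
};

// Hypothetical helper: each loop iteration touches only its own counter.
// Returns the per-counter totals.
std::vector<long> countInParallel(int numCounters, long iterations) {
    std::vector<PaddedCounter> counters(numCounters);

    // Without -fopenmp the pragma is ignored and the loop runs serially,
    // with the same result.
    #pragma omp parallel for
    for (int t = 0; t < numCounters; ++t) {
        for (long i = 0; i < iterations; ++i)
            counters[t].value += 1;
    }

    std::vector<long> totals;
    for (const PaddedCounter& c : counters)
        totals.push_back(c.value);
    return totals;
}
```

If you remove the `alignas`, the counters pack into one or two cache lines and the threads start invalidating each other's copies, even though no two threads ever touch the same variable. Pixels of neighbouring polygons in one image behave the same way.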
If you are drawing only a couple (or maybe a few more) polygons and simultaneous access by threads is causing the problem, maybe you could just create (memory and image size permitting) an image for each thread, have each thread write to its own image and, once all threads are done, OR all the images together? A brute-force method, but…
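A rough sketch of the one-image-per-thread idea above, written without OpenCV so it stands alone. `Image`, `Rect`, `fillRect` and `drawWithPrivateCanvases` are hypothetical stand-ins (a single-channel buffer and axis-aligned rectangles instead of `cv::Mat` and `fillConvexPoly`), and it assumes every shape is drawn with the same value, which is what makes the OR merge independent of drawing order.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a single-channel cv::Mat.
struct Image {
    int width, height;
    std::vector<std::uint8_t> pixels;
    Image(int w, int h)
        : width(w), height(h), pixels(static_cast<std::size_t>(w) * h, 0) {}
};

// Hypothetical stand-in for a polygon: the rectangle [x0, x1) x [y0, y1).
struct Rect { int x0, y0, x1, y1; };

// Stand-in for fillConvexPoly: fills the rectangle, clipped to the image.
void fillRect(Image& img, const Rect& r, std::uint8_t value) {
    for (int y = std::max(r.y0, 0); y < std::min(r.y1, img.height); ++y)
        for (int x = std::max(r.x0, 0); x < std::min(r.x1, img.width); ++x)
            img.pixels[static_cast<std::size_t>(y) * img.width + x] = value;
}

// Draws the shapes into numChunks private canvases (numChunks >= 1),
// then ORs the canvases into one result image.
Image drawWithPrivateCanvases(int width, int height,
                              const std::vector<Rect>& shapes, int numChunks) {
    std::vector<Image> canvases(numChunks, Image(width, height));
    const int n = static_cast<int>(shapes.size());

    #pragma omp parallel for
    for (int c = 0; c < numChunks; ++c) {
        // Chunk c owns shapes [begin, end) and writes only to canvases[c].
        const int begin = static_cast<int>(static_cast<long long>(n) * c / numChunks);
        const int end = static_cast<int>(static_cast<long long>(n) * (c + 1) / numChunks);
        for (int i = begin; i < end; ++i)
            fillRect(canvases[c], shapes[i], 255);
    }

    // Serial merge after all drawing is done.
    Image result(width, height);
    for (const Image& canvas : canvases)
        for (std::size_t p = 0; p < result.pixels.size(); ++p)
            result.pixels[p] |= canvas.pixels[p];
    return result;
}
```

With OpenCV the same shape would presumably be one zero-initialised `cv::Mat` per chunk, `fillConvexPoly` into it, and `cv::bitwise_or` for the merge. Note that the merge touches every pixel of every canvas, so for a handful of small polygons it may well cost more than the drawing itself. If the polygons have different colours, OR no longer reproduces the serial result where they overlap.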