How to speed up fillConvexPoly?

I don’t know anything about OpenMP, but I think it is likely that the whole image will be locked for each thread in turn, which would make the fill calls run one after another rather than in parallel.
You could test the effect of overlapping by trying