ok, let’s ignore the issue of hazards and only look at slowdowns.
if multiple cores access the same “cache line” and at least one of them writes to it, they’ll fight over it, and keeping their caches coherent costs time. a typical cache line is 64 bytes, so it spans several neighboring pixels (about 21 for 8-bit, 3-channel data).
and that’s the absolute minimum of issues you’ll face.
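to illustrate the cache line point, here is a minimal sketch of one way to keep cores out of each other’s way: give every thread its own contiguous band of rows, so two threads can share a cache line only at a band boundary. everything in it is an assumption for illustration, not taken from any library: the name `invert_in_bands`, the flat 8-bit single-channel buffer, and the "invert" operation standing in for real per-pixel work.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// hypothetical helper: inverts an 8-bit, single-channel image in place.
// assumes pixels.size() >= width * height, stored row by row.
void invert_in_bands(std::vector<std::uint8_t>& pixels,
                     std::size_t width, std::size_t height,
                     unsigned num_threads)
{
    if (num_threads == 0) num_threads = 1;

    // each thread gets a contiguous block of rows (rounded up)
    const std::size_t rows_per_thread = (height + num_threads - 1) / num_threads;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t row_begin = std::min(height, t * rows_per_thread);
        const std::size_t row_end   = std::min(height, row_begin + rows_per_thread);
        if (row_begin == row_end) break;  // no rows left to hand out

        workers.emplace_back([&pixels, width, row_begin, row_end] {
            // this thread only touches its own band of memory
            for (std::size_t i = row_begin * width; i < row_end * width; ++i)
                pixels[i] = static_cast<std::uint8_t>(255 - pixels[i]);
        });
    }
    for (auto& w : workers) w.join();
}
```

the opposite layout (thread 0 takes pixel 0, thread 1 takes pixel 1, and so on) would put every thread on every cache line, which is the fighting described above.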
as mentioned, if these threads happen to lock the whole picture for access, you’re back to serial execution.
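a small sketch of that worst case, under stated assumptions: `LockedImage` and `invert_with_whole_image_lock` are hypothetical names, and the single `std::mutex` stands in for whatever whole-picture lock the threads might take. each worker holds the lock while it processes its chunk, and the function reports the largest number of workers that were ever inside the locked region at once.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// hypothetical image type with one lock guarding the whole picture
struct LockedImage {
    std::vector<std::uint8_t> pixels;
    std::mutex whole_image_lock;
};

// inverts the image using num_threads workers that each lock the whole
// picture; returns the peak number of workers touching pixels at once
int invert_with_whole_image_lock(LockedImage& img, unsigned num_threads)
{
    if (num_threads == 0) num_threads = 1;

    std::atomic<int> active{0};
    std::atomic<int> max_active{0};
    const std::size_t n = img.pixels.size();
    const std::size_t chunk = (n + num_threads - 1) / num_threads;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end   = std::min(n, begin + chunk);

        workers.emplace_back([&, begin, end] {
            // everyone else waits here until this worker is done
            std::lock_guard<std::mutex> guard(img.whole_image_lock);

            const int now = ++active;
            int seen = max_active.load();
            while (now > seen && !max_active.compare_exchange_weak(seen, now)) {}

            for (std::size_t i = begin; i < end; ++i)
                img.pixels[i] = static_cast<std::uint8_t>(255 - img.pixels[i]);

            --active;
        });
    }
    for (auto& w : workers) w.join();
    return max_active.load();
}
```

however many threads you start, the returned peak stays at 1: the work gets done, but one chunk after another, plus the overhead of creating threads and taking the lock.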