I’d recommend posting a few more pictures that show the variation you expect in your inputs.
for that specific picture, you could just hardcode the regions.
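to handle some variation, you could take the min() of each pixel row and each pixel column and analyze those two profiles. assuming the background is brighter than the content, a high minimum means that row or column contains only background.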
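of course, JPEGs are lossy, so the background won't be exactly one value and that'll be annoying. you'd have to compare against a threshold instead of an exact value.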
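
a rough sketch of the row/column min() idea, in plain Python so it runs without any libraries. everything here is an assumption for illustration: the image is grayscale and given as a list of rows, the background is light, and the names `background_runs`, `find_background_bands` and the threshold of 240 are made up, not from any library. the threshold is there to soak up JPEG noise and would need tuning on real pictures.

```python
def background_runs(profile, threshold):
    """Return (start, stop) index ranges where profile >= threshold (stop is exclusive)."""
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value >= threshold:
            if start is None:
                start = i          # a background run begins here
        elif start is not None:
            runs.append((start, i))  # run ended at the previous index
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def find_background_bands(gray, threshold=240):
    """Find rows and columns containing only (light) background.

    gray: list of rows, each a list of 0..255 intensities (assumed format).
    A row/column counts as background if even its darkest pixel is >= threshold.
    """
    row_min = [min(row) for row in gray]
    col_min = [min(col) for col in zip(*gray)]  # zip(*gray) iterates columns
    return background_runs(row_min, threshold), background_runs(col_min, threshold)


# toy 6x8 image: white background with slight JPEG-like noise and some dark content
W = 255
image = [
    [W,   W,   W,   W, W,   W,   W,   W],
    [W,  20,  30, 250, W,  40,  35, 252],
    [W,  25,  22, 251, W,  38,  30,   W],
    [W, 253,   W,   W, W, 249,   W,   W],
    [W,  50,  45,   W, W,   W,   W,   W],
    [W,   W,   W,   W, W,   W,   W,   W],
]

row_bands, col_bands = find_background_bands(image, threshold=240)
print(row_bands)  # [(0, 1), (3, 4), (5, 6)]
print(col_bands)  # [(0, 1), (3, 5), (7, 8)]
```

with an exact comparison (threshold=255) the noisy row 3 no longer counts as background, which is the JPEG annoyance in miniature. the gaps between the background bands are the candidate regions to crop.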