Measuring the similarity percentage between two images

If these are successive frames from a camera, the view changes only gradually from one frame to the next.

You could find an alignment (affine or perspective transform) between the two frames and then, based on how the quads enclosing the frames overlap, evaluate intersection over union (IoU) or some other area-based measure.
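A rough sketch of that overlap measure, in plain Python. It assumes you already have a 3x3 transform `H` that maps the previous frame into the current frame's coordinates (an affine transform fits too, with a last row of 0 0 1), and that the warped quad stays convex, which holds for affine transforms and for mild perspective changes. The helper names (`warp_corners`, `clip_convex`, `frame_iou`) are made up for illustration; the clipping is ordinary Sutherland-Hodgman.

```python
def warp_corners(H, width, height):
    """Map the four frame corners through a 3x3 transform H (nested lists or array)."""
    corners = [(0.0, 0.0), (width, 0.0), (width, height), (0.0, height)]
    out = []
    for x, y in corners:
        w = H[2][0] * x + H[2][1] * y + H[2][2]
        out.append(((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
                    (H[1][0] * x + H[1][1] * y + H[1][2]) / w))
    return out


def signed_area(poly):
    """Shoelace formula; the sign encodes the vertex orientation."""
    if not poly:
        return 0.0
    pairs = zip(poly, poly[1:] + poly[:1])
    return 0.5 * sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in pairs)


def oriented(poly):
    """Return the polygon with non-negative signed area, reversing it if needed."""
    poly = list(poly)
    return poly if signed_area(poly) >= 0 else poly[::-1]


def clip_convex(subject, clip):
    """Sutherland-Hodgman: clip a polygon against a convex polygon."""
    output = oriented(subject)
    clip = oriented(clip)
    for a, b in zip(clip, clip[1:] + clip[:1]):
        if not output:
            break
        current, output = output, []
        for p, q in zip(current, current[1:] + current[:1]):
            # Cross products: >= 0 means the point is on the inner side of edge a->b.
            sp = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
            sq = (b[0] - a[0]) * (q[1] - a[1]) - (b[1] - a[1]) * (q[0] - a[0])
            if sp >= 0:
                output.append(p)
            if (sp > 0 and sq < 0) or (sp < 0 and sq > 0):
                t = sp / (sp - sq)  # where segment p->q crosses the clip edge
                output.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return output


def frame_iou(H, width, height):
    """IoU between the current frame's rectangle and the previous frame warped by H."""
    frame = [(0.0, 0.0), (width, 0.0), (width, height), (0.0, height)]
    warped = warp_corners(H, width, height)
    inter = abs(signed_area(clip_convex(warped, frame)))
    union = abs(signed_area(frame)) + abs(signed_area(warped)) - inter
    return inter / union if union > 0 else 0.0
```

For example, a pure sideways shift by half the frame width, `frame_iou([[1, 0, 50], [0, 1, 0], [0, 0, 1]], 100, 100)`, leaves half of each frame overlapping and gives an IoU of one third.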

For the alignment, you could use optical flow (dense or sparse) or feature matching. I’d recommend optical flow. Low-resolution frames should be good enough for this estimate. A dense optical flow field is quite “rich” for such a simple alignment, more data than you strictly need, but it does give you plenty of vectors, and fairly robust ones at that. Sparse optical flow sounds like less work; it may be faster, but need not be.