Analyze the estimated transform. If you use actual scanner hardware rather than photos, it should not even need a perspective component. It should have the expected scale, near-zero rotation, no shearing worth mentioning, and a translation within some expected range as well.
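A minimal sketch of those checks, assuming you already have a 3x3 homography (or 2x3 affine) matrix `H` from your matching step, e.g. from OpenCV. It decomposes the upper-left 2x2 block as rotation × (scale + shear), which is only meaningful once the perspective terms are confirmed near zero. The function names and every tolerance below (`scale_tol`, `max_rotation_deg`, `max_shear`, `max_perspective`, the translation ranges) are hypothetical placeholders; tune them for your scanner and DPI.

```python
import math
import numpy as np


def decompose_transform(H):
    """Split a 3x3 homography or 2x3 affine into perspective, scale, rotation, shear, translation.

    Models the 2x2 part as R(theta) @ [[sx, m], [0, sy]]; only exact when the
    perspective terms are (near) zero, which is what a flatbed scan should give.
    """
    H = np.asarray(H, dtype=float)
    if H.shape == (2, 3):
        H = np.vstack([H, [0.0, 0.0, 1.0]])  # promote affine to 3x3
    if abs(H[2, 2]) < 1e-12:
        raise ValueError("degenerate transform")
    H = H / H[2, 2]  # normalize so the bottom-right entry is 1
    a, b, tx = H[0]
    c, d, ty = H[1]
    sx = math.hypot(a, c)
    det = a * d - b * c
    if sx < 1e-12 or abs(det) < 1e-12:
        raise ValueError("degenerate transform")
    return {
        "perspective": (H[2, 0], H[2, 1]),
        "scale_x": sx,
        "scale_y": det / sx,  # negative if the transform mirrors the page
        "rotation_deg": math.degrees(math.atan2(c, a)),
        "shear": (a * b + c * d) / det,  # dimensionless shear factor m / sy
        "translation": (tx, ty),
    }


def is_plausible_scan(H, expected_scale=1.0, scale_tol=0.05, max_rotation_deg=2.0,
                      max_shear=0.02, max_perspective=1e-5,
                      tx_range=(-50.0, 50.0), ty_range=(-50.0, 50.0)):
    """Return (ok, params); all tolerances are illustrative assumptions."""
    try:
        p = decompose_transform(H)
    except ValueError:
        return False, None
    checks = [
        all(abs(v) <= max_perspective for v in p["perspective"]),
        abs(p["scale_x"] - expected_scale) <= scale_tol * expected_scale,
        abs(p["scale_y"] - expected_scale) <= scale_tol * expected_scale,
        abs(p["rotation_deg"]) <= max_rotation_deg,
        abs(p["shear"]) <= max_shear,
        tx_range[0] <= p["translation"][0] <= tx_range[1],
        ty_range[0] <= p["translation"][1] <= ty_range[1],
    ]
    return all(checks), p
```

For example, a 1° rotation with a small offset passes, while a matrix with a noticeable perspective term, or a 2x3 affine with shear 0.5, is rejected:

```python
r = math.radians(1.0)
H_good = [[math.cos(r), -math.sin(r), 10.0], [math.sin(r), math.cos(r), -5.0], [0.0, 0.0, 1.0]]
ok, params = is_plausible_scan(H_good)
```

Rejecting on any single check is deliberate: a transform that needs perspective or shear to explain a flatbed scan usually means the matches are wrong, not that the page moved.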
You can also throw OCR at both documents and compare the extracted text. Extracted text is a different kind of feature to compare on.
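A small sketch of the comparison step, assuming each document has already been run through some OCR engine (e.g. Tesseract; not shown here) and you have plain strings. It uses Python's standard `difflib` at the word level, which is just one reasonable choice of text diff; `normalize_words`, `text_similarity`, and `word_diff` are hypothetical helper names.

```python
import difflib
import re


def normalize_words(text):
    # Lowercase and keep only word characters, so OCR noise in
    # punctuation and whitespace does not dominate the comparison.
    return re.findall(r"\w+", text.lower())


def text_similarity(text_a, text_b):
    """Word-level similarity of two OCR outputs, from 0.0 to 1.0."""
    return difflib.SequenceMatcher(None, normalize_words(text_a), normalize_words(text_b)).ratio()


def word_diff(text_a, text_b):
    """List the non-equal word-level edits between the two texts."""
    words_a, words_b = normalize_words(text_a), normalize_words(text_b)
    sm = difflib.SequenceMatcher(None, words_a, words_b)
    return [(tag, words_a[i1:i2], words_b[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]
```

For example, `word_diff("Total due: $1,250.00", "Total due: $7,250.00")` returns `[("replace", ["1"], ["7"])]`, so a single changed digit shows up even when the images align perfectly. Keep in mind OCR errors also appear as diffs, so a small number of differences is not necessarily a real change.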