I would go with:
- keypoint detection and descriptor extraction,
- descriptor matching, using Lowe’s ratio test to reject bad matches,
- homography or affine transform estimation to discard the remaining bad matches as outliers.
Note: this is one solution, but there are plenty of other options for this kind of task.
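
To make the ratio test from the second step concrete, here is a minimal dependency-free sketch in Python. The function name `ratio_test_matches`, the toy 2-D descriptors, and the 0.75 threshold are assumptions chosen for illustration (real descriptors have many more dimensions, and values around 0.7 to 0.8 are commonly used). In practice a library such as OpenCV would handle detection, matching, and the final homography estimation.

```python
import math


def ratio_test_matches(query, train, ratio=0.75):
    """Return (query_idx, train_idx) pairs that pass Lowe's ratio test."""
    good = []
    for qi, q in enumerate(query):
        # Distance from this query descriptor to every train descriptor,
        # sorted so the nearest neighbour comes first.
        ranked = sorted((math.dist(q, t), ti) for ti, t in enumerate(train))
        if len(ranked) < 2:
            continue  # the test needs two neighbours to compare
        (best, best_idx), (second, _) = ranked[0], ranked[1]
        # Keep the match only if the best neighbour is clearly closer
        # than the runner-up; otherwise the match is ambiguous.
        if best < ratio * second:
            good.append((qi, best_idx))
    return good


# Toy descriptors (hypothetical values, for illustration only).
query = [(0.0, 0.0), (5.0, 5.0)]
train = [(0.1, 0.0), (4.0, 4.0), (6.0, 6.0)]
matches = ratio_test_matches(query, train)
```

The first query descriptor has one clearly closest neighbour, so it is kept. The second is equally close to two train descriptors, so it is rejected as ambiguous. The surviving pairs are what you would then pass to the homography or affine estimation step.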