Using OpenCV's findHomography to align scanned documents; How do I avoid obviously-bad (non-planar/3D) transforms?

I’m trying to use OpenCV to line up a scanned document with the document we were expecting back. Sometimes, when scanning a multi-page document, the pages come back out of order and the result is a spectacular failure. I can identify such failures visually pretty quickly, but I don’t know how to identify them programmatically.

First, here’s an example of one of these spectacular failures:

[image: example of a spectacular alignment failure]

This question was originally posted on Stack Overflow a few years ago, and the superficially most helpful response was this one:

A normal homography will have the xx/yy coefficients close to 1, and the xy/yx close to 0 (exactly 0 if there is no rotation). The coefficients in the denominator should also be small (0 if no perspective). Observe typical values to detect pathological situations.

Only problem: I don’t know how to do that.

Here’s my code:

import sys
import cv2
import numpy as np

if len(sys.argv) != 4:
  print('USAGE')
  print('  python3 align.py ref.png new.png output.png')
  sys.exit()

FLANN_INDEX_LSH    = 6

def filter_matches(kp1, kp2, matches, ratio = 0.75):
  # Lowe's ratio test: keep a match only if its best candidate is
  # clearly better (by `ratio`) than its second-best candidate
  mkp1, mkp2 = [], []
  for m in matches:
    if len(m) == 2 and m[0].distance < m[1].distance * ratio:
      m = m[0]
      mkp1.append(kp1[m.queryIdx])
      mkp2.append(kp2[m.trainIdx])
  p1 = np.float32([kp.pt for kp in mkp1])
  p2 = np.float32([kp.pt for kp in mkp2])
  kp_pairs = list(zip(mkp1, mkp2))
  return p1, p2, kp_pairs

def alignImages(im1, im2):
  # AKAZE produces binary descriptors, so FLANN must use the LSH index
  detector = cv2.AKAZE_create()
  flann_params = dict(algorithm = FLANN_INDEX_LSH,
    table_number = 6,      # 12
    key_size = 12,         # 20
    multi_probe_level = 1) # 2
  matcher = cv2.FlannBasedMatcher(flann_params, {})

  kp1, desc1 = detector.detectAndCompute(im1, None)
  kp2, desc2 = detector.detectAndCompute(im2, None)

  # k=2 so the ratio test has a second-best match to compare against
  raw_matches = matcher.knnMatch(desc1, trainDescriptors = desc2, k = 2)
  p1, p2, kp_pairs = filter_matches(kp1, kp2, raw_matches)
  if len(p1) < 4:
    print('%d matches found, not enough for homography estimation' % len(p1))
    sys.exit()

  # mask flags which of the filtered matches RANSAC kept as inliers
  H, mask = cv2.findHomography(p1, p2, cv2.RANSAC, 5.0)
  print(len(mask.ravel().tolist()))

  height, width = im2.shape

  imResult = cv2.warpPerspective(im1, H, (width, height))

  return imResult

refFilename = sys.argv[1]
imFilename = sys.argv[2]
outFilename = sys.argv[3]

imRef = cv2.imread(refFilename, cv2.IMREAD_GRAYSCALE)
im = cv2.imread(imFilename, cv2.IMREAD_GRAYSCALE)

imNew = alignImages(im, imRef)

cv2.imwrite(outFilename, imNew)

first: avoid this. present your inputs, your matching results, and the code that produces them. AKAZE and the ratio test should have been robust enough. your images are probably to blame. show them.

then: eigenvectors and eigenvalues help evaluate such a matrix. there are a few things one can try to evaluate how “ridiculous” the matrix is.

Using these two images as the first and second parameters of the script I posted will produce the “spectacular failure” I described:

Image 1: [image not shown]

Image 2: [image not shown]

Obviously they’re not the same image but the script I posted doesn’t know that.

For my part, I do perceptual hashing on the images before I pass them to that script, and I try to guess the page order from that: if page 1 of what we got back vs. page 1 of what we were expecting has the lowest Hamming distance of every combination of images, then that pairing is used. But what if someone uploaded a 1040 instead of an application for whatever? I can’t stop people from uploading the wrong document, and even two completely different documents will have a Hamming distance. And sure, I could set a minimum threshold, but in some cases perceptual hashing will fail even when the script I posted would succeed. E.g., if you flip an image upside down, perceptual hashing is liable to show a quite high Hamming distance even though the transformation is very superficial and the Python script would most likely be able to deal with it.
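
For reference, here’s a minimal sketch of that kind of pre-check, assuming the third-party imagehash and Pillow libraries; the threshold and file names are illustrative, not from my actual code:

import imagehash
from PIL import Image

def hash_distance(path1, path2):
  # pHash survives rescaling and recompression, but not large
  # rotations or flips (hence the failure mode described above)
  h1 = imagehash.phash(Image.open(path1))
  h2 = imagehash.phash(Image.open(path2))
  return h1 - h2  # Hamming distance between the 64-bit hashes

# threshold of 20 is a guess; tune it on real known-good/known-bad pairs
if hash_distance('got_page1.png', 'expected_page1.png') > 20:
  print('these pages probably do not correspond')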

ah, they don’t match at all.

findHomography will give you a mask of inliers. evaluate that. should be very few inliers left. also evaluate how many matches were kept or discarded (absolute or as a fraction) by the ratio test.
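
A minimal sketch of that check, bolted onto the findHomography call from the script above; the thresholds are illustrative guesses, not recommendations:

H, mask = cv2.findHomography(p1, p2, cv2.RANSAC, 5.0)

inliers = int(mask.ravel().sum())  # mask entries are 1 for inliers, 0 for outliers
inlier_ratio = inliers / len(p1)

# thresholds are guesses; calibrate on known-good and known-bad pairs
if inliers < 20 or inlier_ratio < 0.3:
  print('homography looks unreliable: %d inliers (%.0f%% of matches)'
    % (inliers, 100 * inlier_ratio))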

as for the eigenvector thing… I think I have to revise that declaration. it either makes no sense or gets complicated if there’s rotation, and I have no idea how to cover the perspective aspects of the matrix.

there is probably an elegant way to use some other higher math but my brain isn’t on yet.

you can always evaluate the coefficients of the matrix individually (this should be near 0, that should be positive or in some range, those should be about equal, …), or map some corner points and see if they land in any sensible areas.
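
For the coefficient route, a sketch of the check described in the Stack Overflow quote at the top (xx/yy near 1, xy/yx near 0, perspective terms tiny); all tolerances are illustrative, and per the caveat above this breaks down if genuine rotation is expected:

def looks_sane(H, tol_scale = 0.3, tol_shear = 0.3, tol_persp = 1e-3):
  H = H / H[2, 2]  # normalize so the bottom-right element is 1
  xx, xy = H[0, 0], H[0, 1]
  yx, yy = H[1, 0], H[1, 1]
  g, h = H[2, 0], H[2, 1]
  return (abs(xx - 1) < tol_scale and abs(yy - 1) < tol_scale  # near-unit scale
      and abs(xy) < tol_shear and abs(yx) < tol_shear          # little rotation/shear
      and abs(g) < tol_persp and abs(h) < tol_persp)           # almost no perspective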

I like this idea because it’s straightforward and your test / pass-fail criteria will be evaluated in a space that is intuitive.

or map some corner points and see if they land in any sensible areas.

How do I do that?

Create points for the nominal corners of your document: (0,0), (width,0), (width,height), (0,height). Multiply them by the homography and see how far they move. The assumption is that you are trying to use the homography to make relatively small corrections to an image that is already pretty close, so a valid homography will transform the points by a small amount. The nominal top-left point (0,0) should end up fairly close to (0,0) after the homography is applied. Same with the other 3.

If you are also handling 90/180/270 degree rotations your test code will have to be a little more involved, but not too bad.

This method should work just fine for detecting spectacular failures. If you’ve got other “close-but-not-quite-correct” failures, this method might not be what you want.

So something like this?:

def applyHomography(coords, H):
  # promote to homogeneous coordinates, apply H, then divide by w
  imagepoint = np.array([coords[0], coords[1], 1.0])
  worldpoint = np.dot(H, imagepoint)
  scalar = worldpoint[2]
  xworld = worldpoint[0] / scalar
  yworld = worldpoint[1] / scalar
  return [xworld, yworld]

H, mask = cv2.findHomography(p1, p2, cv2.RANSAC, 5.0)
applyHomography([0, 0], H)

Yep. Something like that, but you’d probably want to call applyHomography on all 4 corner points.

cv::transform and cv::perspectiveTransform can be used. the second one divides by the “w” dimension.
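
Putting it together, a sketch of the corner test using cv2.perspectiveTransform; the 25%-of-page-size tolerance is an illustrative guess, and handling the 90/180/270-degree cases mentioned above would mean also comparing against each rotated ordering of the nominal corners and taking the minimum:

def corners_moved_too_far(H, width, height, rel_tol = 0.25):
  # nominal document corners, shaped (N, 1, 2) as perspectiveTransform expects
  corners = np.float32([[0, 0], [width, 0],
                        [width, height], [0, height]]).reshape(-1, 1, 2)
  mapped = cv2.perspectiveTransform(corners, H).reshape(-1, 2)
  # distance each corner travels under the homography
  shift = np.linalg.norm(mapped - corners.reshape(-1, 2), axis = 1)
  # a valid small correction should not move any corner by a large
  # fraction of the page size; 0.25 is a guess, tune on real scans
  return shift.max() > rel_tol * max(width, height)

Called right after findHomography, e.g. if corners_moved_too_far(H, width, height): treat the alignment as a failure.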
