Using OpenCV's findHomography to align scanned documents; How do I avoid obviously-bad (non-planar/3D) transforms?

I’m trying to use OpenCV to line up a scanned document with the document we were expecting back. Sometimes, when scanning a multi-page document, the pages come back out of order and the result is a spectacular failure. I can identify such failures visually pretty quickly, but I don’t know how to identify them programmatically.

First, here’s an example of one of these spectacular failures:

[image: example of a spectacular alignment failure]

This question was originally posted on Stack Overflow a few years ago, and the superficially most helpful response was this one:

A normal homography will have the xx/yy coefficients close to 1, and the xy/yx close to 0 (exactly 0 if there is no rotation). The coefficients in the denominator should also be small (0 if no perspective). Observe typical values to detect pathological situations.

Only problem: I don’t know how to do that.

Here’s my code:

import sys
import cv2
import numpy as np

if len(sys.argv) != 4:
  print('USAGE')
  print('  python3 align.py ref.png new.png output.png')
  sys.exit()

FLANN_INDEX_LSH    = 6

def filter_matches(kp1, kp2, matches, ratio = 0.75):
  # Lowe's ratio test: keep a match only if its best candidate is
  # clearly better (by `ratio`) than its second-best candidate
  mkp1, mkp2 = [], []
  for m in matches:
    if len(m) == 2 and m[0].distance < m[1].distance * ratio:
      m = m[0]
      mkp1.append(kp1[m.queryIdx])
      mkp2.append(kp2[m.trainIdx])
  p1 = np.float32([kp.pt for kp in mkp1])
  p2 = np.float32([kp.pt for kp in mkp2])
  kp_pairs = list(zip(mkp1, mkp2))
  return p1, p2, kp_pairs

def alignImages(im1, im2):
  # AKAZE produces binary descriptors, so FLANN must use the LSH index
  detector = cv2.AKAZE_create()
  flann_params = dict(algorithm = FLANN_INDEX_LSH,
    table_number = 6,      # 12
    key_size = 12,         # 20
    multi_probe_level = 1) # 2
  matcher = cv2.FlannBasedMatcher(flann_params, {})

  kp1, desc1 = detector.detectAndCompute(im1, None)
  kp2, desc2 = detector.detectAndCompute(im2, None)

  # k=2 so the ratio test has a second-best match to compare against
  raw_matches = matcher.knnMatch(desc1, trainDescriptors = desc2, k = 2)
  p1, p2, kp_pairs = filter_matches(kp1, kp2, raw_matches)
  if len(p1) < 4:
    print('%d matches found, not enough for homography estimation' % len(p1))
    sys.exit()

  # mask flags which of the filtered matches RANSAC kept as inliers
  H, mask = cv2.findHomography(p1, p2, cv2.RANSAC, 5.0)
  print(len(mask.ravel().tolist()))

  height, width = im2.shape

  imResult = cv2.warpPerspective(im1, H, (width, height))

  return imResult

refFilename = sys.argv[1]
imFilename = sys.argv[2]
outFilename = sys.argv[3]

imRef = cv2.imread(refFilename, cv2.IMREAD_GRAYSCALE)
im = cv2.imread(imFilename, cv2.IMREAD_GRAYSCALE)

imNew = alignImages(im, imRef)

cv2.imwrite(outFilename, imNew)

first: avoid this. present your inputs, your matching results, and the code that produces them. AKAZE and the ratio test should have been robust enough. your images are probably to blame. show them.

then: eigenvectors and eigenvalues help evaluate such a matrix. there are a few things one can try to evaluate how “ridiculous” the matrix is.

Using these two images as the first and second parameters of the script I posted will produce the “spectacular failure” I described:

Image 1: [image not shown]

Image 2: [image not shown]

Obviously they’re not the same image but the script I posted doesn’t know that.

For my part, I do perceptual hashing on the images before I pass them to that script, and I try to guess the page order from that: if page 1 of what we got back vs. page 1 of what we were expecting has the lowest Hamming distance of every combination of images, then that pairing is used. But what if someone uploaded a 1040 instead of an application for whatever? I can’t stop people from uploading the wrong document, and even two completely different documents will have a Hamming distance. And sure, I could set a minimum threshold, but in some cases perceptual hashing will fail even when the script I posted would succeed. E.g., if you flip an image upside down, perceptual hashing is liable to show a quite high Hamming distance even though the transformation is very superficial and the Python script would most likely be able to deal with it.
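
For reference, here’s a minimal sketch of that kind of pre-check, assuming the third-party imagehash and Pillow libraries; the threshold and file names are illustrative, not from my actual code:

import imagehash
from PIL import Image

def hash_distance(path1, path2):
  # pHash survives rescaling and recompression, but not large
  # rotations or flips (hence the failure mode described above)
  h1 = imagehash.phash(Image.open(path1))
  h2 = imagehash.phash(Image.open(path2))
  return h1 - h2  # Hamming distance between the 64-bit hashes

# threshold of 20 is a guess; tune it on real known-good/known-bad pairs
if hash_distance('got_page1.png', 'expected_page1.png') > 20:
  print('these pages probably do not correspond')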

ah, they don’t match at all.

findHomography will give you a mask of inliers. evaluate that. should be very few inliers left. also evaluate how many matches were kept or discarded (absolute or as a fraction) by the ratio test.
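
A minimal sketch of that check, bolted onto the findHomography call from the script above; the thresholds are illustrative guesses, not recommendations:

H, mask = cv2.findHomography(p1, p2, cv2.RANSAC, 5.0)

inliers = int(mask.ravel().sum())  # mask entries are 1 for inliers, 0 for outliers
inlier_ratio = inliers / len(p1)

# thresholds are guesses; calibrate on known-good and known-bad pairs
if inliers < 20 or inlier_ratio < 0.3:
  print('homography looks unreliable: %d inliers (%.0f%% of matches)'
    % (inliers, 100 * inlier_ratio))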

as for the eigenvector thing… I think I have to revise that declaration. it either makes no sense or gets complicated if there’s rotation, and I have no idea how to cover the perspective aspects of the matrix.

there is probably an elegant way to use some other higher math but my brain isn’t on yet.

you can always evaluate the coefficients of the matrix individually (this should be near 0, that should be positive or in some range, those should be about equal, …), or map some corner points and see if they land in any sensible areas.
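
For the coefficient route, a sketch of the check described in the Stack Overflow quote at the top (xx/yy near 1, xy/yx near 0, perspective terms tiny); all tolerances are illustrative, and per the caveat above this breaks down if genuine rotation is expected:

def looks_sane(H, tol_scale = 0.3, tol_shear = 0.3, tol_persp = 1e-3):
  H = H / H[2, 2]  # normalize so the bottom-right element is 1
  xx, xy = H[0, 0], H[0, 1]
  yx, yy = H[1, 0], H[1, 1]
  g, h = H[2, 0], H[2, 1]
  return (abs(xx - 1) < tol_scale and abs(yy - 1) < tol_scale  # near-unit scale
      and abs(xy) < tol_shear and abs(yx) < tol_shear          # little rotation/shear
      and abs(g) < tol_persp and abs(h) < tol_persp)           # almost no perspective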

I like this idea because it’s straightforward and your test / pass-fail criteria will be evaluated in a space that is intuitive.

or map some corner points and see if they land in any sensible areas.

How do I do that?

Create points for the nominal corners of your document: (0,0), (width,0), (width,height), (0,height). Multiply them by the homography and see how far they move. The assumption is that you are trying to use the homography to make relatively small corrections to an image that is already pretty close, so a valid homography will transform the points by a small amount. The nominal top-left point (0,0) should end up fairly close to (0,0) after the homography is applied. Same with the other 3.

If you are also handling 90/180/270 degree rotations your test code will have to be a little more involved, but not too bad.

This method should work just fine for detecting spectacular failures. If you’ve got other “close-but-not-quite-correct” failures, this method might not be what you want.

So something like this?:

def applyHomography(coords, H):
  # promote to homogeneous coordinates, apply H, then divide by w
  imagepoint = np.array([coords[0], coords[1], 1.0])
  worldpoint = np.dot(H, imagepoint)
  scalar = worldpoint[2]
  xworld = worldpoint[0] / scalar
  yworld = worldpoint[1] / scalar
  return [xworld, yworld]

H, mask = cv2.findHomography(p1, p2, cv2.RANSAC, 5.0)
applyHomography([0, 0], H)

Yep. Something like that, but you’d probably want to call applyHomography on all 4 corner points.

cv::transform and cv::perspectiveTransform can be used. the second one divides by the “w” dimension.
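
Putting it together, a sketch of the corner test using cv2.perspectiveTransform; the 25%-of-page-size tolerance is an illustrative guess, and handling the 90/180/270-degree cases mentioned above would mean also comparing against each rotated ordering of the nominal corners and taking the minimum:

def corners_moved_too_far(H, width, height, rel_tol = 0.25):
  # nominal document corners, shaped (N, 1, 2) as perspectiveTransform expects
  corners = np.float32([[0, 0], [width, 0],
                        [width, height], [0, height]]).reshape(-1, 1, 2)
  mapped = cv2.perspectiveTransform(corners, H).reshape(-1, 2)
  # distance each corner travels under the homography
  shift = np.linalg.norm(mapped - corners.reshape(-1, 2), axis = 1)
  # a valid small correction should not move any corner by a large
  # fraction of the page size; 0.25 is a guess, tune on real scans
  return shift.max() > rel_tol * max(width, height)

Called right after findHomography, e.g. if corners_moved_too_far(H, width, height): treat the alignment as a failure.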
