Influence of interpolation on measuring with homography

Hello,

I am aiming to measure objects within a plane.
So starting with this:


I am able to transform it to this:

(For testing purposes I used the (arbitrary) quadrilateral with known points.)

So now I am able to measure angles and distances in this plane. (At least the process of measuring is simplified, because I guess one could also measure in the original image with enough effort and maths.)

Back to my topic: The “cv.warpPerspective” function interpolates pixels. (Because the homography maps pixel coordinates, which may be considered integers, to real numbers (floats). So e.g. pixel (102, 506) gets “warped” to (300.5, 736.9), and one then needs to interpolate the value at (300, 737) from the surrounding “neighbors”.)
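For illustration, here is a small Python sketch of what I mean (the homography matrix is just made up); cv.perspectiveTransform shows the non-integer mapping, and the flags argument of cv.warpPerspective selects the interpolation:

```python
import cv2
import numpy as np

# A made-up 3x3 homography, only to illustrate the mapping.
H = np.array([[1.2,  0.1,  30.0],
              [0.05, 1.1,  15.0],
              [1e-4, 2e-4,  1.0]])

# An integer pixel location generally maps to a non-integer location:
pt = np.array([[[102.0, 506.0]]])        # shape (1, 1, 2)
print(cv2.perspectiveTransform(pt, H))   # lands between pixel centers

# warpPerspective resamples the whole image; internally it maps each
# destination pixel back into the source image and interpolates there.
img = cv2.imread("input.jpg")
warped = cv2.warpPerspective(img, H, (img.shape[1], img.shape[0]),
                             flags=cv2.INTER_LINEAR)  # or INTER_CUBIC, INTER_LANCZOS4
```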

Now my question(s):
(1) Since the picture gets “stretched” more in some regions than in others, I fear that this interpolation introduces a measurement error?

(2) Does anyone have experience with how this influences the measurement?

(3) Does it introduce errors at all?

(4) And which interpolation method would be the most exact/preferred and why?

(5) Which resolution should I choose for the final picture?

My guess would be that the maximum error is a one-pixel “square” region, because that’s the “uncertainty” of the warping ((123, 234) → (100.6, 300.2) could land at 100/101 and 300/301). Might that be true?

Some comments

  1. If the intention was to warp the image so the printed page was rectangular after warping, the result wasn’t very good. I’m assuming the four squares near the corners of the paper form a rectangle. The angles in the top left/bottom right were 85 deg / 95 deg. I’m sure you know that, but it leads to my next comment.
  2. The perspective warp itself is probably a much larger source of error vs interpolation. I would focus on getting good data for your perspective warp and not worry too much about the effects of interpolation at this point.
  3. Ideally you should be locating your features in sub-pixel units, not whole pixels. I’m not sure your quad vertices would work well with cornerSubPix (try it?) but there are ways to get better than mouse-click resolution to estimate the image locations of the vertices. (Fit lines to the edges and compute their intersection?)
  4. I would try to compute my perspective transform from many more than 4 points if possible. Use some method that discards outliers and computes the transform only using the inliers.
  5. If you can find the features in the original (unwarped) image and then transform the point locations (vs. warping the whole image and finding features there), you could avoid any error that might be induced by the image warp. Of course you have to be careful that your feature localization works well (accurately) in the perspective-distorted space. (See the sketch after this list for points 4 and 5.)
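To make points 4 and 5 concrete, here is a rough Python sketch (the point arrays and file names are placeholders for whatever correspondences you actually detect):

```python
import cv2
import numpy as np

# Placeholder correspondences: (N, 2) image points and their known
# positions on the paper plane (e.g. in mm). Use many more than 4.
img_pts = np.load("image_points.npy").astype(np.float32)
plane_pts = np.load("plane_points.npy").astype(np.float32)

# Point 4: fit the homography with RANSAC so outliers get discarded.
H, inlier_mask = cv2.findHomography(img_pts, plane_pts, cv2.RANSAC, 2.0)

# Point 5: instead of warping (and interpolating) the whole image and
# measuring there, transform the measured image points directly.
measured = np.array([[[412.3, 188.7]], [[955.1, 201.4]]], dtype=np.float32)
on_plane = cv2.perspectiveTransform(measured, H).reshape(-1, 2)
dist_mm = np.linalg.norm(on_plane[0] - on_plane[1])
```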

Once again, that sheet of paper isn’t lying flat at all.

Thank you for pointing that out, but please consider these pictures just a “proof of concept”. These won’t be my final images! I am going to use a detection algorithm.

Hello Steve,

thank you very much for your detailed reply.

1: Yes, that was my intention. These pictures were just an example, but I guess I should have chosen better ones. Which software did you use to calculate the angles?

2: Okay so I will focus on the accurate detection of the points of interest and not further dig into interpolation.

3: Is there any feature you could recommend from your experience? Best would be one that is stable and already implemented in OpenCV. Maybe 4 aruco markers in the corners?

4: I am planning on using a big charuco board. This gives me many of these desired references.

Your point 5 sounds very interesting, but I wasn’t able to follow all of it. As far as I understand, you mean that I should find features, calculate the homography and apply it, without undistorting the image in the first place?
Is that right? Could you explain it, maybe using an example?

  1. I use Gimp for most of my image inspection.

I don’t really understand what your end goal is, so my suggestions are based on assumptions that might not be true. For now I’m assuming you are trying to use a single (calibrated intrinsics) camera to determine the world-plane positions of points in the image. I’ll further assume that the world plane is defined by a calibration target of your choosing.

  1. Procure a Charuco target of an appropriate size and density for your purposes, preferably something very flat.
  2. Use that charuco target* to calibrate the intrinsics and convince yourself that the calibration is accurate. (Project the chessboard corners to the original calibration images using the calibrated intrinsics and the rvec/tvec corresponding to the image.) Don’t proceed until this is working well - you should be able to draw circles on your original images at the projected world points (draw the circles with subpixel values) and be happy that all the circles end up where they should.
  3. Take a picture and use the charuco functions to detect the markers and interpolate / estimate where the chessboard corners are. Use cv::cornerSubPix on the predicted locations of the chessboard corners and use those (subpixel) image locations along with the corresponding world (plane) coordinates to call solvePnPRansac (or use the plain solvePnP and filter outliers).**
  4. Now you have your camera pose in the world-plane coord system so you can compute the world point (constrained to the world plane) for any image point you have. OpenCV might have a way to do this, or you can cook up your own function using some linear algebra. Effectively, what you need to do is compute the ray defined by the image point and the pinhole, and then compute where that ray intersects the world plane. In camera coordinates the pinhole is by definition <0,0,0> and the image coordinate is something like <ix - cx, iy - cy, f>. Note that the units of this point are in pixels, which (most likely) isn’t the same as the units of your world plane, but don’t worry - we are interested in the ray defined by these points, and the scale factor doesn’t matter. In fact it’s customary to write the image coordinate as <(ix-cx)/f, (iy-cy)/f, 1>, or at least that’s what I do. Once you have your two points that define the ray (in camera coordinates) you can compute the corresponding world-plane coordinates using the extrinsics you got from step 3. You’ll probably want to use the Rodrigues function to compute a rotation matrix from your rvec. So now you have an image ray in world-plane coordinates and you can compute where the ray and the plane intersect. If your linear algebra skills are good you probably know how to do this without thinking about it, but if you are like me you can rely on your google skills instead. Compute the intersection of the plane and the ray, and that will give you your world point. Simple, right? :slight_smile: (See the sketch after this list.)
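Here is a rough Python sketch of that last step (the .npy file names are placeholders for your calibration data and detected corners); the plane_z argument is the thickness offset I come back to below:

```python
import cv2
import numpy as np

# Placeholders: calibrated intrinsics, subpixel image corners and their
# known world-plane coordinates (z = 0 on the target plane).
K, dist = np.load("K.npy"), np.load("dist.npy")
img_corners = np.load("img_corners.npy").astype(np.float64)      # (N, 2)
world_corners = np.load("world_corners.npy").astype(np.float64)  # (N, 3), z = 0

ok, rvec, tvec, inliers = cv2.solvePnPRansac(world_corners, img_corners, K, dist)
R, _ = cv2.Rodrigues(rvec)   # rotation matrix, world -> camera

def image_point_to_plane(u, v, plane_z=0.0):
    """Intersect the viewing ray of pixel (u, v) with the world plane
    z = plane_z (plane_z > 0 lets you account for object thickness)."""
    # Undistort and normalize: gives <x, y, 1> in camera coordinates.
    xy = cv2.undistortPoints(np.array([[[u, v]]], dtype=np.float64), K, dist)
    ray_cam = np.array([xy[0, 0, 0], xy[0, 0, 1], 1.0])
    # Express the ray origin (the pinhole) and direction in world coordinates.
    cam_center_w = (-R.T @ tvec).reshape(3)
    ray_dir_w = R.T @ ray_cam
    # Solve cam_center_w.z + s * ray_dir_w.z = plane_z for s.
    s = (plane_z - cam_center_w[2]) / ray_dir_w[2]
    return cam_center_w + s * ray_dir_w

print(image_point_to_plane(640.5, 480.25))
```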

*You have mentioned that charuco isn’t well supported in Python, so this might be a little harder to pull off. I’d probably try to figure out how to get the charuco calibration working, or dig into the implementation enough to use the basic functions and write the ones you need. If it’s just the interpolateCornersCharuco function, that’s probably (fairly) straightforward - all it does is look for pairs of adjacent aruco markers and use their image locations to predict the image location of a chessboard corner that falls between them. (Well, it also associates the corresponding world point.) It’s just an estimate, and it is essential to call cornerSubPix on that estimate to refine it.
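For reference, a minimal sketch of that detect/interpolate/refine flow in Python, using the older cv2.aruco module API (OpenCV contrib up to about 4.6; newer releases moved this into a CharucoDetector class). The board geometry values are placeholders:

```python
import cv2

# Placeholder board: 7x5 squares, 40 mm squares with 30 mm markers.
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_5X5_100)
board = cv2.aruco.CharucoBoard_create(7, 5, 0.04, 0.03, dictionary)

gray = cv2.cvtColor(cv2.imread("capture.jpg"), cv2.COLOR_BGR2GRAY)
marker_corners, marker_ids, _ = cv2.aruco.detectMarkers(gray, dictionary)

# Estimate the chessboard corners from pairs of adjacent markers.
count, ch_corners, ch_ids = cv2.aruco.interpolateCornersCharuco(
    marker_corners, marker_ids, gray, board)

# Refine those estimates to subpixel accuracy before using them for
# solvePnP / findHomography.
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.01)
ch_corners = cv2.cornerSubPix(gray, ch_corners, (5, 5), (-1, -1), criteria)
```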

**If you aren’t familiar with RANSAC now is a good time to read at least a little bit.

So I’ve just given you a description of what I would do given your situation as I understand it. It’s not entirely the same as what I was originally suggesting (using a homography) - there’s a reason for that and I’ll get to it in a bit. If you just want to try the homography approach, do steps 1-3 except compute a homography (again, use RANSAC and a lot of points) instead of the camera extrinsics. Make sure you are using undistorted points when you compute your homography. You can compute two homographies, actually - one that takes plane points as input and gives image points as output, and vice versa. Then you can easily compute world plane points from your image points.
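A rough sketch of that homography variant, with the intrinsics and point correspondences standing in for whatever the steps above produce:

```python
import cv2
import numpy as np

# Placeholders: intrinsics from calibration and the detected correspondences.
K, dist = np.load("K.npy"), np.load("dist.npy")
img_pts = np.load("image_points.npy").astype(np.float64).reshape(-1, 1, 2)
plane_pts = np.load("plane_points.npy").astype(np.float64)

# Undistort the detected image points first (P=K keeps them in pixel units).
undist = cv2.undistortPoints(img_pts, K, dist, P=K).reshape(-1, 2)

# Two homographies, fitted with RANSAC: image -> plane and plane -> image.
H_img_to_plane, _ = cv2.findHomography(undist, plane_pts, cv2.RANSAC, 2.0)
H_plane_to_img, _ = cv2.findHomography(plane_pts, undist, cv2.RANSAC, 2.0)

# Any measured image point (undistorted the same way) maps onto the plane:
query = np.array([[[512.0, 384.0]]])
plane_xy = cv2.perspectiveTransform(query, H_img_to_plane)
```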

So why all the extra work in step 4? Because if you end up wanting to measure objects that don’t actually occupy the world plane the camera extrinsics version is easily adaptable. Instead of intersecting your ray with the original world plane, you intersect it with a world plane translated by the thickness of the object (assuming you know that thickness). Also the fully calibrated (extrinsics) version is closer to what you will need if you want to use two cameras / stereo. Don’t get me wrong, homographies are great in many cases, but sometimes a full calibration is the better approach.
