Inaccurate metrology measurements in OpenCV orthomosaic from three 48MP cameras

Hi everyone,

I am building a system to generate a scale-accurate top-down view of a large area using three ceiling-mounted 48MP webcams. While the stitching works visually, the real-world measurements (verified in ImageJ) are significantly off.

The Setup:

  • Cameras: 3x Webcams (8000x6000 px), fixed at the ceiling.

  • Reference: ArUco markers on the floor with known global coordinates (in mm).

  • Goal: A stitched image where 1 pixel represents 1 mm.

The Issue:
I am using pre-calculated Homography matrices (\(H\)) to warp the images onto a common canvas. Although the markers are detected and the math seems correct, the scale in the final output is inconsistent.

Key parts of my Stitching Logic:

I am loading pre-calculated Homography matrices that map pixels directly to millimeter coordinates:

# How I apply the transformation and stitching
import cv2
import numpy as np

# `config` and `master_calibration_data` are loaded elsewhere (not shown)
def stitch_from_precalibrated(img_paths):
    images = [cv2.imread(p) for p in img_paths]

    # px_per_mm is set to 1.0 as world coords are in mm
    px_per_mm = config.get("px_per_mm_ref", 1.0)
    
    # Calculate canvas size based on min/max marker coordinates (in mm)
    all_world_points_mm = np.array([corner for marker in config['markers'].values() for corner in marker[0]], dtype=np.float32)
    x_min_mm, y_min_mm = np.min(all_world_points_mm, axis=0)
    x_max_mm, y_max_mm = np.max(all_world_points_mm, axis=0)
    
    padding_px = 100
    canvas_w = int(np.ceil((x_max_mm - x_min_mm) * px_per_mm)) + (2 * padding_px)
    canvas_h = int(np.ceil((y_max_mm - y_min_mm) * px_per_mm)) + (2 * padding_px)

    # Scale mm to canvas pixels and shift origin to fit everything on canvas
    translation_matrix = np.array([[px_per_mm, 0, -x_min_mm * px_per_mm + padding_px], 
                                   [0, px_per_mm, -y_min_mm * px_per_mm + padding_px], 
                                   [0, 0, 1]], dtype=np.float64)

    for i, img in enumerate(images):
        H = master_calibration_data[i]['H'] # Pre-calculated Homography (image px -> world mm)
        final_transform = translation_matrix @ H
        
        # Warping to the large 1px = 1mm canvas
        warped = cv2.warpPerspective(img, final_transform, (canvas_w, canvas_h))
        # ... (masking and blending logic)


My Questions:

  1. Lens Distortion: cv2.warpPerspective applies a single homography to each 48MP frame. Is it possible that the radial distortion of the lenses (not accounted for before warping) is causing the scale drift towards the image edges?

  2. Order of Operations: Should I first undistort() the raw frames using a camera matrix (\(K\)) and distortion coefficients (\(D\)) before applying the Homography, or can a Homography alone (calculated from markers) theoretically handle 48MP distortion?

  3. Coordinate Precision: At this high resolution, could the floating-point precision of the translation_matrix @ H multiplication lead to measurable errors over a large canvas?

Absolutely! You might be able to skip calibrating for lens distortion, but I wouldn’t count on it. Better calibrate those cameras. Calibration has many pitfalls, and a “good” reprojection error cannot be trusted on its own: low reprojection error is necessary but not sufficient.

the resolution doesn’t matter to the math. the lens distortion and lack of calibration/compensation for it is the issue here.

a homography, being linear (in homogeneous coordinates), can never handle lens distortion, which is nonlinear. it can only handle the projective plane-to-plane mapping between the floor and the image plane.
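A small synthetic check of that statement, as a sketch only: a made-up camera looks straight down at a floor grid, a homography is fitted from pixels to millimeters with a plain DLT, and the worst residual is compared with and without a single radial coefficient. The focal length, mounting height, grid size, and `k1` value are all assumptions chosen for illustration, not values from the thread.

```python
import numpy as np

def normalize(pts):
    """Hartley normalization: zero mean, average distance sqrt(2)."""
    mean = pts.mean(axis=0)
    scale = np.sqrt(2) / np.linalg.norm(pts - mean, axis=1).mean()
    T = np.array([[scale, 0, -scale * mean[0]],
                  [0, scale, -scale * mean[1]],
                  [0, 0, 1]])
    return (pts - mean) * scale, T

def fit_homography(src, dst):
    """Least-squares DLT estimate of H such that dst ~ H @ src."""
    src_n, T_src = normalize(src)
    dst_n, T_dst = normalize(dst)
    rows = []
    for (x, y), (u, v) in zip(src_n, dst_n):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.array(rows))
    return np.linalg.inv(T_dst) @ vt[-1].reshape(3, 3) @ T_src

def apply_homography(H, pts):
    p = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return p[:, :2] / p[:, 2:3]

def project(world_mm, k1, f=6000.0, c=(4000.0, 3000.0), height_mm=3000.0):
    """Hypothetical nadir camera with one radial distortion term k1."""
    n = world_mm / height_mm                      # normalized image coords
    r2 = (n ** 2).sum(axis=1, keepdims=True)
    return n * (1 + k1 * r2) * f + np.array(c)    # distorted pixel coords

# Floor grid in mm, standing in for marker corners
gx, gy = np.meshgrid(np.linspace(-1800, 1800, 7), np.linspace(-1300, 1300, 5))
world = np.column_stack([gx.ravel(), gy.ravel()])

def max_residual_mm(k1):
    pixels = project(world, k1)
    H = fit_homography(pixels, world)
    return np.abs(apply_homography(H, pixels) - world).max()

ideal_mm = max_residual_mm(0.0)       # pure plane-to-plane mapping
distorted_mm = max_residual_mm(-0.05) # mild barrel distortion
print(f"no distortion: {ideal_mm:.2e} mm, with distortion: {distorted_mm:.2f} mm")
```

Without distortion the fit is exact up to rounding. With distortion the best homography still leaves a residual that varies across the image, which is the kind of scale drift described in the question.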

since you are doing image stitching, you should look into OpenCV’s stitching pipeline. look at OpenStitching/stitching on GitHub, “a Python package for fast and robust image stitching”. someone took the time and effort to understand it all and make it user-friendly. he might be reachable for consultation. I’ve seen him active on Stack Overflow, and perhaps also this forum.

image stitching should (and does) fuse calculations wherever possible, so as to avoid generating intermediate results (repeated resampling is bad, sampling once is best). for prototyping, don’t sweat it, just generate intermediate images. 48 megapixels gives you some margin to tolerate repeated resampling.

I don’t know if there is a way to inject separately performed lens calibrations into the stitching pipeline.

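if you don’t use the stitching pipeline, you can do all you need “manually”. cv.undistort() should be able to take distortion coefficients and a suitably manipulated camera matrix (using marker-camera pose matrix), and synthesize a top-down view; cv.initUndistortRectifyMap() followed by cv.remap() is the more general form, since it also accepts a 3x3 rectification matrix. or do that in two steps, through an intermediate image: undistort first for the lens distortion, then perspective warp for the camera pose. in that case the homography has to be estimated from undistorted marker coordinates.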
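A sketch of the fused, sample-once variant: build one lookup map that goes from each canvas pixel back through the homography and then forward through the distortion model to the raw image pixel. It assumes a two-coefficient radial model (`k1`, `k2`), zero skew, and a homography `H_px_to_mm` that was estimated from undistorted marker corners. `build_canvas_maps` and all numeric values are hypothetical.

```python
import numpy as np

def build_canvas_maps(H_px_to_mm, K, k1, k2, canvas_size, origin_mm, px_per_mm=1.0):
    """For every canvas pixel, return the raw-image pixel to sample (map_x, map_y)."""
    w, h = canvas_size
    u, v = np.meshgrid(np.arange(w, dtype=np.float64),
                       np.arange(h, dtype=np.float64))
    # Canvas pixel -> world coordinates in mm
    X = u / px_per_mm + origin_mm[0]
    Y = v / px_per_mm + origin_mm[1]
    world = np.stack([X, Y, np.ones_like(X)], axis=-1)
    # World mm -> undistorted pixel -> normalized camera coordinates
    M = np.linalg.inv(K) @ np.linalg.inv(H_px_to_mm)
    n = world @ M.T
    x = n[..., 0] / n[..., 2]
    y = n[..., 1] / n[..., 2]
    # Forward radial distortion (tangential terms omitted in this sketch)
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2 * r2
    # Normalized distorted coords -> raw pixel coords (zero skew assumed)
    map_x = K[0, 0] * x * radial + K[0, 2]
    map_y = K[1, 1] * y * radial + K[1, 2]
    return map_x.astype(np.float32), map_y.astype(np.float32)

# Made-up nadir camera: 6000 px focal length, 3000 mm above the floor,
# so one undistorted pixel covers 0.5 mm.
K = np.array([[6000.0, 0.0, 4000.0],
              [0.0, 6000.0, 3000.0],
              [0.0, 0.0, 1.0]])
H_px_to_mm = np.array([[0.5, 0.0, -2000.0],
                       [0.0, 0.5, -1500.0],
                       [0.0, 0.0, 1.0]])
map_x, map_y = build_canvas_maps(H_px_to_mm, K, k1=-0.05, k2=0.0,
                                 canvas_size=(200, 100),
                                 origin_mm=(-100.0, -50.0))
print(map_x[50, 100], map_y[50, 100])  # canvas pixel that corresponds to world (0, 0)

# With OpenCV the raw frame would then be resampled exactly once:
# warped = cv2.remap(raw_frame, map_x, map_y, cv2.INTER_LINEAR)
```

For a full-size canvas the intermediate float64 arrays get large, so building the maps in tiles may be necessary.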

don’t ask me how the magic inside of cv.getOptimalNewCameraMatrix operates exactly. you’ll probably have to get a good book on the topic, like “Multiple View Geometry in Computer Vision” (Hartley & Zisserman), which should be on the shelves of any decent library, the nearest university library for sure. you’ll have to get familiar with that function though. if you don’t, the results will “look good” but won’t be usable for metrology. that mostly hinges on how exactly this function calculates its resulting matrix and how it relates to the input matrix.

floating-point precision: highly unlikely. machine arithmetic limitations would look qualitatively different from lens distortion effects.
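A quick way to put a number on this, as a rough sketch: compose a translation with a homography once in float64 and once in float32, map the far corner of an 8000x6000 frame, and compare. The matrix entries below are invented for illustration and are not taken from the question.

```python
import numpy as np

# Hypothetical homography (image px -> mm) and canvas translation
H = np.array([[0.48, 0.01, -1500.0],
              [-0.012, 0.49, -900.0],
              [1.5e-6, -2.0e-6, 1.0]])
T = np.array([[1.0, 0.0, 2100.0],
              [0.0, 1.0, 1300.0],
              [0.0, 0.0, 1.0]])

def map_point(M, pt):
    """Apply a 3x3 projective transform to one point in M's own dtype."""
    x, y, w = M @ np.array([pt[0], pt[1], 1.0], dtype=M.dtype)
    return np.array([x / w, y / w], dtype=np.float64)

corner = (7999.0, 5999.0)  # far corner of an 8000x6000 frame

ref = map_point(T @ H, corner)                                         # float64
low = map_point(T.astype(np.float32) @ H.astype(np.float32), corner)   # float32
error_px = float(np.linalg.norm(low - ref))
print(f"float32 vs float64 difference at the corner: {error_px:.6f} px")
```

Even with everything forced to single precision, the difference stays a small fraction of a pixel, far below the millimeter-scale drift that lens distortion produces.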

what you see is the lack of a lens distortion model.

your LLM is articulate and sounds natural, but it’s an LLM nonetheless.


possible alternative: if your goal is to measure planks of wood, and if they pass by you on a conveyor belt, you could use a line scan camera instead, combined with some position data from the conveyor belt or in-view position markers, like a particular type of yard stick, binary code or “Gray code”, something that “encodes” position.
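For the position-encoding idea, a minimal sketch of a reflected binary (Gray) code, where neighbouring positions differ in exactly one bit, so a read taken on the boundary between two positions is off by at most one step. The function names are hypothetical and the 4-bit width is only an example.

```python
def to_gray(n: int) -> int:
    """Encode a position index as reflected binary Gray code."""
    return n ^ (n >> 1)

def from_gray(g: int) -> int:
    """Decode a Gray code word back to the position index."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

# Print a 4-bit scale: index followed by the bit pattern that would be
# printed on the yard stick at that position
for i in range(16):
    print(i, format(to_gray(i), "04b"))
```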

since you’re already using high resolution cameras, I’d encourage you to continue that. good results are possible.

eyeballing the picture and the yard stick, I figure you get something like 2000-2500 pixels to a meter, or 0.4-0.5 mm per pixel.

all the error comes from lens distortion, inherently. a lens distortion model can compensate for much of that. the pure geometry of plane-to-plane projection should be “entirely” accountable, to a precision that is much smaller than “one pixel”.
