Perfect image stitching?

Ok folks, I apologize in advance for the long post, but I am putting in everything I have and hoping for your input to help a friend.

The problem:

I have a friend who makes and repairs boat decks. The most cumbersome and time-consuming part of the process is making a good template of the area where the decking will be placed. At the moment he does this the old manual way: putting down cardboard/foam and cutting it to shape. It takes a lot of effort to get a good result, especially on large areas.

The solution:

What I am trying to achieve is to let him take pictures of the area, stitch them together, scale the result, and extract the shape (template) as DXF/SVG so he can cut the template on a plotter, or even use the file to machine the part directly.

Requirements:

Easy to use - take multiple pictures from different positions, scales and rotations (phone pictures).
Accuracy - on average, 1-1.5 mm accuracy is good enough.

I cannot upload a lot of pictures here, so I created this document to show you the result of my current implementation. Take a look here
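
The extraction step itself (panorama -> outline -> vector file) is not in the code below yet; it only covers the stitching. As a rough sketch of what I have in mind for that last step (the threshold, the mm-per-pixel scale and the file names are placeholders for illustration):

import cv2 as cv
import numpy as np

def outline_to_svg(panorama_path, svg_path, mm_per_px):
    """Extract the largest contour from the stitched image and write it as an SVG polygon.
    The threshold and the mm/px scale are placeholders; in practice the scale would come
    from a reference object (e.g. an ArUco marker) visible in the scene."""
    img = cv.imread(panorama_path)
    gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
    # Placeholder binarization - the real template boundary may need adaptive thresholding
    _, binary = cv.threshold(gray, 0, 255, cv.THRESH_BINARY + cv.THRESH_OTSU)
    contours, _ = cv.findContours(binary, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)
    outline = max(contours, key=cv.contourArea)
    # Simplify the polygon slightly before exporting
    outline = cv.approxPolyDP(outline, 2.0, True).reshape(-1, 2)
    points_mm = outline.astype(np.float64) * mm_per_px
    w, h = points_mm.max(axis=0)
    path_data = " ".join(f"{x:.2f},{y:.2f}" for x, y in points_mm)
    svg = (f'<svg xmlns="http://www.w3.org/2000/svg" width="{w:.2f}mm" height="{h:.2f}mm" '
           f'viewBox="0 0 {w:.2f} {h:.2f}"><polygon points="{path_data}" '
           f'fill="none" stroke="black" stroke-width="0.5"/></svg>')
    with open(svg_path, "w") as f:
        f.write(svg)

# outline_to_svg('./outputs/panorama.jpg', './outputs/template.svg', mm_per_px=0.5)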

Here is the code of my stitching process:

from matplotlib import pyplot as plt
from pathlib import Path
import cv2 as cv
import numpy as np
import torch
import networkx as nx

#https://github.com/OpenStitching/stitching/tree/main/stitching
import stitching.stitcher as st



def plot_image(img, figsize_in_inches=(5,5)):
    _, ax = plt.subplots(figsize=figsize_in_inches)
    ax.imshow(cv.cvtColor(img, cv.COLOR_BGR2RGB))
    plt.show()
    
def plot_images(imgs, figsize_in_inches=(5,5)):
    _, axs = plt.subplots(1, len(imgs), figsize=figsize_in_inches)
    for col, img in enumerate(imgs):
        axs[col].imshow(cv.cvtColor(img, cv.COLOR_BGR2RGB))
    plt.show()

def get_image_paths(img_set):
    return [str(path.relative_to('.')) for path in Path('images').rglob(f'{img_set}*')]


def floyd_warshall_max_confidence(confidence: torch.Tensor):
    assert confidence.shape[-1] == confidence.shape[-2], 'Graph matrix should be square'
    
    n = confidence.shape[-1]
    
    # Initialize the distance matrix with the given confidence values
    dist = confidence.clone()
    
    # Set diagonal elements to the maximum value in the tensor
    max_value = torch.max(confidence)
    dist.diagonal().fill_(max_value)
    
    # Floyd-Warshall algorithm
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Update the confidence if a higher confidence path is found
                dist[i][j] = torch.max(dist[i][j], torch.min(dist[i][k], dist[k][j]))
    
    return dist

def get_path_confidence(max_confidence_matrix, path):
    confidences = [max_confidence_matrix[path[i]][path[i+1]] for i in range(len(path)-1)]
    return min(confidences), confidences


def find_best_overall_path(max_confidence_matrix):
    n = max_confidence_matrix.shape[0]
    best_path = []
    best_min_confidence = -float('inf')

    for start in range(n):
        path = [start]
        visited = set([start])
        
        while len(visited) < n:
            current = path[-1]
            next_node = -1
            max_confidence = -float('inf')
            
            for i in range(n):
                if i not in visited and max_confidence_matrix[current][i] > max_confidence:
                    max_confidence = max_confidence_matrix[current][i]
                    next_node = i
            
            if next_node == -1 or max_confidence == 0:
                break  # No valid next node found or confidence is zero
            
            path.append(next_node)
            visited.add(next_node)
        
        if len(path) > 1:
            path_min_confidence = min(max_confidence_matrix[path[i]][path[i+1]] for i in range(len(path)-1))
            if path_min_confidence > best_min_confidence:
                best_min_confidence = path_min_confidence
                best_path = path

    return best_path

def visualize_graph(confidence_tensor: torch.Tensor, threshold: float):
    G = nx.Graph()

    num_nodes = confidence_tensor.shape[0]
    for i in range(num_nodes):
        for j in range(num_nodes):
            if i != j and confidence_tensor[i, j] > threshold:
                G.add_edge(i, j, weight=confidence_tensor[i, j].item())

    # Remove nodes without edges
    nodes_to_remove = [node for node in G.nodes if G.degree(node) == 0]
    G.remove_nodes_from(nodes_to_remove)

    # Draw the graph
    pos = nx.spring_layout(G) 
    edges = G.edges(data=True)
    weights = [edge[2]['weight'] for edge in edges]

    nx.draw(G, pos, with_labels=True, node_color='lightblue', node_size=500, font_size=16, font_weight='bold')
    edge_labels = {(i, j): f'{data["weight"]:.2f}' for i, j, data in edges}
    nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_labels)
    nx.draw_networkx_edges(G, pos, width=weights)

    plt.title("Graph Visualization")
    plt.show()



# ============ Load images ============ 
# - mark my images with a prefix to distinguish them from the rest
floor_imgs = get_image_paths('ico2')
print(floor_imgs)
# =====================================

# ========== Resize images ===========
# If the images should not be stitched on full resolution, this can be achieved by setting the `final_megapix` parameter to a number above 0.
# Default values: Images.of(images, medium_megapix=0.6, low_megapix=0.1, final_megapix=-1)

images = st.Images.of(floor_imgs, final_megapix=1)
medium_imgs = list(images.resize(st.Images.Resolution.MEDIUM))
low_imgs = list(images.resize(st.Images.Resolution.LOW))
final_imgs = list(images.resize(st.Images.Resolution.FINAL))
# Plot images
#plot_images(low_imgs, (20,20))

# Get image sizes
original_size = images.sizes[0]
medium_size = images.get_image_size(medium_imgs[0])
low_size = images.get_image_size(low_imgs[0])
final_size = images.get_image_size(final_imgs[0])
print(f"Original Size: {original_size}  -> {'{:,}'.format(np.prod(original_size))} px ~ 1 MP")
print(f"Medium Size:   {medium_size}  -> {'{:,}'.format(np.prod(medium_size))} px ~ 0.6 MP")
print(f"Low Size:      {low_size}   -> {'{:,}'.format(np.prod(low_size))} px ~ 0.1 MP")
print(f"Final Size:    {final_size}  -> {'{:,}'.format(np.prod(final_size))} px ~ 1 MP")
# =====================================

# =========== Find features  ==========
# We use medium resolution images to find features
# FeatureDetector(detector='orb', nfeatures=500) -> ORB detector with 500 features  TODO = Try Xfeat to find features! https://github.com/verlab/accelerated_features
finder = st.FeatureDetector(detector='orb', nfeatures=5000)
features = [finder.detect_features(img) for img in medium_imgs]
# Plot an image by index
#plot_img_keypoints = finder.draw_keypoints(medium_imgs[1], features[1])  
#plot_image(plot_img_keypoints, (15,10))
# =====================================

# =========== Match features ===========
# Default values: FeatureMatcher(matcher_type='homography', range_width=-1) -> Homography matcher with no range width
matcher = st.FeatureMatcher()
matches = matcher.match_features(features)
# =====================================


# ======== Find best images to combine =====
# Calculate the confidence matrix
threshold = 2.3  # Filter low confidence matches
confidence_tensor = torch.tensor(matcher.get_confidence_matrix(matches))
confidence_tensor = torch.where(confidence_tensor < threshold, torch.tensor(0.0), confidence_tensor)

# Use Floyd-Warshall to find the maximum confidence path
max_confidence = floyd_warshall_max_confidence(confidence_tensor)

# Visualize the graph
visualize_graph(max_confidence, threshold)

# Find the best overall path (from image to image)
best_path = find_best_overall_path(max_confidence)
print(f"Best Path: {best_path}")

if best_path:
    min_confidence, path_confidences = get_path_confidence(max_confidence, best_path)
    print(f"Best overall image sequence: {best_path}")
    print(f"Minimum match confidence along the sequence: {min_confidence:.4f}")
    print("Match confidences between consecutive images:")
    for i in range(len(best_path) - 1):
        print(f"  Image {best_path[i]} -> Image {best_path[i+1]}: {path_confidences[i]:.4f}")
else:
    print("No valid path found that includes all images.")


# Subset all the variables
print(f"Indices: {best_path}")

subsetter = st.Subsetter()
medium_imgs = subsetter.subset_list(medium_imgs, best_path)
low_imgs = subsetter.subset_list(low_imgs, best_path)
final_imgs = subsetter.subset_list(final_imgs, best_path)
features = subsetter.subset_list(features, best_path)
matches = subsetter.subset_matches(matches, best_path)
images.subset(best_path)
# =====================================

# =========== Camera Estimation, Adjusting and Correction ============
# Default values:
# CameraEstimator(estimator='homography')
# CameraAdjuster(adjuster='ray', refinement_mask='xxxxx')
# WaveCorrector(wave_correct_kind='horiz')
camera_estimator = st.CameraEstimator()
camera_adjuster = st.CameraAdjuster()
wave_corrector = st.WaveCorrector(wave_correct_kind='no')

cameras = camera_estimator.estimate(features, matches)
cameras = camera_adjuster.adjust(features, matches, cameras)
cameras = wave_corrector.correct(cameras)
# ====================================================================

# =========== Warp Images ============
# Default values: Warper(warper_type='spherical', scale=1)
warper = st.Warper(warper_type='plane')
warper.set_scale(cameras)

# Warp LOW resolution images
low_sizes = images.get_scaled_img_sizes(st.Images.Resolution.LOW)
camera_aspect = images.get_ratio(st.Images.Resolution.MEDIUM, st.Images.Resolution.LOW)  # since cameras were obtained on medium imgs
warped_low_imgs = list(warper.warp_images(low_imgs, cameras, camera_aspect))
warped_low_masks = list(warper.create_and_warp_masks(low_sizes, cameras, camera_aspect))
low_corners, low_sizes = warper.warp_rois(low_sizes, cameras, camera_aspect)
# Plot warped images
#plot_images(warped_low_imgs, (10,10))
#plot_images(warped_low_masks, (10,10))

# Warp FINAL resolution images
final_sizes = images.get_scaled_img_sizes(st.Images.Resolution.FINAL)
camera_aspect = images.get_ratio(st.Images.Resolution.MEDIUM, st.Images.Resolution.FINAL)
warped_final_imgs = list(warper.warp_images(final_imgs, cameras, camera_aspect))
warped_final_masks = list(warper.create_and_warp_masks(final_sizes, cameras, camera_aspect))
final_corners, final_sizes = warper.warp_rois(final_sizes, cameras, camera_aspect)

# Timelapser DEBUG
# The Timelapser functionality is a nice way to grasp how the images are warped and placed in the final plane
# Default values: Timelapser(timelapse='no')
timelapser = st.Timelapser('as_is')
timelapser.initialize(final_corners, final_sizes)

for img, corner in zip(warped_final_imgs, final_corners):
    timelapser.process_frame(img, corner)
    frame = timelapser.get_frame()
    #plot_image(frame, (10,10))

# Show the combined mask to see how the images are placed
cropper = st.Cropper()
mask = cropper.estimate_panorama_mask(warped_low_imgs, warped_low_masks, low_corners, low_sizes)
plot_image(mask, (5,5))
# ==========================================


# =========== Stitch Images =================
# Seam Masks
# The seams are obtained on the warped low resolution images and then resized to the warped final resolution images.
# Default values: SeamFinder(finder='dp_color')
seam_finder = st.SeamFinder(finder="dp_color")
seam_masks = seam_finder.find(warped_low_imgs, low_corners, warped_low_masks)
seam_masks = [seam_finder.resize(seam_mask, mask) for seam_mask, mask in zip(seam_masks, warped_final_masks)]

seam_masks_plots = [st.SeamFinder.draw_seam_mask(img, seam_mask) for img, seam_mask in zip(warped_final_imgs, seam_masks)]
#plot_images(seam_masks_plots, (15,10))

# Exposure Compensator
# Default values: ExposureErrorCompensator(compensator='gain_blocks', nr_feeds=1, block_size=32)
# The exposure errors are estimated on the warped low resolution images and then applied to the warped final resolution images.
compensator = st.ExposureErrorCompensator(compensator='channel', nr_feeds=1, block_size=32)

compensator.feed(low_corners, warped_low_imgs, warped_low_masks)

compensated_imgs = [compensator.apply(idx, corner, img, mask) 
                    for idx, (img, mask, corner) 
                    in enumerate(zip(warped_final_imgs, warped_final_masks, final_corners))]

# Blending
# Default values: Blender(blender_type='multiband', blend_strength=5)
# The blend strength specifies over how wide an overlap the images are blended along the transitions of the seam masks.

blender = st.Blender(blender_type='multiband', blend_strength=10)
blender.prepare(final_corners, final_sizes)
for img, mask, corner in zip(compensated_imgs, seam_masks, final_corners):
    blender.feed(img, mask, corner)
panorama, _ = blender.blend()

plot_image(panorama, (15,15))
# Save the panorama
output_filename = './outputs/panorama.jpg'
cv.imwrite(output_filename, panorama)

# Output the seam masks DEBUG
blended_seam_masks = seam_finder.blend_seam_masks(seam_masks, final_corners, final_sizes)
#plot_image(blended_seam_masks, (5,5))
plot_image(seam_finder.draw_seam_lines(panorama, blended_seam_masks, linesize=2), (15,10))
#plot_image(seam_finder.draw_seam_polygons(panorama, blended_seam_masks), (15,10))

forget stitching.

use one camera. make sure it’s WITHOUT any image stabilization. pick a fixed zoom and focus. calibrate it well. good reprojection error is necessary but not sufficient.
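
For reference, a minimal calibration sketch along those lines, using OpenCV's chessboard routines (board size, square size and the image folder are placeholders); the RMS value it prints is the reprojection error mentioned above:

import glob
import cv2 as cv
import numpy as np

# Placeholder values - adjust to the actual printed target and calibration photos
BOARD_COLS, BOARD_ROWS = 9, 6          # inner corners of the chessboard
SQUARE_SIZE_MM = 25.0                  # printed square size
CALIB_IMAGES = glob.glob('calib/*.jpg')

# 3D coordinates of the board corners in the board's own plane (z = 0)
objp = np.zeros((BOARD_ROWS * BOARD_COLS, 3), np.float32)
objp[:, :2] = np.mgrid[0:BOARD_COLS, 0:BOARD_ROWS].T.reshape(-1, 2) * SQUARE_SIZE_MM

obj_points, img_points = [], []
image_size = None
for fname in CALIB_IMAGES:
    gray = cv.imread(fname, cv.IMREAD_GRAYSCALE)
    image_size = gray.shape[::-1]
    found, corners = cv.findChessboardCorners(gray, (BOARD_COLS, BOARD_ROWS))
    if not found:
        continue
    corners = cv.cornerSubPix(gray, corners, (11, 11), (-1, -1),
                              (cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER, 30, 0.001))
    obj_points.append(objp)
    img_points.append(corners)

rms, K, dist, rvecs, tvecs = cv.calibrateCamera(obj_points, img_points, image_size, None, None)
print(f"RMS reprojection error: {rms:.3f} px")  # a low value is necessary but not sufficient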

make sure the arucos are flat, not just flappy sheets of paper. one aruco in the picture is enough. prepare a series of them in various sizes so he can choose the largest to fit in the picture.

maybe get a tripod for the camera and take two pictures, one for the outline, one for the aruco, so it doesn’t have to be entirely within the shape. it still has to be in the same plane as the shape though.
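
As an illustration of the marker idea, a minimal sketch using OpenCV's ArUco module (assumes OpenCV >= 4.7 with the ArucoDetector API; dictionary, marker size and file name are placeholders, and the image is assumed to be undistorted already):

import cv2 as cv
import numpy as np

MARKER_SIZE_MM = 200.0  # placeholder - printed side length of the marker

img = cv.imread('deck_undistorted.jpg')  # hypothetical, already undistorted photo
detector = cv.aruco.ArucoDetector(cv.aruco.getPredefinedDictionary(cv.aruco.DICT_4X4_50))
corners, ids, _ = detector.detectMarkers(img)

# Corners of the first detected marker, in image pixels (TL, TR, BR, BL); assumes one was found
marker_px = corners[0].reshape(4, 2).astype(np.float32)
# The same corners in millimetres on the deck plane
marker_mm = np.array([[0, 0], [MARKER_SIZE_MM, 0],
                      [MARKER_SIZE_MM, MARKER_SIZE_MM], [0, MARKER_SIZE_MM]], np.float32)

# Homography from image pixels to the metric deck plane
H = cv.getPerspectiveTransform(marker_px, marker_mm)

# Any outline point detected in the image can now be mapped to millimetres.
# Note: with a single small marker, this homography extrapolates poorly far away from it.
outline_px = np.array([[[1234.0, 567.0]]], np.float32)   # example point
outline_mm = cv.perspectiveTransform(outline_px, H)
print(outline_mm)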

Hey @crackwitz and thanks for your reply.

I don’t think I can really escape the stitching. Your suggested approach would only work for very small areas. I just tried it at home: I drew a nice big shape and took one single picture. Even at home it was hard to stay straight on top of it to avoid having to fix the perspective distortion (just for the test).

Anyhow, if a single shot is possible then yes, I agree there is no need for stitching, but most of the time those shapes are big… The idea of reducing the number of pictures instead of increasing it, to avoid the stitching issues, is a good one though. I will keep that in mind, because my usual mindset was: more pictures, more data, better result… But the issue still stands, because even with 3-4 photos from a hand-held phone, moving very slowly, the output panorama still has some misalignments.

Is this because I take the pictures not by rotating the camera as usual, but by translating it, so even subtle movements create perspective changes that the homography cannot handle (parallax errors)? Please excuse me if I am talking nonsense, I am still new to all this.

I now understand that there is a big difference between the standard panoramic approach (what I am using) and:

Linear panorama - as far as I understand, this should always be taken from the same distance to the subject (in my case the floor), which I cannot guarantee with a hand-held device.

Multi-view panorama - this old presentation looks very interesting, but I could not find anything else except this paper.

I am considering trying out Hugin with its mosaic mode, just to see if it works well enough. I see people stitching drone photos with it (which is the closest thing to my use case that I can think of).
Here is a short explanation of Hugin's underlying implementation of the photo-mosaic.

Is it possible to achieve something like this panotools photo-mosaic with OpenCV?
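
(Aside: the basic OpenCV building block for such a photo-mosaic would be plane-to-plane homographies between overlapping shots, rather than a rotation-only camera model. A minimal two-image sketch with arbitrary feature/match settings and placeholder file names, just to show the idea; a real mosaic would still need global placement of all images, bundle adjustment and blending:)

import cv2 as cv
import numpy as np

# Two overlapping photos of the flat floor, ideally already undistorted (placeholder names)
img1 = cv.imread('floor_a.jpg')
img2 = cv.imread('floor_b.jpg')

orb = cv.ORB_create(5000)
k1, d1 = orb.detectAndCompute(img1, None)
k2, d2 = orb.detectAndCompute(img2, None)

matcher = cv.BFMatcher(cv.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(d1, d2), key=lambda m: m.distance)[:500]

src = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
H, _ = cv.findHomography(src, dst, cv.RANSAC, 3.0)

# Warp img2 into img1's plane on an oversized canvas, then paste img1 on top
canvas = cv.warpPerspective(img2, H, (img1.shape[1] * 2, img1.shape[0] * 2))
canvas[:img1.shape[0], :img1.shape[1]] = img1
cv.imwrite('mosaic_sketch.jpg', canvas)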

I’m quite impressed with the results you are getting with that approach. Good work so far! I don’t really know much about stitching, but I would expect you will get the best results if you use a calibrated camera with locked-down focus and zoom, so you can undistort the images and feed high-quality undistorted images to the stitching algorithm. Otherwise I fear that your stitching results won’t be as accurate as you need them to be.
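
(For illustration, undistorting the input photos with a known calibration might look roughly like this; the .npy files and output folder are placeholders, and the glob matches the 'ico2' prefix used in the script above:)

import glob
import os
import cv2 as cv
import numpy as np

# K and dist would come from a prior calibration of the locked-down camera (placeholder files)
K = np.load('camera_matrix.npy')
dist = np.load('dist_coeffs.npy')

os.makedirs('undistorted', exist_ok=True)
for fname in glob.glob('images/ico2*'):
    img = cv.imread(fname)
    h, w = img.shape[:2]
    # alpha=0 keeps only valid pixels, so no black borders confuse the feature matching later
    new_K, _ = cv.getOptimalNewCameraMatrix(K, dist, (w, h), 0)
    undistorted = cv.undistort(img, K, dist, None, new_K)
    cv.imwrite(os.path.join('undistorted', os.path.basename(fname)), undistorted)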

Having said that, I’m on Team Crackwitz - skip the stitching if at all possible.

If I were attempting this, I’d probably do something like:

  1. Select a camera / lens that can capture the whole area in a single shot.
  2. Lock down the lens (focus, zoom) and calibrate the camera intrinsics.
  3. Mount the camera and take a picture of the area you care about, with an appropriately sized calibration target lying flat on top of it.
  4. Calibrate the extrinsics based on the calibration target, or alternatively compute a homography from the calibration target to the image plane.*
  5. Do whatever you are doing to extract the outline from the image, and then convert it (using the camera extrinsics + intrinsics, or the homography) to world-plane coordinates.

*If you are computing a homography, make sure to use undistorted image points both when computing the homography and when converting the path points (see the sketch below).
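
A rough sketch of steps 4-5 in the homography variant (all file names and arrays are placeholders; image points are undistorted first, as per the footnote):

import cv2 as cv
import numpy as np

# Placeholders: intrinsics from calibration, the target's corner coordinates on the deck
# plane (in mm), the matching pixel detections, and the extracted outline in pixels.
K = np.load('camera_matrix.npy')
dist = np.load('dist_coeffs.npy')
target_world_mm = np.load('target_world_mm.npy').astype(np.float32)   # (N, 2), z = 0 plane
target_img_px = np.load('target_img_px.npy').astype(np.float32)       # (N, 2) detections
outline_px = np.load('outline_px.npy').astype(np.float32)             # (M, 2) outline points

def undistort_px(pts):
    # Undistort, then reproject through K so the result stays in pixel coordinates
    return cv.undistortPoints(pts.reshape(-1, 1, 2), K, dist, P=K)

# Step 4 (homography variant): undistorted image pixels -> metric world plane
H, _ = cv.findHomography(undistort_px(target_img_px), target_world_mm.reshape(-1, 1, 2))

# Step 5: convert the extracted outline to world-plane millimetres
outline_mm = cv.perspectiveTransform(undistort_px(outline_px), H).reshape(-1, 2)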

1-1.5mm might be hard to achieve on a large area with a single image, but I think stitching will introduce even more error.

Some thoughts:
Forget about trying to avoid perspective distortion in your images. You will have to correct for it to achieve the kind of accuracy you are going for. (Hence my suggestion of homography or extrinsic calibration above)

I don’t think a single ArUco will be good enough for calibration (even just for scale, and particularly not for recovering the camera extrinsics/pose). Also note that unless your camera is perfectly perpendicular to the world plane, there isn’t a single scale value (cm/pixel) for the whole image. I think you understand this already, but I’m mentioning it because getting an image without any perspective distortion is harder than just calibrating the extrinsics and accounting for the perspective distortion.
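
To illustrate that last point, the local scale implied by a plane homography (such as H from the sketch above) can be probed numerically; with a tilted camera the value differs between image regions:

import cv2 as cv
import numpy as np

def local_scale_mm_per_px(H, x, y, step=1.0):
    """Millimetres covered on the world plane by a one-pixel step at image point (x, y)."""
    pts = np.array([[[x, y]], [[x + step, y]]], np.float32)
    p0, p1 = cv.perspectiveTransform(pts, H).reshape(2, 2)
    return float(np.linalg.norm(p1 - p0) / step)

# e.g. compare opposite corners of the frame:
# print(local_scale_mm_per_px(H, 100, 100), local_scale_mm_per_px(H, 3900, 2900))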

How big are the areas you want to be able to measure? Can you use a high res camera - DSLR or similar?

Good luck!