Calculation of IoU between 2 frames when the shot height changes

Hi (continuation of a previous question here). I want to extract images from a drone video, keeping only frames that differ meaningfully from each other, so I have a function that calculates the IoU between 2 images. When the drone does not change height between the 2 images, I get a result that seems reasonable, but when the drone rises/falls between the 2 images, I get an illogical result because the scale changes (for example, I get an IoU above 90% even though the images are really different, just because the shooting height changed). What do I need to change/add to handle these cases, or do I need a different method for this calculation altogether?
Thanks for the help!

(the previous question - Measuring the similarity percentage between two images)

image example -


my code -

import logging

import cv2
import numpy as np

logger = logging.getLogger(__name__)


def calculate_iou(image1, image2, ratio_threshold=0.75, min_good_matches=10, min_inliers=10):
    # detect SIFT features in both frames
    gray1 = cv2.cvtColor(image1, cv2.COLOR_BGR2GRAY)
    gray2 = cv2.cvtColor(image2, cv2.COLOR_BGR2GRAY)
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(gray1, None)
    kp2, des2 = sift.detectAndCompute(gray2, None)
    # match descriptors and keep matches that pass Lowe's ratio test
    bf = cv2.BFMatcher(cv2.NORM_L2, crossCheck=False)
    matches = bf.knnMatch(des1, des2, k=2)
    good_matches = []
    for m, n in matches:
        if m.distance < ratio_threshold * n.distance:
            good_matches.append(m)
    if len(good_matches) < min_good_matches:
        logger.info("Not enough good matches found. Returning IoU of 0.")
        return 0.0
    # estimate a homography that maps image2 coordinates into image1's frame
    src_pts = np.float32([kp1[m.queryIdx].pt for m in good_matches]).reshape(-1, 1, 2)
    dst_pts = np.float32([kp2[m.trainIdx].pt for m in good_matches]).reshape(-1, 1, 2)
    M, mask = cv2.findHomography(dst_pts, src_pts, cv2.RANSAC, 5.0)
    if M is None:
        logger.info("Homography estimation failed. Returning IoU of 0.")
        return 0.0
    inliers = mask.ravel().sum()
    if inliers < min_inliers:
        logger.info("Not enough inliers found. Returning IoU of 0.")
        return 0.0
    # warp image2 onto image1's canvas and overlap the pixel data
    h, w, _ = image1.shape
    warped_image2 = cv2.warpPerspective(image2, M, (w, h))
    intersection = cv2.bitwise_and(image1, warped_image2)
    union = cv2.bitwise_or(image1, warped_image2)
    # need it to measure the percentage of difference between 2 images
    iou = (np.sum(intersection > 0) / np.sum(union > 0)) * 100
    return iou

those pictures, where they overlap, look the same. the only difference is the area of ground they cover.

an IoU should give you the fraction of coverage of the closer zoom over the wider zoom (when one footprint sits entirely inside the other, intersection over union reduces to the smaller area divided by the larger one). eyeballing this, it might be around 50% or less.

you should draw the image bounds of each image in the other’s view, assuming you have a homography. that’s a quad/4-polygon for each.
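a minimal sketch of that, assuming M maps image2 coordinates into image1's frame as in your calculate_iou (draw_projected_bounds is just an illustrative name):

import cv2
import numpy as np

def draw_projected_bounds(image1, image2, M):
    # project image2's corners into image1's frame with the homography
    h2, w2 = image2.shape[:2]
    corners = np.float32([[0, 0], [w2, 0], [w2, h2], [0, h2]]).reshape(-1, 1, 2)
    quad = cv2.perspectiveTransform(corners, M)
    # draw the resulting quad on a copy of image1
    vis = image1.copy()
    cv2.polylines(vis, [np.int32(quad)], isClosed=True, color=(0, 255, 0), thickness=3)
    return vis

if the quad spills past the canvas, that's the zoom showing itself.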

just… debug your code, step through it, look at the data you have at every step. visualize everything.

I think I understood what the problem is (I visualized the steps: there is a good match of features between the images, and then it goes wrong afterwards). But I still don't understand how to solve it. Could you elaborate on the solution you have in mind?
Do you have a good source I can learn from?
Thank you!

please provide the homography matrix you’re getting.

if the forum will let you, please also post a pair of pictures that are good for reproducing the issue. I am hesitant to base any investigation on the composite/screenshot in your first post.

thank you for your response!
on the images with the zoom between them (the pictures are above), this is the homography matrix -

[[ 1.59292577e+00  2.62401187e-02 -3.88218698e+02]
 [-1.10673356e-02  1.59030841e+00 -1.98918650e+02]
 [ 1.42779147e-05  1.75936786e-05  1.00000000e+00]]

and these are images where the IoU calculation comes out good (38%)

and this is their homography matrix -

[[ 9.28865579e-01 -4.65555462e-01  2.35448144e+02]
 [ 4.77404543e-01  9.19555927e-01 -7.01308649e+02]
 [ 1.92723649e-05  3.06340713e-05  1.00000000e+00]]

the matrices look plausible.

the first shows zoom and some translation.
the second shows some zoom, some rotation, some translation.
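for instance, pushing image2's corners through the first matrix shows that (the 1280x720 frame size here is just an assumption for illustration):

import cv2
import numpy as np

# first homography matrix from above
M = np.array([[ 1.59292577e+00,  2.62401187e-02, -3.88218698e+02],
              [-1.10673356e-02,  1.59030841e+00, -1.98918650e+02],
              [ 1.42779147e-05,  1.75936786e-05,  1.00000000e+00]])
w, h = 1280, 720  # assumed frame size, purely for illustration
corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
print(cv2.perspectiveTransform(corners, M).reshape(-1, 2))
# the ~1.59 scale puts the projected quad past a frame of this size on every side

with a frame around that size, the warped image2 would cover the entire canvas after cropping, which would explain an IoU near 100.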

you try to calculate the intersection and union as if the images were masks. at least two issues:

  • image1/image2 are image data, not masks. the bitwise ops react to pixel values, so dark pixels drop out of your "union" and bright ones count toward your "intersection", regardless of actual coverage.
  • these calculations don't consider the entire union area, only what fits inside image1's canvas. warpPerspective() crops the warped image2 to (w, h), so anything projecting outside image1's bounds is thrown away.

I would recommend calculating these areas from geometry, i.e. using the two quads (one projected/warped, one being just the rectangle covering the whole image). intersection and union will require algorithms that handle polygons. for the area, you could use contourArea() or an equivalent algorithm.
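a minimal sketch of that approach, assuming M maps image2 coordinates into image1's frame as in your code, and that the projected quad stays convex (intersectConvexConvex() needs convex inputs); geometric_iou is just an illustrative name:

import cv2
import numpy as np

def geometric_iou(image1, image2, M):
    # image1's footprint is simply its own full rectangle
    h1, w1 = image1.shape[:2]
    quad1 = np.float32([[0, 0], [w1, 0], [w1, h1], [0, h1]]).reshape(-1, 1, 2)
    # image2's footprint is its rectangle projected through the homography
    h2, w2 = image2.shape[:2]
    corners2 = np.float32([[0, 0], [w2, 0], [w2, h2], [0, h2]]).reshape(-1, 1, 2)
    quad2 = cv2.perspectiveTransform(corners2, M)
    # intersection area of the two convex quads; nothing gets cropped away
    inter_area, _ = cv2.intersectConvexConvex(quad1, quad2)
    union_area = cv2.contourArea(quad1) + cv2.contourArea(quad2) - inter_area
    return (inter_area / union_area) * 100 if union_area > 0 else 0.0

you'd call this right after findHomography() in place of the warp-and-bitwise step.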