Trying to calculate distance and direction between an ArUco marker and two cameras

First a quick question. If I specify my marker length in meters when I call aruco.estimatePoseSingleMarkers , should I expect to be able to call np.linalg.norm on a resulting translation vector to get the distance to the marker in meters?

If not, then that probably answers the rest of my question, and I mainly need to know how to get distance in meters.

If that is supposed to work, then this is my question:

I have two cameras and an ArUco marker, and I’m trying to determine the distance and direction from one camera to the marker and to the other camera. I’m not getting the answers I expect, but I don’t know which steps I have wrong or what steps I might be missing.

I describe my steps in detail below, but the tl;dr is

  1. Calibrate to find camera intrinsics
  2. Place the marker and aim the cameras so the marker is in the center of each camera’s frame.
  3. Use aruco.estimatePoseSingleMarkers to find the rvec and tvec for each camera
  4. compute each camera’s distance to the marker as np.linalg.norm(tvec)
  5. convert each camera’s rvec and tvec into transformation matrixes, invert one, and multiply it by the other to get the transformation matrix from one camera to the other.
  6. extract the rvec and tvec from the transformation matrix.
  7. compute the distance between the two cameras as np.linalg.norm(tvec)
    The calculated result is 0.79 meters, but the measured distance is 1.28 meters.

I don’t need to be super precise. I’ll be happy if I can consistently get within 10cm.

Creating marker board
I have a 4x6 ChArUco calibration board using the DICT_4X4_250 dictionary. The width of the black squares is 38mm and the width of the markers is 24mm. I chose this size because I’m printing on US letter paper. After printing, I measured the markers with ruler to make sure the dimensions are what I expect.

Another option would be a 3x4 board with 60mm squares and 42mm markers, which would mean fewer markers and chessboard corners, but I could capture calibration photos from further away, so I’m not sure which would be better or how much difference it would make, but I’ll try it out after I post this.

Collecting calibration images
I create my board object like this:

import cv2
from cv2 import aruco
board = aruco.CharucoBoard_create(
    4, 6, 0.038, 0.024, aruco.Dictionary_get(aruco.DICT_4X4_250))

Then I collected 350 images at 1280x720 of the board from various positions and at various angles. I was sure to use use a lot of light, to capture images with the board in every part of the screen at multiple distances and angles. The distances from the camera ranged from as close as I could get and still fit the entire board on the screen to as far away as I could get and still recognize the markers.

I call aruco.drawDetectedMarkers and cv2.drawChessboardCorners so I can see what I’m doing, and I only save the frames where cv2.aruco.detecMarkers returns the expected number of marker id’s. There’s also a counter running up in the corner that tells me how many frames I’ve captured so far.

Running the calibration algorithm
My calibration code is the same as everybody else’s.

  1. convert to grayscale
  2. detect the markers to get the marker corners and ids
  3. estimate the subpixels
  4. interpolate the Charuco corners to get the chessboard corners and ids
  5. append the chessboard corners and ids to lists
  6. call aruco.calibrateCameraCharucoExtended, using grayscale.shape as imageSize

My resulting camera matrix is

1496.302124    0.000000  367.110484
   0.000000 1508.521808  599.689703
   0.000000    0.000000    1.000000

and distortion coefficients are -0.315128, 0.342943, 0.008561, -0.022142, -0.184709

The reprojection error is about 1.16, and I’m told that it shouldn’t be higher that 0.5, so that might be my problem, but I don’ t know how to make it lower.

The camera matrix is different from what I expect given the manufacturer’s specifications, but maybe my expectations are wrong because I don’t get good results with this either (or maybe my problem is further down). If the frame resolution is 1280x720, the horizontal field of view is 61.8 degrees, and the vertical field of view is 37.2 degrees, then I expect to be able to build the camera matrix like this:

import numpy as np

cx = 1280/2
cy = 720/2
fx = cx / np.tan(np.radians(30.9))
fy = cy / np.tan(np.radians(18.6))
camera_matrix = np.array([
    [fx, 0., cx],
    [0., fy, cy],
    [0., 0., 1.]])

Which gives me

1069.362076     0.000000  640.000000
    0.000000 1069.718355  360.000000
    0.000000    0.000000    1.000000

I might have the x’s and y’s swapped since numpy arrays and opencv matrices are in column major order, but I tried it the other way and also couldn’t get my distance calculations to come out right.

Placing the marker
My cameras are mounted 1.70 meters above the floor and they are 1.28 meters apart. I place a 57mm wide marker in the frame. From Camera 1 to the marker is 0.64 meters to the right, 2.50 meters forward, and 0.75 meters down. From Camera 2 to the marker is 0.64 meters to the left, 2.50 meters forward, and 0.75 meters down, for a distance of 2.69 meters from each camera.

I then center the marker in each camera’s view. These are pan/tilt/zoom cameras, so I can just find the center of the marker on the screen and use the camera’s API to center on those coordinates. The zoom value is at 1x for this entire test.

For a second test, I also placed the marker so that it was 0.64 right, 1.25 meters forward, and 0.9 meters from Camera 1, and 0.64 left, 1.25 meters forward, and 0.9 meters from Camera 2, for a distance of 1.67 meters from each camera.

Computing distance and direction from the cameras to the markers
For each camera, I convert the image to grayscale, detect the marker and corners, do the subpixel estimation, and call aruco.estimatePoseSingleMarkers to find the rvec and the tvec.

For the first test, Camera 1 rvec and tvec are [-2.827585, 0.137469, -0.264872] and [0.600194, -0.529501, 3.143780], for a distance of 3.24m vs the expected value of 2.69m.

Camera 2’s rvec and tvec are [3.134496, 0.050128, -1.055368] and [0.601065, -0.526729, 3.180449], for a distance of 3.28 vs the expected value of 2.69.

For the second test, Camera 1 rvec and tvec are [2.279971, -0.476873, 0.021821] and [0.387092, -0.328055, 2.047221], for a distance of 2.11m vs the expected value of 1.67m.

Camera 2’s rvec and tvec are [-1.984182, -0.400866, -0.147234] and [0.395038, -0.330549, 2.054574], for a distance of 2.12 vs the expected value of 1.67m.

Compute distance and direction between the cameras
I have these functions for working with transformation matrices

def build_transformation_matrix(rvec, tvec):
    trans_mat = np.diag(np.ones(4))
    rot_mat, _ = cv2.Rodrigues(rvec)
    trans_mat[:3,:3]= rot_mat.transpose()
    trans_mat[:3,3] = tvec
    return trans_mat
def invert_transformation_matrix(trans_mat):
    inverse = np.diag(np.ones(4))
    rot = np.linalg.inv(trans_mat[:3,:3])
    inverse[:3,:3] = rot
    inverse[:3,3] = -(rot @ trans_mat[:3,3])
    return inverse

def transform(tvec, trans_mat):
    tvec_ = tvec[np.newaxis].transpose()
    points = np.vstack((tvec_, np.array([[1]])))
    points_trans = trans_mat @ points
    return (np.true_divide(points_trans[:3,:], points_trans[[-1], :])).transpose()

This seems to be correct, because if I do this

trans_mat = build_transformation_matrix(rvec, tvec)
inverse_trans_mat = invert_transformation_matrix(trans_mat)
print(transform(tvec, inverse_trans_mat)

I end up with [0, 0, 0]

I can make a transformation matrix between the cameras and compute the distance like this:

cam1_matrix = build_transformation_matrix(rvec1, tvec1)
cam2_matrix = build_transformation_matrix(rvec2, tvec2)
cam1_cam2_matrix = invert_transformation_matrix(cam1_matrix) @ cam2_matrix
cam1_cam2_rot = cam1_cam2_matrix[:3,:3]
cam1_cam2_tvec = cam1_cam2_matrix[:3,3]
distance = np.linalg.norm(cam1_cam2_tvec)

But the numbers I get here aren’t even close to correct, and I don’t know if the problem is that what I’m trying to do here isn’t valid, if I have accumulated so many errors by this point that even a correct method can’t get me close to the right answer, or both, but it’s probably both.

yes. the tvec itself should give you the position of the marker’s center (aruco uses marker centers) relative to the camera’s frame. you should see a clear correspondence between the distances in z as well as x,y directions

your calculated intrinsics look plausible… the measurement looks odd, specifically cx,cy do. fx,fy being off by that much from expected/calculated could be because the camera actually shows you a cropped picture relative to the full resolution FoV.

the list of steps looks good too. I didn’t dig into your description of each step.

that amounts to return np.linalg.inv(trans_mat)

check the result against np.linalg.inv() to be sure. this decomposition and multiplication in parts may be numerically more stable but I always avoid it because it’s obscure.

also: the rotation part can be transposed instead of inverted. that’s why people do this ceremony, to avoid the actual inversion.

I’ve tested with the manually calculated intrinsics and a few versions of the ones generated by the calibration procedure. I do see a correspondence between the calculated distances and the actual distance, but it’s off by quite a bit. The error seems to increase as the marker moves closer to the camera.

My invert_transformation matrix gives the same answer as np.linalg.inv(), so I’ll use that instead. That was just some code I copied from somewhere. My linear algebra has been rusting in a shed for about 30 years.

I’d also recommend a naming convention for your matrix variables that mirrors the math.

^\text{result frame} M_\text{input frame}

so if you compose them, frame names butt against each other like with dominos.

something like M_output_input maybe, so cam1_matrix would be M_cam1_marker because it moves marker-local coordinates into the camera frame.

M_marker_cam1 = inv(M_cam1_marker)
M_cam2_cam1 = M_cam2_marker @ M_marker_cam1 # == M_cam2_marker @ inv(M_cam1_marker)

I think you may have an issue in your code where you do something like that