First, a quick question: if I specify my marker length in meters when I call aruco.estimatePoseSingleMarkers, should I expect to be able to call np.linalg.norm on a resulting translation vector to get the distance to the marker in meters?
If not, then that probably answers the rest of my question, and I mainly need to know how to get distance in meters.
If that is supposed to work, then this is my question:
I have two cameras and an ArUco marker, and I’m trying to determine the distance and direction from one camera to the marker and to the other camera. I’m not getting the answers I expect, but I don’t know which steps I have wrong or what steps I might be missing.
I describe my steps in detail below, but the tl;dr is
- Calibrate to find camera intrinsics
- Place the marker and aim the cameras so the marker is in the center of each camera’s frame.
- Use aruco.estimatePoseSingleMarkers to find the rvec and tvec for each camera
- compute each camera’s distance to the marker as np.linalg.norm(tvec)
- convert each camera’s rvec and tvec into transformation matrices, invert one, and multiply it by the other to get the transformation matrix from one camera to the other.
- extract the rvec and tvec from the transformation matrix.
- compute the distance between the two cameras as np.linalg.norm(tvec)
The calculated result is 0.79 meters, but the measured distance is 1.28 meters.
I don’t need to be super precise. I’ll be happy if I can consistently get within 10cm.
Creating marker board
I have a 4x6 ChArUco calibration board using the DICT_4X4_250 dictionary. The black squares are 38mm wide and the markers are 24mm wide. I chose this size because I’m printing on US letter paper. After printing, I measured the markers with a ruler to make sure the dimensions are what I expect.
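For reference, the printed footprint of the board is easy to sanity-check against the paper size (letter is 215.9mm x 279.4mm):

```python
# Footprint of a 4x6 ChArUco board with 38 mm squares, vs US letter paper
# (215.9 mm x 279.4 mm). Purely a sanity check on the chosen dimensions.
board_w_mm = 4 * 38
board_h_mm = 6 * 38
print(board_w_mm, board_h_mm)  # 152 228 -- fits with comfortable margins
```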
Another option would be a 3x4 board with 60mm squares and 42mm markers. That would mean fewer markers and chessboard corners, but I could capture calibration photos from further away. I’m not sure which would be better or how much difference it would make; I’ll try it out after I post this.
Collecting calibration images
I create my board object like this:
```python
import cv2
from cv2 import aruco

board = aruco.CharucoBoard_create(
    4, 6, 0.038, 0.024,
    aruco.Dictionary_get(aruco.DICT_4X4_250))
```
Then I collected 350 images of the board at 1280x720 from various positions and angles. I made sure to use plenty of light and to capture images with the board in every part of the frame, at multiple distances and angles. The distances ranged from as close as I could get while still fitting the entire board in the frame to as far away as I could get while still recognizing the markers.
I call aruco.drawDetectedMarkers and cv2.drawChessboardCorners so I can see what I’m doing, and I only save frames where cv2.aruco.detectMarkers returns the expected number of marker ids. There’s also a counter in the corner that tells me how many frames I’ve captured so far.
Running the calibration algorithm
My calibration code is the same as everybody else’s.
- convert to grayscale
- detect the markers to get the marker corners and ids
- estimate the subpixels
- interpolate the Charuco corners to get the chessboard corners and ids
- append the chessboard corners and ids to lists
- call aruco.calibrateCameraCharucoExtended, using grayscale.shape as imageSize
My resulting camera matrix is
```
1496.302124    0.000000  367.110484
   0.000000 1508.521808  599.689703
   0.000000    0.000000    1.000000
```
and distortion coefficients are -0.315128, 0.342943, 0.008561, -0.022142, -0.184709
The reprojection error is about 1.16. I’m told it shouldn’t be higher than 0.5, so that might be my problem, but I don’t know how to make it lower.
The camera matrix is different from what I expect given the manufacturer’s specifications, though maybe my expectations are wrong, because I don’t get good results with this matrix either (or maybe my problem is further down). If the frame resolution is 1280x720, the horizontal field of view is 61.8 degrees, and the vertical field of view is 37.2 degrees, then I expect to be able to build the camera matrix like this:
```python
import numpy as np

cx = 1280 / 2
cy = 720 / 2
fx = cx / np.tan(np.radians(30.9))
fy = cy / np.tan(np.radians(18.6))
camera_matrix = np.array([
    [fx, 0., cx],
    [0., fy, cy],
    [0., 0., 1.]])
```
Which gives me
```
1069.362076    0.000000  640.000000
   0.000000 1069.718355  360.000000
   0.000000    0.000000    1.000000
```
I might have the x’s and y’s swapped, but I tried it the other way and also couldn’t get my distance calculations to come out right.
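As a back-of-the-envelope cross-check (an assumption on my part, valid only if the principal point sits near the image center), the calibrated fx can be converted back into a horizontal field of view:

```python
import numpy as np

# Horizontal FOV implied by the calibrated focal length, assuming the
# principal point is near the image center: hfov = 2 * atan((w/2) / fx).
fx = 1496.302124
hfov = 2 * np.degrees(np.arctan((1280 / 2) / fx))
print(round(hfov, 1))  # about 46 degrees, vs the 61.8-degree spec
```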
Placing the marker
My cameras are mounted 1.70 meters above the floor and they are 1.28 meters apart. I place a 57mm wide marker in the frame. From Camera 1 to the marker is 0.64 meters to the right, 2.50 meters forward, and 0.75 meters down. From Camera 2 to the marker is 0.64 meters to the left, 2.50 meters forward, and 0.75 meters down, for a distance of 2.69 meters from each camera.
I then center the marker in each camera’s view. These are pan/tilt/zoom cameras, so I can just find the center of the marker on the screen and use the camera’s API to center on those coordinates. The zoom value is at 1x for this entire test.
For a second test, I also placed the marker so that it was 0.64 meters right, 1.25 meters forward, and 0.9 meters down from Camera 1, and 0.64 meters left, 1.25 meters forward, and 0.9 meters down from Camera 2, for a distance of 1.67 meters from each camera.
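The expected straight-line distances follow directly from the measured right/forward/down offsets:

```python
import numpy as np

# Expected camera-to-marker distances from the hand-measured offsets
# (right, forward, down), in meters.
test1 = np.linalg.norm([0.64, 2.50, 0.75])
test2 = np.linalg.norm([0.64, 1.25, 0.90])
print(round(test1, 2), round(test2, 2))  # 2.69 1.67
```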
Computing distance and direction from the cameras to the markers
For each camera, I convert the image to grayscale, detect the marker and corners, do the subpixel estimation, and call aruco.estimatePoseSingleMarkers to find the rvec and the tvec.
For the first test, Camera 1’s rvec and tvec are [-2.827585, 0.137469, -0.264872] and [0.600194, -0.529501, 3.143780], for a distance of 3.24m vs the expected value of 2.69m.
Camera 2’s rvec and tvec are [3.134496, 0.050128, -1.055368] and [0.601065, -0.526729, 3.180449], for a distance of 3.28m vs the expected value of 2.69m.
For the second test, Camera 1’s rvec and tvec are [2.279971, -0.476873, 0.021821] and [0.387092, -0.328055, 2.047221], for a distance of 2.11m vs the expected value of 1.67m.
Camera 2’s rvec and tvec are [-1.984182, -0.400866, -0.147234] and [0.395038, -0.330549, 2.054574], for a distance of 2.12m vs the expected value of 1.67m.
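For concreteness, those distances are just np.linalg.norm applied to the four tvecs above:

```python
import numpy as np

# Measured tvecs from estimatePoseSingleMarkers (test 1 and test 2).
tvecs = [
    [0.600194, -0.529501, 3.143780],  # test 1, camera 1
    [0.601065, -0.526729, 3.180449],  # test 1, camera 2
    [0.387092, -0.328055, 2.047221],  # test 2, camera 1
    [0.395038, -0.330549, 2.054574],  # test 2, camera 2
]
print([round(np.linalg.norm(t), 2) for t in tvecs])  # [3.24, 3.28, 2.11, 2.12]
```

All four come out consistently high (by roughly 20-27%), which suggests a systematic scale error rather than noise.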
Computing distance and direction between the cameras
I have these functions for working with transformation matrices:

```python
def build_transformation_matrix(rvec, tvec):
    trans_mat = np.diag(np.ones(4))
    rot_mat, _ = cv2.Rodrigues(rvec)
    trans_mat[:3, :3] = rot_mat.transpose()
    trans_mat[:3, 3] = tvec
    return trans_mat

def invert_transformation_matrix(trans_mat):
    inverse = np.diag(np.ones(4))
    rot = np.linalg.inv(trans_mat[:3, :3])
    inverse[:3, :3] = rot
    inverse[:3, 3] = -(rot @ trans_mat[:3, 3])
    return inverse

def transform(tvec, trans_mat):
    tvec_ = tvec[np.newaxis].transpose()
    points = np.vstack((tvec_, np.ones((1, 1))))  # homogeneous coordinate
    points_trans = trans_mat @ points
    return (np.true_divide(points_trans[:3, :],
                           points_trans[[-1], :])).transpose()
```
This seems to be correct, because if I do this
```python
trans_mat = build_transformation_matrix(rvec, tvec)
inverse_trans_mat = invert_transformation_matrix(trans_mat)
print(transform(tvec, inverse_trans_mat))
```
I end up with [0, 0, 0]
I can make a transformation matrix between the cameras and compute the distance like this:
```python
cam1_matrix = build_transformation_matrix(rvec1, tvec1)
cam2_matrix = build_transformation_matrix(rvec2, tvec2)
cam1_cam2_matrix = invert_transformation_matrix(cam1_matrix) @ cam2_matrix
cam1_cam2_rot = cam1_cam2_matrix[:3, :3]
cam1_cam2_tvec = cam1_cam2_matrix[:3, 3]
distance = np.linalg.norm(cam1_cam2_tvec)
```
But the numbers I get here aren’t even close to correct. I don’t know whether the approach itself is invalid, or whether I’ve accumulated so many errors by this point that even a correct method can’t get me close to the right answer, or both. It’s probably both.
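One way to test whether the method itself is valid, independent of calibration error, is to run it on synthetic poses with known geometry. This is only a sketch with made-up numbers, using plain NumPy rotations in place of cv2.Rodrigues, and it assumes OpenCV's convention that rvec/tvec map marker coordinates into camera coordinates (x_cam = R @ x_marker + t). Under that convention the camera-2-to-camera-1 transform composes as T1 @ inv(T2):

```python
import numpy as np

def rot_y(angle):
    """Rotation about the y axis (a stand-in for cv2.Rodrigues)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def pose_matrix(R, C):
    """Marker-to-camera transform x_cam = R @ x_marker + t for a camera at
    position C in the marker frame, so tvec = -R @ C."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = -R @ C
    return T

# Made-up ground truth: two cameras 1.28 m apart, each toed in toward a
# marker at the origin.
C1 = np.array([-0.64, -0.75, 2.50])
C2 = np.array([ 0.64, -0.75, 2.50])
T1 = pose_matrix(rot_y(np.radians(14.0)), C1)
T2 = pose_matrix(rot_y(np.radians(-14.0)), C2)

# Camera-2-to-camera-1 transform: its translation column is camera 2's
# position expressed in camera 1's frame.
cam2_to_cam1 = T1 @ np.linalg.inv(T2)
distance = np.linalg.norm(cam2_to_cam1[:3, 3])
print(round(distance, 2))  # 1.28, matching np.linalg.norm(C1 - C2)
```

If a pipeline recovers the known 1.28m on synthetic input but not on real input, the error is upstream (calibration or pose estimation) rather than in the matrix algebra.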