Recover pose between two cameras using ArUco markers

Greetings All,

Scenario: We have placed two cameras (Cf and Cb) at the front and back of a room, nearly facing each other. Both are the same model, with known camera intrinsics. We then placed four ArUco markers so that they are visible from both cameras:
(a) two of the markers (M1, M2) on the centre table, parallel to the ground plane.
(b) two of them (M3, M4) on the left and right walls.

Approach A: From both the Cf and Cb images, we recover the corner positions of all the markers. We built PTS_Cf (16,2 array) and PTS_Cb (16,2 array) and calculated the Essential Matrix (findEssentialMat()). From the Essential Matrix (decomposeEssentialMat() / recoverPose()) we calculated the R, t values wrt both Cf and Cb.
The resulting t is a unit vector.

Approach B: We calculated R, t and the Projection Matrix for each of the four ArUco markers (estimatePoseSingleMarkers). From the Projection Matrix of each individual marker we calculated the Essential Matrix and subsequently the R, t values wrt both Cf and Cb.

(1) Are our approaches correct?

(2) Can we use epipolar geometry to find the relative positions of cameras that are nearly opposite each other?

(3) The computed t is a unit vector. How can we get the actual translation values?

(4) For Approach B, when we calculate the individual R, t wrt both Cf and Cb, the values change depending on which marker is used. As we understand it, they should be the same regardless of which of the four ArUco markers is used. Any suggestions?

(5) Any other approach we should try?

that matrix transforms from the marker frame into the camera frame. it’s not a projection matrix.

it might be less academic but you could just go with these pose matrices and compose them.

\begin{align*} ^{C_f} T_{C_b} & = ~ ^{C_f} T_{M_1} \cdot ~^{M_1} T_{C_b} \\ & = ~ ^{C_f} T_{M_1} \cdot (^{C_b} T_{M_1})^{-1} \\ \end{align*}
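
A minimal NumPy sketch of that composition, assuming the two marker poses are already available as 4x4 homogeneous matrices. The helper names (`make_pose`, `invert_pose`, `rot_y`) and all numbers are made up for illustration; in practice the two inputs would come from the per-camera marker pose estimates.

```python
import numpy as np

def make_pose(R, t):
    """Pack a 3x3 rotation and a 3-vector translation into a 4x4 transform."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = np.asarray(t, dtype=float).ravel()
    return T

def invert_pose(T):
    """Invert a rigid transform in closed form: inv([R t]) = [R^T, -R^T t]."""
    R, t = T[:3, :3], T[:3, 3]
    return make_pose(R.T, -R.T @ t)

def rot_y(deg):
    """Rotation about the y axis by `deg` degrees."""
    a = np.radians(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

# Made-up ground truth: Cb sits 4 m in front of Cf, turned 180 degrees about y.
Cf_T_Cb_true = make_pose(rot_y(180), [0.0, 0.0, 4.0])

# Made-up pose of marker M1 as seen from Cf.
Cf_T_M1 = make_pose(rot_y(30), [0.2, 0.5, 2.0])

# The pose Cb would then observe for the same marker.
Cb_T_M1 = invert_pose(Cf_T_Cb_true) @ Cf_T_M1

# Composition from the equation above: Cf_T_Cb = Cf_T_M1 * inv(Cb_T_M1)
Cf_T_Cb = Cf_T_M1 @ invert_pose(Cb_T_M1)
```

With noise-free inputs like these, `Cf_T_Cb` reproduces `Cf_T_Cb_true`; with real detections each marker gives a slightly different answer.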

Thank you @crackwitz
But I was wondering: since (Cf)T(M1) and (Cb)T(M1) are both 1x3 matrices, and
our expected (Cf)T(Cb) is also 1x3, how can we do the matrix multiplication?
The product could only be 1x1 or 3x3. Any suggestions?

Alternative implementation I tried:

  1. cf(T)m1 has the R and t vectors (R1, t1), and cb(T)m1 has the R and t vectors (R2, t2).

  2. Built a (4,4) matrix from each of cf(T)m1 and cb(T)m1, named M_cam1_marker and M_cam2_marker, in the form
    [[R00, R01, R02, t00],
     [R10, R11, R12, t01],
     [R20, R21, R22, t02],
     [0,   0,   0,   1  ]]

  3. Inverted M_cam1_marker using the np.linalg.inv function.

  4. Finally computed M_cam2_cam1 = M_cam2_marker @ inv(M_cam1_marker) and read the R and t for cf(M)cb from it.

Please suggest if the above steps are fine.
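
The steps above can be sketched roughly as follows. The `rodrigues` helper is a NumPy stand-in for `cv2.Rodrigues` so the snippet runs without OpenCV, `to_homogeneous` is a hypothetical helper name, and the rvec/tvec values are placeholders rather than real measurements.

```python
import numpy as np

def rodrigues(rvec):
    """Axis-angle vector -> 3x3 rotation matrix (stand-in for cv2.Rodrigues)."""
    rvec = np.asarray(rvec, dtype=float).ravel()
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def to_homogeneous(rvec, tvec):
    """Step 2: build the (4,4) matrix [[R, t], [0, 0, 0, 1]]."""
    M = np.eye(4)
    M[:3, :3] = rodrigues(rvec)
    M[:3, 3] = np.asarray(tvec, dtype=float).ravel()
    return M

# Step 1: placeholder (rvec, tvec) pairs for the same marker seen from each camera.
rvec1, tvec1 = [0.1, -0.2, 0.05], [0.3, 0.1, 2.0]
rvec2, tvec2 = [0.0, 3.0, 0.1], [-0.2, 0.1, 2.5]

M_cam1_marker = to_homogeneous(rvec1, tvec1)
M_cam2_marker = to_homogeneous(rvec2, tvec2)

# Steps 3 and 4: invert one pose and compose.
# This product maps coordinates in the cam1 frame into the cam2 frame;
# its inverse gives the opposite direction.
M_cam2_cam1 = M_cam2_marker @ np.linalg.inv(M_cam1_marker)

R = M_cam2_cam1[:3, :3]
t = M_cam2_cam1[:3, 3]
```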


no, I meant 4x4 matrices.

they are built from rvec (Rodrigues into 3x3 matrix) and tvec (3x1 vector)

those look fine.

does the result look plausible, numerically? the translation part ought to be verifiable by eye/tape measure.

Appreciate your affirmation.
A few things that I am observing:

  1. The measurements are not exact, but they are close to what I expected, especially t. The inaccuracy could be due to my camera calibration, and I am working on it.

  2. I have 3 common markers (visible from both the front and back cameras): one is placed on the ground and two are pasted on the left and right walls. My assumption is that the Cf(M)Cb (R, t) should be similar for all 3 markers. Is my understanding correct? I will take the readings again after improving my calibration.
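
To compare the per-marker results, something like the following could quantify how far two estimates of the same camera-to-camera transform are apart. This is only a sketch: `rotation_angle_deg` and `pose_difference` are hypothetical helper names, and the two poses are invented numbers, not my readings.

```python
import numpy as np

def rotation_angle_deg(R_a, R_b):
    """Angle (degrees) of the relative rotation between two rotation matrices."""
    R_rel = R_a.T @ R_b
    cos_theta = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    return np.degrees(np.arccos(cos_theta))

def pose_difference(T_a, T_b):
    """Return (rotation difference in degrees, translation difference in input units)."""
    d_rot = rotation_angle_deg(T_a[:3, :3], T_b[:3, :3])
    d_trans = np.linalg.norm(T_a[:3, 3] - T_b[:3, 3])
    return d_rot, d_trans

# Invented example: estimate from one marker ...
T_from_m1 = np.eye(4)
T_from_m1[:3, 3] = [0.0, 0.0, 4.0]

# ... and from another marker, rotated 2 degrees about z and shifted by 5 cm.
a = np.radians(2.0)
T_from_m2 = np.eye(4)
T_from_m2[:3, :3] = [[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0],
                     [0.0,        0.0,       1.0]]
T_from_m2[:3, 3] = [0.03, 0.0, 4.04]

d_rot, d_trans = pose_difference(T_from_m1, T_from_m2)
```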

Will keep you posted


blame inaccuracies on tiny markers. the smaller the marker, the noisier the rotation estimate.

also, people never seem to check the actual size of their markers, and that it’s actually square. printers LIE.
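
a small sketch of why the printed size matters: for a square marker, the estimated translation scales linearly with the side length you tell the pose estimator, while the rotation is unaffected. `rescale_tvec` is a hypothetical helper and the numbers are placeholders.

```python
import numpy as np

def rescale_tvec(tvec, assumed_side, measured_side):
    """Correct a translation that was estimated with the wrong marker side length."""
    return np.asarray(tvec, dtype=float) * (measured_side / assumed_side)

# Placeholder numbers: pose was computed assuming a 100 mm marker,
# but a ruler says the print is actually 97 mm wide.
tvec_assumed = [0.10, -0.05, 2.00]
tvec_corrected = rescale_tvec(tvec_assumed, assumed_side=0.100, measured_side=0.097)
```

this only handles a uniform scale error; a print that is not square would need the real corner coordinates instead.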

I’d agree.

I have no idea how to properly use all data at once to calculate one good result.
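
a crude starting point, short of a proper joint optimisation, might be to average the per-marker estimates. the sketch below is an assumption on my part, not a recommended method: it takes the mean of the translations and projects the summed rotation matrices back onto a valid rotation with an SVD (a chordal mean). `average_poses` is a hypothetical name.

```python
import numpy as np

def average_poses(poses):
    """Average several 4x4 rigid transforms that estimate the same quantity."""
    poses = np.asarray(poses, dtype=float)
    # Sum the rotation blocks, then find the closest rotation matrix via SVD.
    R_sum = poses[:, :3, :3].sum(axis=0)
    U, _, Vt = np.linalg.svd(R_sum)
    # Force det = +1 so the result is a rotation, not a reflection.
    D = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])
    T = np.eye(4)
    T[:3, :3] = U @ D @ Vt
    # Plain arithmetic mean for the translations.
    T[:3, 3] = poses[:, :3, 3].mean(axis=0)
    return T

def rot_z_pose(deg, t):
    """Helper for the example: rotation about z by `deg` degrees plus translation t."""
    a = np.radians(deg)
    T = np.eye(4)
    T[:3, :3] = [[np.cos(a), -np.sin(a), 0.0],
                 [np.sin(a),  np.cos(a), 0.0],
                 [0.0,        0.0,       1.0]]
    T[:3, 3] = t
    return T

# Invented per-marker estimates scattered around the same transform.
estimates = [rot_z_pose(-3.0, [0.0, 0.0, 3.9]),
             rot_z_pose(0.0, [0.0, 0.0, 4.0]),
             rot_z_pose(3.0, [0.0, 0.0, 4.1])]
T_mean = average_poses(estimates)
```

this treats every marker as equally trustworthy, which is probably not true for small or obliquely viewed markers.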


Yes, agreed; I am observing the same. Marker size and the accuracy of the marker dimensions both have an impact. I am using a 120 degree FOV camera, which was affecting the calibration, especially for markers placed towards the corners. With better camera calibration, I am getting values closer to the measurements.

I am now working on how to interpret and make use of the data.