ArUco relative displacement measurement between markers - precision issues when object rotates

Hello,

I am working on a project where I need to measure the relative displacement of three ArUco markers with respect to a reference marker (ID 0). One marker is fixed on top as the reference, and the others are placed on the bottom and sides of the same object.

For static markers I am achieving very good precision, so my pipeline seems to be working correctly in that scenario.

The problem appears when the object carrying the markers rotates. Even without any real displacement between the markers, the rotation of the object causes the markers to appear distorted/perspective-changed in the image, and my relative displacement measurement degrades significantly: I am seeing up to 5 mm of apparent drift that is not real movement.

Is there a recommended approach, guide or example for computing robust relative displacement between ArUco markers that is stable under object rotation? Am I missing something fundamental in my approach?

Any advice would be greatly appreciated.

Thank you.

pictures please. 5 mm could be a lot, or very little. depends on the context.

make the issue reproducible. are you inverting any matrices?

Sorry, here is more info. As you can see in the attached photo, I'm using a cardboard box where I've placed a grid of four markers on top as a reference and another marker at the bottom. The whole idea is to measure the exact displacement between them, but I've hit a wall: as soon as I rotate the box, the measurements start dancing around and changing, even though the markers obviously aren't moving on the cardboard.

I know this is happening because of the camera's perspective and because I am not compensating for the marker's rotation, but I'm struggling to find a solid way to do so. I thought that by using the markers I could get a stable relative distance, but the perspective error still messes up the values. Do you know a specific method, or a way to tweak the coordinate transformation, so the measurements stay the same regardless of how the box is tilted? I really need these values to be invariant to the camera angle to get any real accuracy.

My initial idea was not to use a grid of markers, but I thought it might improve the accuracy; the result was not as expected. I attach my code.

void AD_Aruco_Detection::vfn_ProcessFrame(
    cv::Mat&                                      _mat_frame,
    const std::vector<std::vector<cv::Point2f>>&  _vec_corners,
    const std::vector<int32_t>&                   _vec_ids)
{
    cv::Mat mat_display;
    _mat_frame.copyTo(mat_display);

    if (_vec_ids.empty())
    {
        cv::imshow("ArUco Detection", mat_display);
        return;
    }

    // Refine corners to subpixel accuracy
    std::vector<std::vector<cv::Point2f>> vec_cornersRefined = _vec_corners;
    cv::Mat mat_gray;
    cv::cvtColor(_mat_frame, mat_gray, cv::COLOR_BGR2GRAY);
    for (auto& corner : vec_cornersRefined)
    {
        cv::cornerSubPix(mat_gray,
                         corner,
                         cv::Size(5, 5),
                         cv::Size(-1, -1),
                         cv::TermCriteria(cv::TermCriteria::EPS + cv::TermCriteria::MAX_ITER,
                                          30, 0.01));
    }

    // ── 1. BOARD POSE (forehead) ─────────────────────────────────────────────
    cv::Mat mat_objPoints_board;
    cv::Mat mat_imgPoints_board;
    m_board_forehead.matchImagePoints(vec_cornersRefined,
                                      _vec_ids,
                                      mat_objPoints_board,
                                      mat_imgPoints_board);
    if (mat_objPoints_board.empty() || mat_objPoints_board.rows < 4)
    {
        cv::putText(mat_display, "Forehead board not found",
                    cv::Point(10, 30), cv::FONT_HERSHEY_SIMPLEX,
                    0.7, cv::Scalar(0, 0, 255), 2);
        cv::imshow("ArUco Detection", mat_display);
        return;
    }

    cv::Vec3d vec_boardRvec;
    cv::Vec3d vec_boardTvec;
    cv::solvePnP(mat_objPoints_board,
                 mat_imgPoints_board,
                 m_mat_cameraMatrix,
                 m_mat_distCoeffs,
                 vec_boardRvec,
                 vec_boardTvec,
                 false,
                 cv::SOLVEPNP_ITERATIVE);

    // Build T_board (4x4)
    cv::Mat mat_R_board;
    cv::Rodrigues(vec_boardRvec, mat_R_board);
    cv::Mat mat_T_board = cv::Mat::eye(4, 4, CV_64F);
    mat_R_board.copyTo(mat_T_board(cv::Rect(0, 0, 3, 3)));
    mat_T_board.at<double>(0, 3) = vec_boardTvec[0];
    mat_T_board.at<double>(1, 3) = vec_boardTvec[1];
    mat_T_board.at<double>(2, 3) = vec_boardTvec[2];

    // ── 2. JAW MARKER POSES (ID >= 4) ────────────────────────────────────────
    std::vector<cv::Point3f> vec_objPoints = {
        cv::Point3f(-m_f32_markerLengthM / 2.0f,  m_f32_markerLengthM / 2.0f, 0.0f),
        cv::Point3f( m_f32_markerLengthM / 2.0f,  m_f32_markerLengthM / 2.0f, 0.0f),
        cv::Point3f( m_f32_markerLengthM / 2.0f, -m_f32_markerLengthM / 2.0f, 0.0f),
        cv::Point3f(-m_f32_markerLengthM / 2.0f, -m_f32_markerLengthM / 2.0f, 0.0f)
    };

    std::vector<int32_t>   vec_jawIds;
    std::vector<cv::Vec3d> vec_jawRvecs;
    std::vector<cv::Vec3d> vec_jawTvecs;
    for (uint32_t u32_i = 0U; u32_i < _vec_ids.size(); u32_i++)
    {
        const int32_t i32_id = _vec_ids[u32_i];
        if (i32_id < 4) { continue; } // IDs 0-3 are the reference board

        const bool bln_useGuess = m_map_prevRvec.find(i32_id) != m_map_prevRvec.end();
        cv::Vec3d vec_rvec = bln_useGuess ? m_map_prevRvec[i32_id] : cv::Vec3d(0, 0, 0);
        cv::Vec3d vec_tvec = bln_useGuess ? m_map_prevTvec[i32_id] : cv::Vec3d(0, 0, 0);
        cv::solvePnP(vec_objPoints,
                     vec_cornersRefined[u32_i],
                     m_mat_cameraMatrix,
                     m_mat_distCoeffs,
                     vec_rvec,
                     vec_tvec,
                     bln_useGuess,
                     cv::SOLVEPNP_ITERATIVE);
        m_map_prevRvec[i32_id] = vec_rvec;
        m_map_prevTvec[i32_id] = vec_tvec;
        vec_jawIds.push_back(i32_id);
        vec_jawRvecs.push_back(vec_rvec);
        vec_jawTvecs.push_back(vec_tvec);
    }

    vfn_DrawResults(mat_display, vec_cornersRefined, _vec_ids,
                    mat_T_board, vec_jawIds, vec_jawRvecs, vec_jawTvecs);
}

void AD_Aruco_Detection::vfn_DrawResults(
    cv::Mat&                                      _mat_display,
    const std::vector<std::vector<cv::Point2f>>&  _vec_corners,
    const std::vector<int32_t>&                   _vec_ids,
    const cv::Mat&                                _mat_T_board,
    const std::vector<int32_t>&                   _vec_jawIds,
    const std::vector<cv::Vec3d>&                 _vec_jawRvecs,
    const std::vector<cv::Vec3d>&                 _vec_jawTvecs)
{
    cv::aruco::drawDetectedMarkers(_mat_display, _vec_corners, _vec_ids);

    // Draw the reference board's axes
    cv::Vec3d vec_boardRvec, vec_boardTvec;
    cv::Mat   mat_R = _mat_T_board(cv::Rect(0, 0, 3, 3));
    cv::Rodrigues(mat_R, vec_boardRvec);
    vec_boardTvec = cv::Vec3d(_mat_T_board.at<double>(0, 3),
                              _mat_T_board.at<double>(1, 3),
                              _mat_T_board.at<double>(2, 3));
    cv::drawFrameAxes(_mat_display,
                      m_mat_cameraMatrix,
                      m_mat_distCoeffs,
                      vec_boardRvec,
                      vec_boardTvec,
                      m_f32_markerLengthM * 1.5f);

    // Invert T_board properly (no numeric .inv())
    // R^-1 = R^T for rotation matrices
    cv::Mat mat_R_inv = mat_R.t();
    cv::Mat mat_t     = (cv::Mat_<double>(3, 1) << _mat_T_board.at<double>(0, 3),
                                                   _mat_T_board.at<double>(1, 3),
                                                   _mat_T_board.at<double>(2, 3));
    cv::Mat mat_t_inv = -mat_R_inv * mat_t;
    cv::Mat mat_T_board_inv = cv::Mat::eye(4, 4, CV_64F);
    mat_R_inv.copyTo(mat_T_board_inv(cv::Rect(0, 0, 3, 3)));
    mat_T_board_inv.at<double>(0, 3) = mat_t_inv.at<double>(0);
    mat_T_board_inv.at<double>(1, 3) = mat_t_inv.at<double>(1);
    mat_T_board_inv.at<double>(2, 3) = mat_t_inv.at<double>(2);

    // Compute each jaw marker's relative displacement
    for (uint32_t u32_i = 0U; u32_i < _vec_jawIds.size(); u32_i++)
    {
        const int32_t i32_id = _vec_jawIds[u32_i];
        cv::drawFrameAxes(_mat_display,
                          m_mat_cameraMatrix,
                          m_mat_distCoeffs,
                          _vec_jawRvecs[u32_i],
                          _vec_jawTvecs[u32_i],
                          m_f32_markerLengthM * 1.5f);

        // Build TN for the jaw marker
        cv::Mat mat_RN;
        cv::Rodrigues(_vec_jawRvecs[u32_i], mat_RN);
        cv::Mat mat_TN = cv::Mat::eye(4, 4, CV_64F);
        mat_RN.copyTo(mat_TN(cv::Rect(0, 0, 3, 3)));
        mat_TN.at<double>(0, 3) = _vec_jawTvecs[u32_i][0];
        mat_TN.at<double>(1, 3) = _vec_jawTvecs[u32_i][1];
        mat_TN.at<double>(2, 3) = _vec_jawTvecs[u32_i][2];

        // Displacement in the board (skull) coordinate frame
        const cv::Mat mat_T_rel = mat_T_board_inv * mat_TN;
        const double f64_dxRaw = mat_T_rel.at<double>(0, 3) * 1000.0;
        const double f64_dyRaw = mat_T_rel.at<double>(1, 3) * 1000.0;
        const double f64_dzRaw = mat_T_rel.at<double>(2, 3) * 1000.0;

        // EMA on the final result
        if (m_map_dxEma.find(i32_id) == m_map_dxEma.end())
        {
            m_map_dxEma[i32_id] = f64_dxRaw;
            m_map_dyEma[i32_id] = f64_dyRaw;
            m_map_dzEma[i32_id] = f64_dzRaw;
        }
        else
        {
            m_map_dxEma[i32_id] = f64_EMA_ALPHA * f64_dxRaw + (1.0 - f64_EMA_ALPHA) * m_map_dxEma[i32_id];
            m_map_dyEma[i32_id] = f64_EMA_ALPHA * f64_dyRaw + (1.0 - f64_EMA_ALPHA) * m_map_dyEma[i32_id];
            m_map_dzEma[i32_id] = f64_EMA_ALPHA * f64_dzRaw + (1.0 - f64_EMA_ALPHA) * m_map_dzEma[i32_id];
        }

        const int32_t i32_dx = static_cast<int32_t>(m_map_dxEma[i32_id]);
        const int32_t i32_dy = static_cast<int32_t>(m_map_dyEma[i32_id]);
        const int32_t i32_dz = static_cast<int32_t>(m_map_dzEma[i32_id]);

        std::cout << std::fixed << std::setprecision(2)
                  << i32_id              << ";"
                  << m_map_dxEma[i32_id] << ";"
                  << m_map_dyEma[i32_id] << ";"
                  << m_map_dzEma[i32_id]
                  << std::endl;

        // Find the marker's image corner to anchor the text
        cv::Point2f pt_text(0, 0);
        for (uint32_t u32_j = 0U; u32_j < _vec_ids.size(); u32_j++)
        {
            if (_vec_ids[u32_j] == i32_id)
            {
                pt_text = _vec_corners[u32_j][0];
                break;
            }
        }

        std::string str_info = "ID:" + std::to_string(i32_id) +
                               " dX:" + std::to_string(i32_dx) +
                               " dY:" + std::to_string(i32_dy) +
                               " dZ:" + std::to_string(i32_dz);
        cv::putText(_mat_display,
                    str_info,
                    cv::Point(static_cast<int32_t>(pt_text.x),
                              static_cast<int32_t>(pt_text.y) - 10),
                    cv::FONT_HERSHEY_SIMPLEX,
                    0.5,
                    cv::Scalar(0, 0, 255),
                    2);
    }

    cv::imshow("ArUco Detection", _mat_display);
}




grids make no sense for this. think “radio telescope”. if they could have one big dish, they would take it. they can’t, so they use many small dishes.

you can continue to use your 2x2 grid. it’s very unlikely to be the problem.

I have skimmed the C++ code. I see what looks like matrix inversion, but it's needlessly complicated, and I can't tell if it's even used properly. C++ code tends to be like that. it's bad for prototyping. I would recommend Python instead.

marker detection gives you marker poses relative to the camera. those are given to you as R and t.
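in Python, getting those per-marker R and t might look like this. a sketch, assuming OpenCV >= 4.7, a calibrated camera_matrix/dist_coeffs, a grayscale frame gray, and a 4X4_50 dictionary — swap in your own:

import cv2
import numpy as np

# sketch: per-marker pose (R, t) relative to the camera.
# camera_matrix, dist_coeffs, gray and the dictionary are assumptions here.
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
detector = cv2.aruco.ArucoDetector(dictionary)
corners, ids, _rejected = detector.detectMarkers(gray)

marker_len = 0.03  # marker side length in meters
obj_points = np.array([[-1,  1, 0],
                       [ 1,  1, 0],
                       [ 1, -1, 0],
                       [-1, -1, 0]], dtype=np.float32) * marker_len / 2

poses = {}  # marker id -> (R, t) in the camera frame
for c, marker_id in zip(corners, ids.flatten()):
    ok, rvec, tvec = cv2.solvePnP(obj_points, c.reshape(-1, 2),
                                  camera_matrix, dist_coeffs)
    R, _ = cv2.Rodrigues(rvec)  # 3x3 rotation from the Rodrigues vector
    poses[marker_id] = (R, tvec.flatten())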

compose each R and t into one 4x4 pose matrix. I see that you do that. good.

then you can calculate one marker’s pose relative to another marker:

\begin{align*}
{}^{m_1}T_{m_2} &= {}^{m_1}T_{C} \cdot {}^{C}T_{m_2} \\
                &= \left({}^{C}T_{m_1}\right)^{-1} \cdot {}^{C}T_{m_2}
\end{align*}

That's the math. Straightforward to implement.

import numpy as np

def matrix_from_Rt(R, t):
    # compose a 3x3 rotation and a translation into one 4x4 pose matrix
    M = np.eye(4)
    M[0:3, 0:3] = R
    M[0:3, 3] = t
    return M

T_C_m1 = matrix_from_Rt(R1, t1)  # camera -> marker 1
T_C_m2 = matrix_from_Rt(R2, t2)  # camera -> marker 2
T_m1_m2 = np.linalg.inv(T_C_m1) @ T_C_m2  # pose of marker 2 in marker 1's frame

And that will be stable, up to measurement error.
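the translation column of that matrix is the displacement you're after. in millimeters:

d_mm = T_m1_m2[0:3, 3] * 1000  # meters -> millimeters
print(f"dX={d_mm[0]:.2f}  dY={d_mm[1]:.2f}  dZ={d_mm[2]:.2f} mm")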

for debugging, you could place the originally generated marker images on another blank image (with GIMP or inkscape?), and pretend that your camera saw this. it won’t have out-of-plane rotation, but you can rotate in-plane.
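building that test image programmatically is a few lines too. a sketch, assuming OpenCV >= 4.7; marker ids, sizes and positions here are arbitrary:

import cv2
import numpy as np

dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)

# paste two generated markers onto a blank canvas at a known offset
canvas = np.full((600, 800), 255, dtype=np.uint8)
for marker_id, (x, y) in [(0, (100, 240)), (4, (560, 240))]:
    canvas[y:y + 120, x:x + 120] = cv2.aruco.generateImageMarker(dictionary, marker_id, 120)

# rotate in-plane; the measured relative translation should not change
M = cv2.getRotationMatrix2D((400, 300), 15, 1.0)  # 15 degrees about the image center
rotated = cv2.warpAffine(canvas, M, (800, 600), borderValue=255)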

when you point your camera at the markers, roll the camera a little. if the math or the implementation has an issue, then you’ll see the relative translation vary from that. if all is right, relative translation won’t vary.
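to make that check quantitative, you could log the relative translation per frame and look at the spread afterwards. hypothetical helpers — call log_relative_translation once per frame, then report() when done:

import numpy as np

history = []  # one relative translation (3-vector, meters) per frame

def log_relative_translation(T_m1_m2):
    history.append(T_m1_m2[0:3, 3].copy())

def report():
    arr = np.asarray(history)
    print("mean [mm]:", arr.mean(axis=0) * 1000)
    print("std  [mm]:", arr.std(axis=0) * 1000)  # should stay near sensor noise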