Calculating 2D Dimensions on Object using ArUco

I am writing a program that measures the length and width of objects captured in images, using bounding boxes in OpenCV. I measure in pixels and convert the pixel measurements to centimeters.

I am using a 5x5 ArUco marker in the image as a measurement calibrator. I printed out the ArUco marker and measured it by hand to confirm it is 5 x 5 cm.

However, the marker is not being measured correctly when I process the image through OpenCV: it comes out as 4.4 x 4.9 cm. My first instinct was that the camera needed to be calibrated, so I calibrated it with 70 checkerboard images (reprojection error 0.253), but that did not fix the problem.

I am not sure how to proceed here to fix the problem. Any ideas would be helpful.



you should anticipate questions, such as… please show your code and data.

measure the size (in mm or inches or whatever) of your marker along both axes.

“5x5” doesn’t mean length. aruco markers are made of “pixels” (modules): 5x5 means the marker contains 5x5 modules (plus a border).

what on earth are you doing there?

detector is entirely undefined. I don’t see a camera matrix.

you’re just drawing stuff back into the picture and getting contours. of course that’ll show the projections of any markers, and those will look as they are supposed to, if they’re angled.

aruco has pose estimation, which is what you want.

you also might want to read about “homography”… and why that only applies to plane-plane mapping.

this is simply wrong. you’ll need the focal length (from the camera matrix) and the distance to estimate that

(post deleted by author)

The image was taken from 1 metre and I have the following camera matrix that was determined using checkerboard calibration:

Camera matrix:

[[492.44637035   0.         241.0083407 ]
 [  0.         494.11679009 319.04739636]
 [  0.           0.           1.        ]]

this will explain the camera matrix
(scroll down, until you reach the formulas / images)


Pictures would be helpful. I’m not sure if I understand what you are trying to do, but if the things you are trying to measure are flat and can be placed on a piece of paper with multiple Aruco markers on it, maybe something like this would work:

  1. Paper with 4 aruco markers (one at each corner, say), flat object placed on the paper.
  2. Capture image, detect markers.
  3. Compute centroids of each marker and use, along with known 2D locations in the plane, to compute a homography.
  4. Detect object in image, and apply homography to get 2D plane coordinates.

This all assumes your camera has no distortion, or that you have calibrated it and are accounting for the distortion. If your objects aren’t flat / stick out from the plane, this won’t work, because the homography is only accurate for points in the plane.

(post deleted by author)

From the image it doesn’t seem like you are dealing with much radial distortion, but if you haven’t accounted for that it still might be helpful (especially if you want high accuracy).

I suspect the reason for the inaccuracy is that your camera isn’t pointed perfectly orthogonal to the plane the marker lies in, so you are experiencing perspective distortion.
The thickness of the object will also contribute to inaccurate measurements. I would suggest moving the phone around in the image, and if the dimensions change based on image position, that’s (probably) a perspective distortion effect.

Crackwitz mentioned using the Aruco marker pose estimation - if you go this route you can use the recovered pose (along with the camera matrix) to compute width/height estimations of your bounding box. This is probably the better approach because, for example, you can account for object thickness (if you know it) to get correct / more accurate width/height. You might do best by using a number of Aruco markers to estimate the pose for a more accurate and robust process.

I suggested using a homography, which is probably the easier approach, but it only works for objects that are in the same plane as your Aruco marker (so thin things), and you can’t directly extend it to account for depth differences. It might be a faster way to get going, but you might run into a wall pretty quickly if you want to measure things with thickness.

(Also I suggested using the centroids of multiple markers - this was for increased accuracy / robustness. You can compute a homography based on a single marker, but the detection accuracy / noise for the individual corner points might not give you the best / most stable results.)


(post deleted by author)

I would suggest you look into 3D → 2D perspective projection / how the pinhole camera model works (and how the camera matrix relates to this). The pose describes the rotation/translation of the chessboard in the camera coordinate frame - this, along with the camera matrix, describe how to transform 3D points to image points.

Once you have a handle on that, the challenge you face is doing the inverse - transforming 2D image points to 3D points. You will have to use the camera matrix and pose (rvec,tvec), along with some other constraint (in your case the fact that the 3D point you are interested in lies on the chessboard plane) to get a 3D world coordinate from your image coordinate.