Are solvePnP's translation coordinates measured from the camera's lens or from the sensor?

Title. I’m getting the camera coordinates relative to a 2D marker positioned at (0,0,0).

I’m trying to see how precise it is, so I’m only measuring the depth (Z coordinate), aligning the other two axes, and I’ve noticed that it consistently calculates about 0.6 cm more than the real distance to the camera’s lens. Is it because the coordinates are measured from the camera’s sensor rather than the lens, or is there something else creating this error?

If it is the sensor, does OpenCV have any functionality to estimate the focal length in metric units? Unfortunately, I don’t have the real parameters of the camera I’m using.
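For reference, calibration (e.g. cv2.calibrateCamera) reports the focal length in pixels; converting that to millimetres requires the sensor’s pixel pitch from its datasheet. A minimal sketch, with entirely hypothetical numbers:

```python
# Hypothetical values: fx_px would come from cv2.calibrateCamera,
# and the pixel pitch from the sensor's datasheet.
fx_px = 1400.0            # focal length in pixels (hypothetical calibration result)
pixel_pitch_mm = 0.0014   # 1.4 um pixel pitch (hypothetical datasheet value)

fx_mm = fx_px * pixel_pitch_mm  # focal length in millimetres
```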

Edit: My first measurements were incomplete. It is a consistent 0.6 cm overestimation at 15-20 cm. From 20 to 30 cm the overestimation gradually decreases to about 0.3-0.4 cm, and from there it stays the same up to 50 cm, which is the max distance I can measure right now.

Give or take a millimeter, since these were done manually with ruler and triangle.

Maybe I’m wrong, but I think most of these methods use the pinhole camera model: the camera’s view is a pyramid and the camera is considered the top vertex of that pyramid.
That means the camera position is the optical center of the lens, which sits F mm in front of the sensor, where F is approximately the focal length of the lens.

pinhole model, yes.

the optical center is in the middle of the pinhole. it doesn’t matter where the projection plane (sensor) actually is, or what “focal distance” the pinhole camera has. both just translate into scale factors that relate the field of view (the tangent of it) to the sensor’s number of pixels.
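that scale-factor relation can be written out directly; a sketch assuming a hypothetical 70° horizontal field of view and a 1920 px wide sensor:

```python
import math

width_px = 1920   # hypothetical horizontal resolution
hfov_deg = 70.0   # hypothetical horizontal field of view

# fx in pixels relates the tangent of the half field of view
# to half the sensor's pixel width
fx_px = (width_px / 2) / math.tan(math.radians(hfov_deg / 2))
```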

if you get funny values, I’d blame that on the lens system. check if your values are consistent at least.

I’m trying to find a proper illustration for this… I might have been mistaken in a previous edit. I still believe the optical center, even for a lens, is right in the middle of the lens.

for a lens system the situation becomes more complex. your camera might not have a single lens.

here’s a really good resource:

Just in case, I’ve checked all the measures again, and I was slightly mistaken at first.
It seems that around 15-20 cm it overestimates by about 0.6 cm; from 20 to 30 cm the overestimation gradually decreases to 0.3-0.4 cm and stays like that up to 50 cm, which is my max distance. Give or take a millimeter, since I’m measuring manually with ruler and triangle.

Could this be caused by the lens distortion? Should I apply the undistort operations before calculating the translation vector, or does solvePnP already handle that (since it takes the distortion coefficients)?

I’d trust that it’ll use the coefficients, if it takes them.

sources of error tend to come from rounding to whole pixels, and from the difficulty of determining subpixel precise coordinates. cameras love to sharpen their output images. that destroys all kinds of information. and then, black-white edges, in a gamma-compressed color space, have a different shape than they have in a linear color space, but most algorithms don’t care about these intricacies.
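on the gamma point: if you want edge localization in linear light, you’d undo the (roughly 2.2) sRGB gamma first. a rough sketch, assuming a plain power-law approximation:

```python
import numpy as np

def to_linear(img_u8):
    # Approximate inverse of sRGB gamma with a plain 2.2 power law;
    # the real sRGB transfer function also has a small linear toe near black.
    return (img_u8.astype(np.float32) / 255.0) ** 2.2
```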

then you have blurriness. whatever finds these tags probably just does a threshold. that won’t give you edges where they’re supposed to be, but somewhere off, because of the blur and threshold.

I’m using an ArUco marker with ArUco detection, and it does have trouble determining the perimeter. Even with the camera and marker both completely still, the border sometimes fluctuates each frame, which changes the perceived distance by about half a centimeter or more.

you can…

  • adjust focus. generally best to make sure infinity is sharp, unless you want to put your depth of field somewhere closer
  • reduce aperture, which generally widens the depth of field, but also means less light on the sensor

or, if you have servo focus and can move it to set points, you could calibrate the camera for multiple focus settings, and adjust to try to keep the marker in focus. I wouldn’t do that unless I had to.
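if you did go that route, the bookkeeping is simple; a sketch with made-up focus set points, picking whichever stored calibration sits nearest to the current focus step:

```python
# Hypothetical: focus step -> (camera_matrix, dist_coeffs) from separate calibrations
calibrations = {
    0:   ("K_far", "dist_far"),    # focused at infinity
    50:  ("K_mid", "dist_mid"),    # focused at mid range
    100: ("K_near", "dist_near"),  # focused up close
}

def pick_calibration(focus_step):
    # Use the calibration captured at the nearest focus set point
    nearest = min(calibrations, key=lambda s: abs(s - focus_step))
    return calibrations[nearest]
```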