What is the arbitrary scaling factor s in camera calibration?

If you want to understand this, you need to let go of some misconceptions, among them the math you already stated and a bunch of your claims. And you need to accept that the confusion isn’t an issue with the math that’s in the docs, but primarily with the math you invent, and secondarily with how you interpret the docs.

All the math in the docs that you’ve quoted so far is fine. Start from that. I’m hedging my bets because the docs aren’t infallible, but these parts of the docs are fine.

The naming you come up with adds to the confusion. Camera matrices are called K, not A. And if you want to involve sensor size and resolution, I’d recommend doing that purely to calculate the focal length (in pixels).

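Focal length is one of two things: either a physical quantity (a length), or a number of pixels.

f [pixels] = (focal length [mm]) / (pixel pitch [mm / pixel])

Pixel pitch is usually quoted in µm, so convert it to mm first (or the result is off by a factor of 1000).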
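The conversion can be sketched in a few lines of plain Python. The function name and the lens/sensor numbers are made up for illustration; the only real content is the unit handling, since pixel pitch is typically given in µm and focal length in mm.

```python
def focal_length_px(focal_length_mm: float, pixel_pitch_um: float) -> float:
    """Convert a physical focal length into a focal length in pixels.

    The factor 1000 converts mm to µm so the length units cancel
    and only "pixels" is left.
    """
    return focal_length_mm * 1000.0 / pixel_pitch_um


# Hypothetical camera: 4 mm lens, 2 µm pixels (assumed values)
f_px = focal_length_px(4.0, 2.0)
print(f_px)  # 2000.0
```

That number is what goes into the f entries of K. The millimetres and the sensor size don't need to show up anywhere else.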

The optical axis is the center. The view is a triangle that is similar to the triangle spanned between the lens origin and the sensor placed at the focal distance (physical distances). As such, if you want a point on the edge of the view, you have to place it a focal distance away (ok so far), but HALF the sensor width off the axis, not an entire sensor width away.
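To put numbers on the half-width point, here is a rough sketch in plain Python. The sensor width, focal length and pixel pitch are made-up values, and it assumes square pixels with the principal point exactly at the image center.

```python
# Made-up example camera (assumptions, not taken from any real device)
sensor_width_mm = 4.0
focal_length_mm = 4.0
pixel_pitch_um = 2.0

image_width_px = sensor_width_mm * 1000.0 / pixel_pitch_um  # 2000 px
f_px = focal_length_mm * 1000.0 / pixel_pitch_um            # 2000 px
cx = image_width_px / 2.0  # assumed: principal point at the image center


def project_u(x_mm: float, z_mm: float) -> float:
    """Horizontal pixel coordinate of a point x off the axis at depth z."""
    return f_px * x_mm / z_mm + cx


# Point one focal distance away, HALF a sensor width off the axis
u_half = project_u(sensor_width_mm / 2.0, focal_length_mm)
# Same depth, but a FULL sensor width off the axis
u_full = project_u(sensor_width_mm, focal_length_mm)

print(u_half, image_width_px)  # 2000.0 2000.0 -> exactly on the image edge
print(u_full)                  # 3000.0 -> far outside the image
```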

That equation doesn’t even have the right shapes. Does not compute. The matrix is applied backwards (v M); it needs to be applied the right way (M v). And I would recommend putting inputs on the RHS and the result on the LHS.

\begin{bmatrix}p_x \\ p_y\end{bmatrix} = K \cdot \begin{bmatrix}s_x \\ s_y \\ f\end{bmatrix}

Still needs projection, and a fix on the LHS for the shape.

s \cdot \begin{bmatrix}p_x \\ p_y \\ 1\end{bmatrix} = K \cdot \begin{bmatrix}s_x \\ s_y \\ f\end{bmatrix}

Just try to calculate the RHS. You’ll get some vector that’s not on z=1. Divide by z. Then it’s got z=1. That’s it. That’s what the s is there for.
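Here is that calculation spelled out as a minimal sketch in plain Python. The values in K and the 3D point are invented for illustration, and `mat_vec` is just a hand-rolled helper, not an OpenCV function.

```python
def mat_vec(M, v):
    """Matrix-vector product M v: matrix on the left, column vector on the right."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


# Hypothetical camera matrix: f = 2000 px, principal point (960, 540)
f, cx, cy = 2000.0, 960.0, 540.0
K = [
    [f,   0.0, cx],
    [0.0, f,   cy],
    [0.0, 0.0, 1.0],
]

X = [0.5, 0.25, 2.0]  # some point in the camera frame (assumed values)

h = mat_vec(K, X)       # first part of the projection: multiply by K
s = h[2]                # the third component; here it equals the point's z
p = [c / s for c in h]  # second part: divide, which lands the point on z=1

print(h)  # [2920.0, 1580.0, 2.0]  -> not on z=1 yet
print(s)  # 2.0
print(p)  # [1460.0, 790.0, 1.0]   -> pixel (1460, 790)
```

Note that K is never touched. The s falls out of the multiplication and only ever divides the result.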

Certainly the camera matrix isn’t being normalized by anything. It’s a constant, unaffected by what the camera sees. There can be no s and no z affecting it.

You’re linking to some code, the head of a loop, that contains some additions and multiplications, but no division (normalization). The loop following it goes over a 4-element thing and divides something, but I don’t immediately see the relation between that and this discussion.

That is true. That has a little to do with the s up there, but only insofar as it’s just another scale factor in that equation, which you would put between K and the [Xc, Yc, Zc] point, to signify that it applies to the geometry. Mathematically, all the scale factors can be combined, but that’s obfuscation. The parts of the equation have individual meaning. The s is purely there as something expressing the projection to the z=1 plane, i.e. a division. Multiplying by a projection matrix (the camera matrix) is just the first part of a projection. The second is that division, which brings all the points onto the projection plane.

That part of the quote is fine, mostly. Nothing is factored out, because that thing you think exists, doesn’t. The distance information simply doesn’t exist in a picture.

That part isn’t fine.

That quantity does not exist. It cannot be obtained, for math reasons. That information doesn’t exist, neither in reality nor in theory.

Maybe what you want is pose estimation of the calibration board in each calibration view. That requires a model of the object. If you give pose estimation a model that’s twice as big, it’ll put the object at twice the distance, given the same picture. That is “similar triangles”.
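The “twice as big, twice as far” point can be checked numerically. This is only a sketch in plain Python: the camera matrix and the board corners are made-up values, and `project` is a hand-written helper, not a call into any pose estimation API.

```python
def project(K, X):
    """Project a camera-frame point X: multiply by K, then divide by the third component."""
    h = [sum(k * x for k, x in zip(row, X)) for row in K]
    return (h[0] / h[2], h[1] / h[2])


# Hypothetical camera matrix: f = 2000 px, principal point (960, 540)
K = [
    [2000.0, 0.0,    960.0],
    [0.0,    2000.0, 540.0],
    [0.0,    0.0,    1.0],
]

# Corners of a 1 m square board, 4 m in front of the camera (assumed values)
board = [(-0.5, -0.5, 4.0), (0.5, -0.5, 4.0), (0.5, 0.5, 4.0), (-0.5, 0.5, 4.0)]

# The same board, twice as big and twice as far away
board_2x = [tuple(2.0 * c for c in corner) for corner in board]

pixels = [project(K, X) for X in board]
pixels_2x = [project(K, X) for X in board_2x]

print(pixels)               # [(710.0, 290.0), (1210.0, 290.0), (1210.0, 790.0), (710.0, 790.0)]
print(pixels == pixels_2x)  # True: the picture cannot tell the two apart
```

Both boards give the same picture, so the distance only comes out once you commit to a model size.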
