Understanding the demo on homography from camera displacement

Hi, I’m going through demo 3 here on how to compute a homography from camera displacement, i.e., from two different camera poses. I checked the derivation of the given equation for computing the homography
[screenshot: the equation from the tutorial]
on the linked Wikipedia page and saw the following sentence:
“This formula is only valid if camera b has no rotation and no translation.”

Now I don’t understand what this means exactly. Why would I need to compute a homography if the camera had no rotation and no translation?
Could somebody please explain under what circumstances the second equation given in Wikipedia (below) should be used, instead of the one in the tutorial?

[screenshot: the second equation from Wikipedia]

wikipedia is one of the worst sources for understanding a math topic. its articles are nothing but cheat sheets for those who already understand.

those equations are needlessly complicated. they rip apart the rotation and translation of a transformation in space. I think that should never be done, and if it is done, then only because the math is impossible to express otherwise.

also, this kind of math must be explained with figures (formulae aren’t figures).

I’ve skimmed the OpenCV tutorial you linked. the written math I did spot-check mostly looks reasonably understandable. there’s one spot that I find awful, the last… “equant”/term(?) is just insane:

look at the 1st, 2nd, 3rd terms, those give you meaning. ^{c_2}M_{c_1} and how to flatten it into 3x3 (must be explained further down), that is important to understand (if you need a homography from 3D transformations).

ignore the needless explosion of matrices into insane operations nobody should ever do (such as this -^{c_1}R_o^T \cdot t_o fuckery there).

you should find better resources than wikipedia. try the Szeliski book. he posts drafts of his complete book: Computer Vision: Algorithms and Applications, 2nd ed.

the other two books I know (yes I know three books) are Hartley & Zisserman and Forsyth & Ponce.

Thanks for the hint on the books. I did check Szeliski, but still had some doubts, so I will check one of the others too.

I don’t really understand your comment on the last equation in the OpenCV tutorial. The tutorial says that the homography matrix H can be built by taking the terms at positions (0,0) and (0,1) of the matrix from the last equation (which is also given in the code). Do you find the presentation awful, or is there something incorrect there?

I don’t understand to what you refer.

Sorry, I refer to ^2R_1 and ^2t_1. In code:

R_1to2 = R2 * R1.t();
tvec_1to2 = R2 * (-R1.t()*tvec1) + tvec2;

These are then used to compute the homography:

H= {^2R_1}-\frac{^2t_1\cdot n^T}{d}

where n is the normal vector of the plane (n^T is its transpose) and d is the distance between camera 1 and the plane.
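To make the formula concrete, here is a minimal sketch in dependency-free Python (plain lists, no OpenCV). The function names and all numeric values are made up for illustration. It assumes the sign convention in which points X on the plane, expressed in camera-1 coordinates, satisfy n · X = -d; with the opposite convention n · X = d the minus in the formula becomes a plus. The result is the Euclidean homography acting on camera coordinates; going to pixel coordinates would additionally need the camera intrinsics, which are left out here.

```python
def homography_from_displacement(R_1to2, t_1to2, n, d):
    """H = R - (t * n^T) / d, built entry by entry from a 3x3 R and 3-vectors t, n."""
    return [[R_1to2[i][j] - t_1to2[i] * n[j] / d for j in range(3)]
            for i in range(3)]


def apply_homography(H, p):
    """Map a homogeneous 3-vector p through H (no normalization applied)."""
    return [sum(H[i][j] * p[j] for j in range(3)) for i in range(3)]


# Made-up example: camera 2 is rotated 90 degrees about z and shifted
# relative to camera 1.
R_1to2 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
t_1to2 = [1.0, 0.0, 1.0]

# Plane z = 5 in camera-1 coordinates, normal pointing back at the camera,
# so that points X on the plane satisfy n . X = -d (assumed convention).
n = [0.0, 0.0, -1.0]
d = 5.0

H = homography_from_displacement(R_1to2, t_1to2, n, d)

X1 = [1.0, 2.0, 5.0]           # a point on the plane, camera-1 coordinates
X2 = apply_homography(H, X1)   # for plane points this equals R_1to2 * X1 + t_1to2
```

For points that are not on the plane, H * X1 and R_1to2 * X1 + t_1to2 no longer agree, which is why the plane parameters n and d appear in the formula at all.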

yes, it may be mathematically correct but it’s insane.

what they have there is matrix multiplication and inversion.

this is what they calculate:

^{c_2}M_{c_1} = ~ ^{c_2}M_o \cdot ~ ^{o}M_{c_1} = ~ ^{c_2}M_o \cdot (^{c_1}M_o)^{-1}

^{c_2} R_{c_1} and ^{c_2} t_{c_1} exactly form ~^{c_2}M_{c_1}.

that calculation involves the inverse of a matrix. calculating a general inverse costs a little more than a transposition (which suffices in special cases) and it may be numerically unstable in the general case, but not here, because the rotation parts here are orthonormal matrices, which are very tame.

so to apply that “trick” they take the transformation matrices apart:

M = \begin{pmatrix} R & t \\ 0_{1 \times 3} & 1 \end{pmatrix}

that’s a “block matrix”, a big matrix composed of blocks, which are smaller matrices (vectors, scalars).

and instead of saying M^{-1}, which is a “simple” matrix inversion in math and in code, they take that apart and calculate with the parts. how that’s done in general can be seen on wikipedia. if you apply those rules to this specific case:

(if R is orthonormal, meaning its column vectors are orthogonal to each other and have unit length (i.e. it’s exactly a rotation, no scaling, no shearing), this simplification holds: R^{-1} = R^T )

M^{-1} = \begin{pmatrix} R & t \\ 0_{1 \times 3} & 1 \end{pmatrix}^{-1} = \begin{pmatrix} R^T & -R^T \cdot t \\ 0_{1 \times 3} & 1 \end{pmatrix}
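As a worked step, assuming R_1, t_1 and R_2, t_2 denote the blocks of ^{c_1}M_o and ^{c_2}M_o (these names are picked here to mirror R1, tvec1, R2, tvec2 in the code), multiplying the two block matrices gives:

```latex
% R_1, t_1 and R_2, t_2 are the blocks of {}^{c_1}M_o and {}^{c_2}M_o
\begin{aligned}
{}^{c_2}M_{c_1}
  &= {}^{c_2}M_o \cdot ({}^{c_1}M_o)^{-1}
   = \begin{pmatrix} R_2 & t_2 \\ 0_{1 \times 3} & 1 \end{pmatrix}
     \begin{pmatrix} R_1^T & -R_1^T \cdot t_1 \\ 0_{1 \times 3} & 1 \end{pmatrix} \\
  &= \begin{pmatrix} R_2 R_1^T & R_2 \cdot (-R_1^T \cdot t_1) + t_2 \\ 0_{1 \times 3} & 1 \end{pmatrix}
\end{aligned}
```

The upper-left block is R_1to2 = R2 * R1.t() and the upper-right block is tvec_1to2 = R2 * (-R1.t()*tvec1) + tvec2, i.e. exactly the two code lines quoted earlier in the thread.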

doing that may be okay in code (because code means obscured meaning), if properly documented, but in explaining and understanding the math, it’s absolute cancer.

you’ll understand the math a lot easier if you realize it only involves multiplying transformation matrices, and sometimes inverting them to transform in the other “direction”.
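A rough sketch of that view in dependency-free Python: build the 4x4 transforms, invert one, multiply, and only at the very end read off the R and t blocks. The helper names and the pose values are made up for illustration, and invert_rigid is only valid under the assumption that the upper-left 3x3 block is a pure rotation.

```python
def make_transform(R, t):
    """Stack a 3x3 rotation R and a translation t into the 4x4 [[R, t], [0, 1]]."""
    return [list(R[i]) + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def invert_rigid(M):
    """Invert a rigid 4x4 transform via the block rule [[R^T, -R^T t], [0, 1]]."""
    R_T = [[M[j][i] for j in range(3)] for i in range(3)]
    t = [M[i][3] for i in range(3)]
    t_inv = [-sum(R_T[i][k] * t[k] for k in range(3)) for i in range(3)]
    return make_transform(R_T, t_inv)


def compose(A, B):
    """Plain 4x4 matrix product A * B."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


# Made-up poses: object frame -> camera 1 and object frame -> camera 2.
c1_M_o = make_transform([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
                        [0.0, 0.0, 5.0])
c2_M_o = make_transform([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
                        [1.0, 0.0, 6.0])

# c2_M_c1 = c2_M_o * (c1_M_o)^-1
c2_M_c1 = compose(c2_M_o, invert_rigid(c1_M_o))

# Only now take the result apart into its rotation and translation blocks.
R_1to2 = [row[:3] for row in c2_M_c1[:3]]
t_1to2 = [row[3] for row in c2_M_c1[:3]]
```

The point of the design is that the R/t juggling lives in one small helper, while the calculation itself stays a readable product of transforms.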

skip over all the insane stuff where they juggle with individual R and t.


you can follow Szeliski’s book Computer Vision: Algorithms and Applications, 2nd ed., pp. 61-62: calculate M10 first, then set H10 = M10[0:3,0:3] and normalize it with H10 = H10/H10[2,2]. this way you get the same homography as OpenCV.