Hi,
I have questions about calib3d/src/fundam.cpp.
The function run7Point contains:
    // for each root form the fundamental matrix
    double lambda = r[k], mu = 1.;
    double s = f1[8]*r[k] + f2[8];

    // normalize each matrix, so that F(3,3) (~fmatrix[8]) == 1
    if( fabs(s) > DBL_EPSILON )
    {
        mu = 1./s;
        lambda *= mu;
        fmatrix[8] = 1.;
    }
    else
        fmatrix[8] = 0.;

    for( i = 0; i < 8; i++ )
        fmatrix[i] = f1[i]*lambda + f2[i]*mu;

    // de-normalize
    Mat F(3, 3, CV_64F, fmatrix);
    F = T2.t() * F * T1;

    // make F(3,3) = 1
    if( fabs(F.at<double>(8)) > FLT_EPSILON )
        F *= 1. / F.at<double>(8);
It has two parts that make F(3,3) = 1; the only difference is that the first will set F(3,3) to 0 if it’s large enough.
- Why do we use F(3,3) to normalize?
- Why, in the first part, do we want to set F(3,3) to zero if it’s too small?
- Why, in the second part, don’t we want to set F(3,3) to zero if it’s too small?
Any reply would be greatly appreciated.
Best,
John
hard to explain properly. I’ll try with an analogy.
say you’re in a plane (2D) and you have a point (x,y). in a projective space (“homogeneous coordinates”), which is then 3D, that plane is put at z=1 and the 3D point (x,y,1) canonically represents that 2D point… but so do all other points w·(x,y,1) for any w ≠ 0. those form a line. the whole line represents that 2D point, i.e. every point on the line does. the line goes through the origin, and of course through the point in the plane. you can scale these vectors/points around without changing what they represent. if you have a vector (a,b,c) in that space, and you want to know what 2D point it represents, you divide by the last element, and you get (a/c, b/c, 1), which is on the plane of your 2D space.
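The division step above can be sketched in plain C++ (C++17). `dehomogenize` is a hypothetical helper name, not an OpenCV function, and the zero check is an assumption about how one might guard the division: a vector with last element 0 has no finite 2D point.

```cpp
#include <array>
#include <cmath>
#include <limits>
#include <optional>

// Hypothetical helper (not part of OpenCV): map a homogeneous 3-vector
// (a, b, c) to the 2D point (a/c, b/c) it represents.
// Returns std::nullopt when c is (near) zero, because then there is no
// finite 2D point to return (a point at infinity).
std::optional<std::array<double, 2>> dehomogenize(const std::array<double, 3>& v)
{
    if (std::fabs(v[2]) <= std::numeric_limits<double>::epsilon())
        return std::nullopt;
    return std::array<double, 2>{ v[0] / v[2], v[1] / v[2] };
}
```

Calling it on (2, 4, 2) and on (6, 12, 6) gives the same 2D point, since both vectors lie on the same line through the origin.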
there’s an added degree of freedom that “doesn’t matter”.
homographies are meant to have 8 degrees of freedom (that’s how much they need to represent a perspective transformation). the matrix has 9 elements, so there’s one coefficient too many: the overall scale, which doesn’t carry any information. by convention the matrix is scaled so that the bottom right element is 1, and that is why you divide the entire matrix by that element. you need not, but for numerical reasons it’s a good idea. two homography matrices, if they’re nonzero multiples of each other, represent the same homography. a fundamental matrix is likewise only defined up to scale.
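To make the "multiples represent the same homography" point concrete, here is a minimal sketch with plain arrays instead of cv::Mat. The names `applyHomography` and `scaled` are made up for this illustration, and the code assumes the third coordinate of the mapped point is nonzero.

```cpp
#include <array>

using Mat3 = std::array<double, 9>;   // 3x3 matrix, row-major
using Vec2 = std::array<double, 2>;

// Hypothetical helper: apply the projective transform H to the 2D point p.
// The point is lifted to (x, y, 1), multiplied by H, then divided by the
// last coordinate. Assumes that last coordinate w is nonzero.
Vec2 applyHomography(const Mat3& H, const Vec2& p)
{
    const double x = H[0] * p[0] + H[1] * p[1] + H[2];
    const double y = H[3] * p[0] + H[4] * p[1] + H[5];
    const double w = H[6] * p[0] + H[7] * p[1] + H[8];
    return { x / w, y / w };
}

// Hypothetical helper: return s * H.
Mat3 scaled(const Mat3& H, double s)
{
    Mat3 R = H;
    for (double& e : R)
        e *= s;
    return R;
}
```

Because x, y and w are all multiplied by the same factor s, the factor cancels in the division, so `applyHomography(H, p)` and `applyHomography(scaled(H, s), p)` agree for any nonzero s (up to floating-point rounding).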
no, when it’s too small. when it’s larger than some epsilon, it’s good, otherwise it gets zeroed out.
I have no idea what they do in your code snippet or why. likely they’re checking if some numerical property indicates that the solution is “broken” (the bottom right element is close to 0), and they make sure to carry that information through while avoiding a division by (near) zero.
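For reference, the arithmetic of the first block in the question can be read off the snippet itself. The candidate is lambda·f1 + mu·f2 with lambda = r[k] and mu = 1, so its last entry would be s = f1[8]·r[k] + f2[8]. Dividing both coefficients by s makes that entry exactly 1; when s is within DBL_EPSILON of zero, the coefficients are left alone and the entry is stored as exactly 0. A standalone sketch with plain arrays follows; `combineNormalized` is a hypothetical name, and this mirrors only the quoted lines, not the rest of run7Point.

```cpp
#include <array>
#include <cfloat>
#include <cmath>

using Mat3 = std::array<double, 9>;   // 3x3 matrix, row-major

// Hypothetical standalone version of the first normalization in the quoted
// snippet: form lambda*f1 + mu*f2 for one root, scaled so that entry 8 is 1
// whenever that is possible without dividing by (near) zero.
Mat3 combineNormalized(const Mat3& f1, const Mat3& f2, double root)
{
    double lambda = root, mu = 1.0;
    const double s = f1[8] * root + f2[8];   // entry 8 before any scaling

    Mat3 F{};
    if (std::fabs(s) > DBL_EPSILON) {
        mu = 1.0 / s;      // divide both coefficients by s ...
        lambda *= mu;
        F[8] = 1.0;        // ... so entry 8 becomes exactly 1
    } else {
        F[8] = 0.0;        // entry 8 is numerically zero: store it as 0
    }

    for (int i = 0; i < 8; ++i)
        F[i] = f1[i] * lambda + f2[i] * mu;
    return F;
}
```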
Thanks, it’s my typo, it should be “if it’s NOT large enough.”
For example, why don’t we seek the largest element (in the absolute value sense) and set it to 1.f? Why do we always choose F(3, 3) of the fundamental matrix?
that’s the point where you should look for a math professor because I can’t explain that in-depth.
all kinds of transformation matrices (homographies, camera matrices, affine ones) in projective spaces maintain the 1 in the last element of every vector (division), and the 1 in the bottom right (last dimension of input, last dimension of output).
you can certainly choose to scale/normalize those matrices however you like but when you scale so the bottom right element is 1… that makes the matrix easy to read, easy to eyeball. you have rotation and scaling in the top left, translation on the top right, some perspective coefficients on the bottom left (that I don’t have an intuition for beyond “division”), and that 1.
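On the earlier question about using the largest element instead: that is one of the alternative scalings one could pick, and it never needs a threshold, because the largest-magnitude entry of a nonzero matrix is never zero. The sketch below only illustrates that alternative; it is not what the quoted OpenCV code does, and `normalizeByLargest` is a made-up name.

```cpp
#include <algorithm>
#include <array>
#include <cmath>

using Mat3 = std::array<double, 9>;   // 3x3 matrix, row-major

// Hypothetical helper: scale M so that its largest-magnitude entry becomes +1.
// The all-zero matrix is returned unchanged, since it has no scale to fix.
Mat3 normalizeByLargest(const Mat3& M)
{
    const auto it = std::max_element(M.begin(), M.end(),
        [](double a, double b) { return std::fabs(a) < std::fabs(b); });

    Mat3 R = M;
    if (*it != 0.0)
        for (double& e : R)
            e /= *it;      // *it refers to M, so the divisor stays fixed
    return R;
}
```

The trade-off is readability: with this scaling the 1 can land in any position, whereas dividing by the bottom right element keeps it in a fixed, recognizable place.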