First of all, I am new to the topic, so don’t take what I am telling you as indisputable.
As far as I know, you don’t need to know your camera pose. You just need to (exactly) detect the “points of interest” on the paper and map them to the “destination” points.
In my case I used the squares in the corners:
e.g. getting pixel coordinates of top right corner: (2500, 100)
And knowing this maps to a “real world” point on the DIN A4 paper, ~(0 cm, 21 cm, 0 cm).
Edit:
Use (0 cm, 21 cm) as the point in the “real world”.
Since a homography is only defined from 2D to 2D, you don’t need the last coordinate. I guess setting it to 0 is even wrong, because in homogeneous coordinates a last coordinate of 0 means a point at infinity; the homogeneous form of the point would be (0, 21, 1). You may also scale the coordinates to pixels in order to get a meaningful image, e.g. 210 mm instead of 21 cm. Then every pixel in your final image should correspond to one mm.
(You can place your coordinate frame arbitrarily in the “real world”!)
You need at least 4 correspondences like this.
Then you can use cv2.findHomography(image_points, world_points)
If you provide more than 4 points, this usually increases the accuracy of your homography.
But you may also use arbitrary points. I used this quadrilateral.
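To make the idea concrete, here is a minimal sketch of what the estimation boils down to, written with plain NumPy (a direct linear transform solved by SVD). It is not my original script. Only the top right corner (2500, 100) comes from my photo; the other three pixel coordinates are made up, and the frame (origin at the top left corner of the sheet, x to the right, y down, units in mm) is just one arbitrary choice. The function names `estimate_homography` and `apply_homography` are hypothetical helpers.

```python
import numpy as np

def estimate_homography(image_points, world_points):
    """Estimate H (3x3) so that world ~ H @ image, from >= 4 point pairs."""
    rows = []
    for (x, y), (u, v) in zip(image_points, world_points):
        # Each correspondence contributes two linear equations in the 9 entries of H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the free scale so that H[2, 2] == 1

def apply_homography(H, point):
    """Map one 2D point through H using homogeneous coordinates (x, y, 1)."""
    u, v, w = H @ np.array([point[0], point[1], 1.0])
    return u / w, v / w

# Pixel coordinates of the four corner markers (only the second one is real).
image_points = [(300, 150), (2500, 100), (2700, 3300), (100, 3400)]
# Matching DIN A4 corners in mm: top left, top right, bottom right, bottom left.
world_points = [(0, 0), (210, 0), (210, 297), (0, 297)]

H = estimate_homography(image_points, world_points)
top_right_mm = apply_homography(H, (2500, 100))

# With OpenCV the estimation step would be:
#   H, mask = cv2.findHomography(np.float32(image_points), np.float32(world_points))
```

Since the world points are in mm, passing this `H` to `cv2.warpPerspective(image, H, (210, 297))` should give a rectified image with roughly one pixel per mm; multiply the world coordinates by a factor if you want more resolution.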
e.g. Selecting the 4 points in pixel coordinates and “knowing” they map to the coordinates I printed on the sheet (in cm).
Which language are you using? I could share my Python script.
P.S.:
If you know your camera pose (rotation and translation) and your intrinsics, you can calculate your projection matrix from 3D to 2D.
But the other way around isn’t possible. You can’t get a 3D model out of a single 2D image. (You can in some cases, but that’s more complicated, see Hartley & Zisserman page 230. Chapter 8.9)
So you lose “information” about the depth. So, having a photo of a plane, you need to provide additional information about the distance. I guess incorporating this distance in your projection matrix reduces it from a non-invertible 3x4 to an invertible 3x3 matrix. Camera projection is a “one way ticket”. Invertibility means you are able to get your plane back without projective distortions. (So image → real world points x, y, z, with a fixed z coordinate, which is therefore a plane in 3D.) (A homography doesn’t include lens distortion!)
But that only holds for ONE plane, see:
The cube faces on the side are still distorted. (Non orthogonal.)
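As a rough numerical check of the P.S., here is a small sketch of how the 3x4 projection matrix collapses to an invertible 3x3 matrix once you restrict yourself to the plane z = 0. The intrinsics `K`, the rotation `R` and the translation `t` are invented values, not from a real calibration, and lens distortion is ignored.

```python
import numpy as np

# Hypothetical intrinsics: focal length 1000 px, principal point (640, 360).
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])

# Hypothetical pose: camera tilted 30 degrees about the x axis, about 5 units away.
a = np.deg2rad(30.0)
R = np.array([[1.0, 0.0, 0.0],
              [0.0, np.cos(a), -np.sin(a)],
              [0.0, np.sin(a), np.cos(a)]])
t = np.array([0.1, -0.2, 5.0])

# Full projection matrix, 3x4: maps (x, y, z, 1) to homogeneous pixels. Not invertible.
P = K @ np.hstack([R, t.reshape(3, 1)])

# For points with z = 0 the third column of R never contributes,
# so dropping it leaves a 3x3 homography between the plane and the image.
H = K @ np.column_stack([R[:, 0], R[:, 1], t])

def project(point_3d):
    """Project a 3D world point to pixel coordinates with P."""
    p = P @ np.append(point_3d, 1.0)
    return p[:2] / p[2]

def back_to_plane(pixel):
    """Map a pixel back onto the plane z = 0 with the inverse homography."""
    q = np.linalg.inv(H) @ np.array([pixel[0], pixel[1], 1.0])
    return q[:2] / q[2]

on_plane = back_to_plane(project(np.array([0.3, 0.4, 0.0])))   # recovers (0.3, 0.4)
off_plane = back_to_plane(project(np.array([0.3, 0.4, 1.0])))  # does not recover (0.3, 0.4)
```

The second point sits above the plane, so the inverse homography puts it where its viewing ray hits z = 0 instead of at its true x, y. That is the same effect as the distorted side faces of the cube.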