you do not convert anything.
if you want to be exact about it, the metric units go poof during the projection of 3D points onto the 2D image plane: x and y get divided by z, so the units cancel. you get unitless numbers that express the tangents of the angles between the viewing ray and the optical axis.
“the camera matrix” is a product of a bunch of things:
- the projection part “copies” the z component of a point into the homogeneous (last) component, which is later used in a division (the actual “projection”)
- the screen-space part: scaling by (fx, fy) and translation by (cx, cy), with all four given in pixels
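a minimal sketch of the two steps above, in plain Python, to show that the choice of metric unit makes no difference. the function name `project` and the intrinsics values are made up for illustration, and lens distortion is ignored. the same point is given once in meters and once in millimeters:

```python
def project(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame 3D point to pixel coordinates."""
    x, y, z = point
    # Perspective division: the metric units cancel here, leaving
    # unitless tangents of the viewing-ray angles.
    xn, yn = x / z, y / z
    # Screen-space part: scale by the focal lengths (in pixels) and
    # shift by the principal point (in pixels).
    return (fx * xn + cx, fy * yn + cy)


# Hypothetical intrinsics, in pixels (assumed values, not from a real camera).
fx, fy, cx, cy = 800.0, 800.0, 320.0, 240.0

p_m = (0.5, -0.25, 2.0)                 # point in meters
p_mm = tuple(1000.0 * c for c in p_m)   # same point in millimeters

uv_m = project(p_m, fx, fy, cx, cy)
uv_mm = project(p_mm, fx, fy, cx, cy)

print(uv_m)   # (520.0, 140.0)
print(uv_mm)  # (520.0, 140.0)
```

both calls land on the same pixel, because the scale factor appears in the numerator and the denominator of the division and cancels.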