point them all at the same calibration pattern.
you get transformation matrices that map from the pattern’s frame into each camera’s frame. or you get rvec and tvec, which contain the same information in a less convenient form (an axis-angle rotation vector and a translation vector). you can build a 4x4 matrix from those.
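a minimal sketch of that rvec/tvec to 4x4 conversion, using only numpy. it assumes rvec is an axis-angle vector (direction is the axis, length is the angle in radians) and that the pair maps pattern coordinates into camera coordinates. the names `rodrigues` and `pose_to_matrix` are made up for illustration. if the values come from OpenCV, `cv2.Rodrigues` does the rotation part for you.

```python
import numpy as np

def rodrigues(rvec):
    """Turn an axis-angle rotation vector into a 3x3 rotation matrix."""
    rvec = np.asarray(rvec, dtype=float).reshape(3)
    theta = np.linalg.norm(rvec)          # rotation angle in radians
    if theta < 1e-12:
        return np.eye(3)                  # no rotation
    k = rvec / theta                      # unit rotation axis
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])    # cross-product matrix of k
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def pose_to_matrix(rvec, tvec):
    """Pack rotation and translation into one 4x4 homogeneous matrix."""
    T = np.eye(4)
    T[:3, :3] = rodrigues(rvec)
    T[:3, 3] = np.asarray(tvec, dtype=float).reshape(3)
    return T
```

usage: `pose_to_matrix(rvec, tvec) @ [x, y, z, 1]` takes a point on the pattern and gives its coordinates in the camera frame.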
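you can invert those matrices. you can multiply them. that’s how you get matrices transforming from any frame into any other. for example, to go from camera A’s frame into camera B’s frame, first undo A’s pattern-to-camera matrix (its inverse), then apply B’s.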
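a rough sketch of the invert-and-multiply step, assuming each camera’s 4x4 matrix is a rigid transform (rotation plus translation) from the pattern frame into that camera’s frame. `invert_rigid` and `camera_a_to_b` are hypothetical names. `np.linalg.inv` would also work; the closed form below just exploits the fact that the inverse of a rotation is its transpose.

```python
import numpy as np

def invert_rigid(T):
    """Invert a 4x4 rigid transform [R t; 0 1] as [R^T  -R^T t; 0 1]."""
    R = T[:3, :3]
    t = T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv

def camera_a_to_b(T_a_from_pattern, T_b_from_pattern):
    """Matrix that maps points in camera A's frame into camera B's frame."""
    # read right to left: camera A -> pattern, then pattern -> camera B
    return T_b_from_pattern @ invert_rigid(T_a_from_pattern)

# made-up poses: camera A is rotated 90 degrees about z, camera B is only shifted
T_a = np.array([[0.0, -1.0, 0.0, 1.0],
                [1.0,  0.0, 0.0, 0.0],
                [0.0,  0.0, 1.0, 0.0],
                [0.0,  0.0, 0.0, 1.0]])
T_b = np.array([[1.0, 0.0, 0.0, 0.0],
                [0.0, 1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 2.0],
                [0.0, 0.0, 0.0, 1.0]])
T_b_from_a = camera_a_to_b(T_a, T_b)
```

the order matters: matrices apply right to left, so the rightmost factor is the first transform a point goes through.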
computer graphics deals with this all the time. a lot has been written about it.