There is a lot of ground to cover on this. I’ll see what I can do.
First of all, a disclaimer: I don’t have a lot of experience working with the OpenCV stereo calibration algorithms, so take everything I say with a grain of salt.
I’m going to keep the first pass brief, because I don’t know what you do/don’t know.
Intrinsics: The intrinsic parameters describe physical properties of a given camera + lens pairing and include the focal length, and optical image center. These parameters describe how 3D points in the camera reference frame project to the image sensor.
Note that the focal length units are pixels, as is the image center. Intrinsics are assumed to not change, which means that if you have an adjustable zoom or focus you must lock it down (preferably physically) so that it doesn’t change after you calibrate the intrinsics. The camera intrinsics are often represented in matrix form, and is commonly called the camera matrix. The camera matrix is typically written as a C or K.
Extrinsics: The extrinsic parameters include 3 rotation angles and 3 translation values. These parameters describe the position and orientation of a camera in some 3D reference frame. Note that origin of the camera reference frame is the nodal point, which corresponds to the pinhole of the theoretical pinhole camera.
While it is possible to estimate the values for both intrinsic and extrinsic parameters (either from measurement or provided specifications), calibration will almost always give better results.
If I were doing this I’d approach it something like this:
- Calibrate intrinsics for each camera separately.
- Calibrate the stereo configuration using the intrinsics you calibrated (note that the flags parameter defaults to FIX_INTRINSIC in the stereoCalibrate() call - this causes it to use the intrinsics you pass in and will not re-compute them.)
- Do whatever validation you need to do so that you can trust the R and T results you get.
Note that the R and T parameters describe a transformation from the camera 1 coordinate frame to the camera 2 coordinate frame. So you might be looking for your translation vector [0,16,0] in the T, but you won’t find it written that way. Assuming your translation measurement is accurate, you would expect the length of the translation vector to be the same as the length of your vector (16), but it will transformed depending on what R is, so you will get something different (not just a translation along a single axis.) And what you actually get depends on how you chose your coordinate system. (does your X,Y,Z agree with OpenCV? Is your rotation around the correct axis?)
I prefer doing the camera calibration offline / as a separate step for a few reasons. It’s a more complex problem to solve, so doing it separately lets you focus on getting high quality results without being constrained as you would by doing it in-situ. Also it probably makes the stereo (extrinsics) part easier because you don’t require a large number of input images with different calibration target position/orientations. (which you would need to get good intrinsics)
This is all assuming you want to use OpenCV as the framework. If so, it’s best to buy in to the framework and let OpenCV do the work for you. Trying to estimate / measure things yourself is fraught - if for no other reason because you have to make sure you are representing the reuslts in the same way that OpenCV would. (And really the calibration algorithms are going to give you far more accurate results, I promise.)