Non-planar stereo camera calibration (Two cameras with a physical Z-offset)

I agree that the in-plane, x-translation-only configuration works well for many stereo vision applications, primarily because it simplifies finding correspondences. You get to use block matching because there aren’t scale or rotation differences between the views. This is no small thing! But it’s also a constrained setup that doesn’t work in all cases.

For example, if you want higher-precision depth measurements, your options include a narrower FOV and a wider baseline. But a wider baseline and/or a narrower FOV means a larger “dead zone” close to the cameras, where the two fields of view don’t overlap and you can’t get depth measurements. A reasonable and workable solution is to increase the baseline and rotate one or both of the cameras so that their fields of view overlap over the volume you are interested in. Yes, you probably have to give up block matching on the raw images, but there is nothing inherently wrong with this setup.
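To put rough numbers on that trade-off, here is a minimal sketch. It assumes an idealized pinhole model, two identical cameras, and the usual first-order approximation that depth error grows as Z^2 / (f * B) per pixel of disparity error. The function names, the symmetric toe-in parameter, and the example values (1280 px wide image, 5 m working distance) are all mine, purely for illustration.

```python
import math

def focal_px_from_fov(image_width_px, hfov_deg):
    """Focal length in pixels for an ideal pinhole camera with the given horizontal FOV."""
    return image_width_px / (2.0 * math.tan(math.radians(hfov_deg) / 2.0))

def depth_error(z, focal_px, baseline, disparity_err_px=1.0):
    """First-order depth uncertainty at distance z: dZ ~ Z^2 / (f * B) * d_disparity.
    z and baseline share the same length unit; the result is in that unit too."""
    return z * z / (focal_px * baseline) * disparity_err_px

def min_overlap_distance(baseline, hfov_deg, toe_in_deg=0.0):
    """Distance in front of the baseline where the two fields of view start to overlap.
    Assumes each camera is rotated inward by toe_in_deg (0 = parallel optical axes)."""
    inner_edge = math.radians(hfov_deg / 2.0 + toe_in_deg)
    return baseline / (2.0 * math.tan(inner_edge))

if __name__ == "__main__":
    width_px = 1280   # assumed image width
    z = 5.0           # assumed working distance in metres
    # (baseline in metres, horizontal FOV in degrees, inward rotation per camera in degrees)
    for baseline, hfov, toe_in in [(0.1, 60, 0), (0.5, 60, 0), (0.5, 30, 0), (0.5, 30, 10)]:
        f = focal_px_from_fov(width_px, hfov)
        print(f"B={baseline} m, HFOV={hfov} deg, toe-in={toe_in} deg: "
              f"depth error at {z} m ~ {depth_error(z, f, baseline):.3f} m/px, "
              f"overlap starts at {min_overlap_distance(baseline, hfov, toe_in):.2f} m")
```

In this model, a wider baseline or a narrower FOV shrinks the depth error but pushes the start of the overlap region farther out, while rotating the cameras inward pulls it back in. The depth error formula strictly applies to a rectified pair, so for rotated cameras treat it as a ballpark figure only.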

The example you gave of a front-facing camera on a car is, I think, an extreme case. The image was transformed to match an overhead view, which amounts to a 90 degree rotation between the two views/cameras. A 90 degree rotation might be too much, but what about 45 degrees, 30 degrees, and so on?

I’m not trying to get into an argument about this, but I wouldn’t want someone to find this thread and think that stereo vision only works in highly constrained configurations.