Non-planar stereo camera calibration (Two cameras with a physical Z-offset)

Hi all,

I’m planning on creating a stereo camera that will have two camera modules on flexes. They will have different Z-offsets from each other, but will otherwise point in the same direction (parallel optical axes).

Will I have trouble with extrinsic calibration / disparity doing this? They will have about a 2 cm difference in Z-offset.

I can’t predict that; I don’t have experience with such setups.

imagine the epipolar lines. calibration would probably have to severely contort the views.

the involved implementations make some assumptions. I am not sure that what you are doing falls within those assumptions.

you’d best grab all the books on this that you can, and search the literature for papers.

Thanks.

Hmm… wondering whether the assumptions in stereoRectify still make this doable.

I’m running into an issue where the calibration seems to crop the usable area pretty severely.

block matching requires epipolar lines to be parallel and horizontal.

imagine the line connecting both camera origins (the baseline). in each view, all the epilines converge on the vanishing point of that connecting line, which is also where the other camera’s origin projects to (the epipole).

for cameras that sit beside each other, so no Z difference, that line runs at right angles to the viewing direction. its vanishing point is at infinity, so the epilines come out parallel in image space.

if the cameras have a Z difference (but otherwise face the same direction), that connecting line has a finite vanishing point in each view. it might be off-screen or it might be in view. in any case, that is where the epilines in both views will meet. they are not parallel in image space, and that is not what block matching requires.
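
for illustration, here is a rough numeric sketch of that (all numbers made up, and assuming OpenCV’s stereoCalibrate convention where R, T map points from the first camera’s frame into the second’s):

```python
import numpy as np

# made-up intrinsics: 640x480 image, roughly 60 degree horizontal field of view
K = np.array([[554.0, 0.0, 320.0],
              [0.0, 554.0, 240.0],
              [0.0, 0.0, 1.0]])

def epipoles(K1, K2, R, T):
    """epipoles of a stereo pair, assuming X_right = R @ X_left + T."""
    C_right_in_left = -R.T @ T      # right camera center in left camera coords
    e_left = K1 @ C_right_in_left   # image of the right center in the left view
    e_right = K2 @ T                # image of the left center in the right view
    return e_left, e_right

def describe(e):
    """dehomogenize; a zero last coordinate means the epipole is at infinity."""
    return "at infinity (parallel epilines)" if abs(e[2]) < 1e-12 else (e / e[2])[:2]

R = np.eye(3)  # both modules face the same direction

# side-by-side rig, 6 cm baseline, no Z difference: epipoles at infinity
e_l, e_r = epipoles(K, K, R, np.array([-0.06, 0.0, 0.0]))
print("no Z offset:  ", describe(e_l), describe(e_r))

# same baseline plus 2 cm Z offset: epipoles become finite points
# (here roughly 1300 px beyond the left edge of each image)
e_l, e_r = epipoles(K, K, R, np.array([-0.06, 0.0, 0.02]))
print("2 cm Z offset:", describe(e_l), describe(e_r))
```

with the Z offset, both sets of epilines fan out from that finite point instead of running parallel.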

stereoRectify’s task is to warp a view such that those lines become parallel. that function does NOT expect such intentional shenanigans. it expects nearly parallel epilines. as I said above, warping those views is going to require serious contortions and likely overwhelms the function’s abilities. the implicit assumption is that you aren’t doing any weird stuff, and what you are doing is weird stuff, not a regular stereo rig.

imagine warping those views. “virtual” cameras would have to be constructed that face at right angles to the connecting line, i.e. the camera matrices are calculated to be panning sideways. and then the view pyramid/frustum has to get skewed to keep the actual image content within it. that means the new views won’t have their principal points in the middle of the image content, but somewhere off to the side.

so, yes, books and such. do not expect the library to be helpful here. the functions, even for what they’re intended for, are severely under-documented and contain magic that is hard to predict. one piece of magic is that cursed “alpha” parameter adjusting the crop. more magic happens in all the numerical optimizations, which throw matrices at you that are close to inexplicable.

at least that was my experience last time, when I was new to all of this, hadn’t had any classes on the topic, and thought it couldn’t be that hard, the library would guide me.
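
if you want to poke at that without hardware, you can feed stereoRectify a synthetic extrinsic with a Z offset and look at what it hands back. something along these lines (intrinsics and geometry made up for illustration):

```python
import numpy as np
import cv2

# made-up rig: identical intrinsics, 6 cm baseline plus a 2 cm Z offset
size = (640, 480)
K = np.array([[554.0, 0.0, 320.0],
              [0.0, 554.0, 240.0],
              [0.0, 0.0, 1.0]])
dist = np.zeros(5)
R = np.eye(3)                      # modules face the same direction
T = np.array([-0.06, 0.0, 0.02])   # baseline with the intentional Z offset

for alpha in (0.0, 0.8, 1.0):
    R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(
        K, dist, K, dist, size, R, T, alpha=alpha)
    # P1/P2 are the rectified projection matrices; their principal points
    # (cx) show how far the virtual cameras get pushed off-center, and the
    # valid-pixel ROIs show what alpha does to the usable area.
    print(f"alpha={alpha}: cx1={P1[0, 2]:.1f} cx2={P2[0, 2]:.1f} "
          f"roi1={roi1} roi2={roi2}")
```

no guarantee the numbers will be pretty; that’s rather the point.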


a sketch of the situation

not as illustrative as it could be. some angles happen to be equal, causing the views’ left edges to coincide with the optical axes, because I didn’t plan the pen strokes.

Thank you. It sounds like this is very much unexplored territory… “here be dragons”.

And yeah… the alpha parameter is one of the many things that is very confusing. Different values give me results that I can’t really explain. 0.8 vs 0.9 vs 1.0…

It sounds like I’m setting myself up for pain trying to do this.

I’m playing around setting stuff up in Blender to generate synthetic data, but so far not having tremendous amounts of luck getting it to behave.

I can’t speak to the capabilities of stereoRectify() or other OpenCV functions, but I don’t think there is anything fundamentally wrong with your setup. I’ve always thought of the “two cameras in the same plane” setup as a special case / simplification of the general case, and I would expect it to work for a very wide range of camera configurations, including Z differences, significant rotation between the cameras (non-parallel optical axes), etc.

In the discussions I’m familiar with, the epipoles (where the camera center of one camera projects to in the other camera’s image) are usually shown within the bounds of the image (probably just for the sake of clarity / ease, as there is no such requirement). (See the images from “Multiple View Geometry” by Hartley/Zisserman below.)

Later in the same book (section 11.12, Image Rectification), they go on to say:

“This section gives a method for image rectification, the process of resampling pairs of stereo images taken from widely differing viewpoints in order to produce a pair of “matched epipolar projections”. These are projections in which the epipolar lines run parallel with the x-axis and match up between views, and consequently disparities between the images are in the x-direction only, i.e. there is no y disparity.”

The points here are:

  1. Works for widely differing viewpoints
  2. The purpose of rectifying the images is to simplify the search for matching features. You can also search for matching features along epipolar lines in the original (non-rectified) images, but it’s just easier to search along horizontal lines in the image (rough sketch below).
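
To make option 2 concrete, here is a minimal sketch of searching along an epiline in the un-rectified images (all numbers are placeholders; in practice K, R, T would come from stereoCalibrate(), or F directly from findFundamentalMat()):

```python
import numpy as np
import cv2

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Placeholder calibration (would normally come from stereoCalibrate)
K = np.array([[554.0, 0.0, 320.0],
              [0.0, 554.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
T = np.array([-0.06, 0.0, 0.02])   # baseline with a Z offset

# Fundamental matrix from the calibration: F = K2^-T [T]_x R K1^-1
F = np.linalg.inv(K).T @ skew(T) @ R @ np.linalg.inv(K)

# For a feature at (x, y) in the left image, get its epiline a*x + b*y + c = 0
# in the right image; the match is searched along that line only.
pt_left = np.array([[[300.0, 200.0]]], dtype=np.float32)
line_right = cv2.computeCorrespondEpilines(pt_left, 1, F)
a, b, c = line_right[0, 0]
print(f"epiline in the right image: {a:.4f}x + {b:.4f}y + {c:.4f} = 0")
```

With rectified images that line is simply y = const, which is what makes block matching cheap.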

Again, I can’t speak to the implementation of the OpenCV stereoRectify() algorithm, but I have always assumed it would work just fine in the general case.

I would recommend picking up a copy of “Multiple View Geometry” by Hartley/Zisserman.


I’m gonna go off on what seems like a tangent. my perception of the situation probably errs on the side of thinking it worse than it is. there are, however, clear limits to the math and the practice.

you stated a Z difference of “2 cm”, but how much baseline/IPD?

yes, that is for illustration purposes, “not to scale” I’d say.

mathematically, those points are always “in view” because the field of view has no reason not to approach 180 degrees.

in the case of the general multi-view situation (SFM etc), it’s fine to have those points be practically in view, and even has advantages. optical axes crossing at right angles would give the math the best possible condition.

for block matching, which is a stereo situation, it’s actually misleading, a bad situation to have. by that I primarily mean Z differences (“side-eyed”) but also severely “cross-eyed” setups. these only differ in which eye goes “cross” in what direction. the more “cross”, the worse.

the epipole in view… that means you could see one of your eyes with your other eye (remove nose to demonstrate), that’s how severely cross-eyed it is. imagine trying to rectify such a view. imagine the homography. you have that vanishing point in view, and now you’re supposed to push it off to the side (“maps the epipole to a point at infinity”) and produce a top-down view (parallel epilines).

imagine a front-facing camera in a car, on a straight road. the vanishing point of the road is in view. now try homographing that to a top-down view.

the situations are equivalent.
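
to make that concrete, here’s a rough sketch of the textbook construction (translate the image center to the origin, rotate the epipole onto the x axis, then apply the map that sends it to infinity). the epipole coordinates are made up; the point is to watch what happens to the image corners as the epipole gets closer to the view:

```python
import numpy as np

def epipole_to_infinity(e_px, size=(640, 480)):
    """homography (Hartley/Zisserman-style) that sends the epipole to infinity."""
    w, h = size
    # translate the image center to the origin
    Tr = np.array([[1.0, 0.0, -w / 2], [0.0, 1.0, -h / 2], [0.0, 0.0, 1.0]])
    ex, ey = (Tr @ np.array([e_px[0], e_px[1], 1.0]))[:2]
    f = np.hypot(ex, ey)
    c, s = ex / f, ey / f
    # rotate the translated epipole onto the positive x axis
    Rot = np.array([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])
    # G maps (f, 0, 1) to (f, 0, 0), i.e. off to infinity along x
    G = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [-1.0 / f, 0.0, 1.0]])
    return G @ Rot @ Tr

def warp(H, p):
    q = H @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]

corners = [(0, 0), (640, 0), (640, 480), (0, 480)]
for epipole in [(-5000.0, 240.0), (-1342.0, 240.0), (-700.0, 240.0)]:
    H = epipole_to_infinity(epipole)
    print(f"epipole at {epipole}:")
    for corner in corners:
        print("   corner", corner, "->", np.round(warp(H, corner), 1))
```

the closer the epipole sits to the image, the more the corners get thrown around. with the epipole actually inside the view, the line through it maps to infinity and the warp tears the image apart, which is the hopeless case.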

stealing pics from elsewhere:

so YES, if the goal is block matching for stereo vision, then avoid any such situations.

supposing the starting situation is not as severe, you can work with that. the newly calculated views (camera matrices) still have some properties to be aware of (principal point relative to image bounds, …). it’s not like you still look at the target straight on. if those were your eyes, your fovea/foveae would be out of work very quickly. you’d look at the target peripherally.

slight Z differences, as in Fig. 9.3 in Steve’s post, can be corrected, but that is still a materially (if slightly) worse starting situation than no Z difference. it’s a correction of a suboptimal situation, not simply a normalization/transform of a fine situation.

I’m thumbing through the book for any discussion of block matching for stereo vision. there is a little bit written on the geometry of it in section 11.12, but not enough to help you practically or convey any intuition. that is left as an exercise to the reader.

yet the example images in the book don’t show severe cases. they show some in-plane rotation, a bit of cross-eyeing, but nothing where the epipole comes near being in view.

that’s what I take issue with. it’s a math book only, not helping with practical aspects. the book makes those claims without actually demonstrating them. do not just buy into that. printed word does not win arguments by virtue of having been printed. the arguments have to convince. I believe I’ve demonstrated which positions on this situation are even worth entertaining.

an excerpt from the book:

“Since the application of arbitrary 2D projective transformations may distort the image substantially, the method for finding the pair of transformations subjects the images to a minimal distortion.”

that is waffling, a non sequitur, and a prime example of my disdain for “academic” writing, the antithesis of technical writing. that phrase alone should make anyone reading the paragraph receive it critically.

the transform is determined (in non-degenerate cases), i.e. the perspective part of it. there is no “minimal distortion” solution to choose from some solution space. any degrees of freedom left are inconsequential to quality: zoom/scale/stretch, translation and rotation. at most, one should caution to pick those degrees of freedom pragmatically (i.e. not scaling down to thumbnail size or squashing comically).