Important Camera Specs For StereoSGBM

I am looking to build a stereo camera rig to generate my own images for processing with StereoSGBM.

What are important specs to look for when choosing what cameras to buy?

I know these algorithms work best with modest resolution so I’ll stick to 1-2Mpx. Is there anything else I should pay attention to?

Any particular camera/lens/sensor recommendations would be hugely appreciated.


before you start engineering, do a prototype using cheap/borrowed webcams. you gotta get a feel for the mess you’re getting into.

if you care to, implement a little 3d scene in which you can play quickly and cheaply. you can create arbitrarily specified virtual cameras to generate synthetic views, and you know what all the results should look like (camera/transformation matrices…)

hardware: start from specifying a desired depth resolution at a desired distance. the depth resolution from a stereo pair isn’t uniform. consider the disparity map. a pixel difference at infinity could be huge, a pixel difference on the tip of your nose is nearly nothing.

that then implies some constraints on baseline (distance between cams), resolution, field of view. these three properties are somewhat “tradeable”.
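the trade-off above can be put into numbers. from Z = f·B/d, a one-pixel disparity step near depth Z corresponds to roughly ΔZ ≈ Z²/(f·B). the focal length and baseline below are placeholders; plug in your own targets.

```python
f = 1400.0      # focal length in pixels (illustrative, ~1-2 Mpx sensor)
B = 0.30        # baseline in meters (illustrative)

for Z in (5.0, 20.0, 50.0):
    dZ = Z * Z / (f * B)   # depth change per pixel of disparity near Z
    print(f"at {Z:5.1f} m, one pixel of disparity ≈ {dZ:.3f} m of depth")
```

note the quadratic growth: doubling the working distance quadruples the depth error per pixel, which is why "depth resolution at distance" should drive the baseline/resolution/FoV choice, not the other way around.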

desired distance also constrains your choice of lens. you’ll want fixed but adjustable focus (if servo focus then manually settable, not auto). cameras with focus fixed to infinity only make sense if your working distance is far. consider depth of field/focus. you may have to choose a narrow aperture. that would then imply strong lighting… or maybe your situation is relaxed in those dimensions.
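to estimate whether your aperture is tight enough, the thin-lens depth-of-field formulas are a decent back-of-the-envelope tool. every number here is an assumption (8 mm lens, f/2.8, 3 µm circle of confusion, focus at 10 m):

```python
f = 0.008        # 8 mm lens, in meters (assumed)
N = 2.8          # aperture f-number (assumed)
c = 3e-6         # circle of confusion ~ one pixel pitch (assumed 3 µm)
s = 10.0         # focus distance in meters (assumed)

H = f * f / (N * c) + f                   # hyperfocal distance
near = H * s / (H + (s - f))              # near limit of acceptable focus
far = H * s / (H - (s - f)) if s < H else float("inf")
print(f"hyperfocal ≈ {H:.1f} m, in focus from {near:.1f} m to {far:.1f} m")
```

with these numbers, focusing past the hyperfocal distance puts everything from a few meters to infinity in focus even at f/2.8, so a long working distance can indeed be one of those "relaxed" situations.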

cheap webcams with infinity focus can be disassembled. their built in lenses have adjustable focus but you’ll have to break some loctite seal on the lens in order to turn it.

make sure the cameras/lenses are mounted so they don’t move at all and they don’t shake/vibrate from minor disturbances. you don’t want to recalibrate the rig every time someone sneezes.

don’t be afraid of high res cameras. you can always scale that down. it also gives you some room for numerical error from undistortion and rectification (resampling operations).

you want a sharp picture but you don’t want software sharpening… or software noise suppression. that just destroys high frequency texture along with high frequency noise. good lighting helps, once again.

you want both cameras to take their pictures at the same instant. if you can, get cameras you can link together (one runs free and triggers the other synchronously) or whose exposures can be triggered in software.

look at the effect of “rolling shutter”. consider if you need global shutter or not. if you use cams with rolling shutter, make sure to hold your calibration patterns still, or else you’ll get bad results proportional to how much you shook the pattern.

consider the data rate of the video feeds. if USB (2.0 HS with 480 Mbit/s in particular, but also USB 3), make sure to plug them into individual USB controllers, so they don’t have to share bandwidth. if they do have to, that will affect picture quality (one camera won’t get full bandwidth and instead negotiate lower res, lower frame rate, or awful compression).


Hi crackwitz,

Thank you very much for your informative reply (and for moderating this very useful forum).

Leaving extra resolution to account for rectification is an excellent point, I hadn’t considered that.

I should have prefaced my post by saying that I have some experience with SGBM, but this has mainly been with ready-made datasets (KITTI). I then calibrated my iPhone from two fixed positions indoors and processed some stereo images (with mixed success, as you can imagine!).

I’m now looking to build on what I have learned by building a stereo rig that can work outdoors (automotive) over long distance, global shutter at >30Hz. There are a few FLIR Blackfly models that could work but there are two considerations in particular I wonder if you might have any advice on:

  1. Most sensors come in either a mono or a colour model. It looks like the trade-off is colour information vs. quantum efficiency (QE). Since SGBM only needs grayscale images, is it worth opting for a mono sensor, or would the SGBM performance improvement be insignificant? I also worry a mono sensor would prevent me from trying other algorithms that may require colour.
  2. Between GigE and USB3, is one simpler to interface with OpenCV (or just generally simpler to deal with) than the other?

Any advice you might have on any of this would be greatly appreciated.

I can’t say. you understand the issue well enough. block matching needs some high frequency content (“texture”) to work with. more texture can be found if you have color information, but the majority of texture content is usually found in the intensity part of an image. OpenCV’s StereoMatcher accepts grayscale input only (so the docs say). block matching could work on color data in principle. consider the environment/scene. the better lit it is, the less you have to worry about sensor noise.

hard to say. USB is usually plug and play because the USB spec defines a standard Video Class (UVC). GigE/Ethernet tends to involve drivers from the vendor. there’s a “GigE Vision Standard”. OpenCV’s VideoCaptureAPIs lists two obvious GigE drivers. you can always just use a vendor driver directly and wrap that data in a cv::Mat for further use.

consider the data rates involved for video vs. what the cable can do (USB 2, GigE, USB 3, 10GigE). uncompressed video means more data rate but less latency. compressed video means less data rate but more latency (for encoding and decoding) and compression bitrate needs to be high enough to make compression artefacts (noise) imperceptible/negligible.
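a quick back-of-the-envelope check along those lines, for a hypothetical 1920×1200 mono8 camera at 30 fps (nominal link rates; real usable throughput is lower due to protocol overhead):

```python
links_mbit = {"USB 2.0 HS": 480, "GigE": 1000, "USB 3.0": 5000, "10GigE": 10000}

w, h, fps, bytes_per_px = 1920, 1200, 30, 1      # example ~2.3 Mpx mono camera
mbit_per_s = w * h * bytes_per_px * fps * 8 / 1e6
print(f"one camera needs ~{mbit_per_s:.0f} Mbit/s uncompressed")

for name, cap in links_mbit.items():
    fits = "fits" if 2 * mbit_per_s < cap else "does NOT fit"
    print(f"two cameras on one {name}: {fits}")
```

at these numbers a single uncompressed camera already exceeds USB 2.0, and a stereo pair exceeds GigE, which is exactly why cameras on constrained links fall back to lower resolution, lower frame rate, or compression.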