you need a good book on the matter. look for “Multiple View Geometry in Computer Vision” by Hartley and Zisserman.

most of the math is similar triangles, intercept theorem, etc

practical resolution/precision/accuracy entirely depends on *everything* involved in calculating a depth map from a stereo pair of images. more is involved than you might guess.

you seem to be asking about Z-resolution specifically.

- let’s say 1920 pixels and 70 degrees horizontal field of view
- in practice (block matching) you can localize features at sub-pixel resolution but let’s go with full pixel resolution
- 1 meter away
- 63.5 mm baseline for the stereo pair

\newcommand{\mm}{~\mathrm{mm}}
\newcommand{\px}{~\mathrm{px}}
\begin{align*}
\tan \left( \frac{70^\circ}{2} \right) \cdot f_x &= \frac{1920\px}{2} \\
f_x &= 1371.02\px
\end{align*}

that focal length in pixels is a fundamental constant of your camera (and chosen resolution).
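the same calculation as a few lines of python, in case you want to plug in your own camera. this is just a sketch assuming an ideal pinhole camera without lens distortion; the function name `focal_length_px` is made up for illustration.

```python
import math

def focal_length_px(width_px: float, hfov_deg: float) -> float:
    """Focal length in pixels from image width and horizontal field of view (pinhole model)."""
    # half the image width is the opposite side, the focal length is the adjacent side
    return (width_px / 2) / math.tan(math.radians(hfov_deg) / 2)

f_x = focal_length_px(1920, 70)
print(f"f_x = {f_x:.2f} px")
```

a calibrated camera matrix would give you a more trustworthy value than the nominal field of view from a datasheet.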

both cameras stare at infinity, i.e. their optical axes are parallel. now put a wall 1 meter in front of them. the optical axes pierce the wall, and those intersection points are 63.5 mm apart. how many pixels apart do those points appear (at a meter away)?

\begin{align*}
\frac{63.5\mm}{1000\mm} \cdot f_x &= x \\
x &= 87.06\px
\end{align*}

that’s in pixels of disparity. in other words, at a meter distance, 63.5 mm are equivalent to 87 pixels.
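as code, under the same ideal assumptions (rectified pair, parallel optical axes). `disparity_px` is a hypothetical helper name, and the focal length is the 1371.02 px from above:

```python
def disparity_px(baseline_mm: float, depth_mm: float, f_px: float) -> float:
    """Disparity in pixels of a point at depth_mm, seen by a rectified stereo pair."""
    # similar triangles: baseline / depth == disparity / focal length
    return f_px * baseline_mm / depth_mm

d = disparity_px(baseline_mm=63.5, depth_mm=1000.0, f_px=1371.02)
print(f"disparity at 1 m: {d:.2f} px")
```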

I’m gonna reuse x a lot. read it as a question mark.

okay, so let’s add/subtract one pixel from that and see where the pixel goes on that wall:

\begin{align*}
\frac{x}{(87-1)\px} &= \frac{63.5\mm}{87\px} \\
x &= \frac{63.5\mm}{87\px} \cdot (87-1)\px \\
x &= 62.77\mm \\
\\
\frac{x}{(87+1)\px} &= \frac{63.5\mm}{87\px} \\
x &= 64.23\mm
\end{align*}

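and now we figure out where those rays would intersect. a drawing of a bunch of triangles would help here but… eh. (the results below appear to carry the unrounded 87.06 px through, so the last digit can differ slightly if you recompute from the rounded numbers shown.)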

\begin{align*}
\frac{62.77\mm}{1000\mm} &= \frac{63.5\mm}{z_1} \\
z_1 &= 1011.62\mm \\
\\
\frac{64.23\mm}{1000\mm} &= \frac{63.5\mm}{z_2} \\
z_2 &= 988.64\mm
\end{align*}

so with that setup, a full pixel of disparity at 1 meter corresponds to roughly 11-12 mm of depth (about 11.6 mm farther away, about 11.4 mm closer).
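the two triangle steps can be skipped by inverting the disparity relation directly. a minimal sketch, again assuming an ideal rectified pair; `depth_mm` is a name made up for this example:

```python
def depth_mm(baseline_mm: float, f_px: float, disparity_px: float) -> float:
    """Depth of a point from its disparity (rectified pair, pinhole model)."""
    return f_px * baseline_mm / disparity_px

f_px, baseline = 1371.02, 63.5      # values from the example above
d = f_px * baseline / 1000.0        # disparity at 1 m, about 87.06 px

z_far = depth_mm(baseline, f_px, d - 1)   # one pixel less disparity: farther away
z_near = depth_mm(baseline, f_px, d + 1)  # one pixel more disparity: closer

print(f"d - 1 px: {z_far:.2f} mm")
print(f"d + 1 px: {z_near:.2f} mm")
```

note that the steps are not symmetric: the step away from the camera is a bit larger than the step toward it, and the gap grows with distance.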

another example: OpenCV’s (CPU) stereo module assumes 4 bits of subpixel resolution, i.e. steps of 1/16 of a pixel. in that case you can expect to get less than a millimeter of z resolution (about 0.7 mm in this setup)…

but that doesn’t mean you’ll get it. that’s just the best case.
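if you prefer a formula over triangles: for small disparity steps, everything above collapses into one line. this is a first-order approximation for an ideal rectified pair. the symbols B (baseline), d (disparity) and Δd (smallest disparity step the matcher can resolve) are introduced here for illustration.

\begin{align*}
z &= \frac{f_x \cdot B}{d} \\
|\Delta z| &\approx \frac{z^2}{f_x \cdot B} \cdot |\Delta d| \\
\frac{(1000\mm)^2}{1371.02\px \cdot 63.5\mm} \cdot 1\px &\approx 11.49\mm \\
\frac{(1000\mm)^2}{1371.02\px \cdot 63.5\mm} \cdot \frac{1}{16}\px &\approx 0.72\mm
\end{align*}

the quadratic dependence on z is the important part: twice as far away means four times coarser depth steps.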

you can improve the situation by using a wider baseline, or by getting closer to the object, or with a higher resolution camera, or with a narrower field of view (zoom lens?).
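to see how much each of those levers buys you, here is a small comparison. it uses the first-order approximation Δz ≈ z² / (f · B) · Δd, which follows from z = f · B / d for an ideal rectified pair; `depth_step_mm` and the setups listed are made up for illustration.

```python
import math

def depth_step_mm(z_mm, baseline_mm, width_px, hfov_deg, step_px=1.0):
    """Approximate depth change caused by a disparity change of step_px."""
    f_px = (width_px / 2) / math.tan(math.radians(hfov_deg) / 2)
    return z_mm ** 2 / (f_px * baseline_mm) * step_px

# reference setup from the text, then one parameter changed at a time
setups = {
    "reference":          dict(z_mm=1000, baseline_mm=63.5,  width_px=1920, hfov_deg=70),
    "double baseline":    dict(z_mm=1000, baseline_mm=127.0, width_px=1920, hfov_deg=70),
    "half distance":      dict(z_mm=500,  baseline_mm=63.5,  width_px=1920, hfov_deg=70),
    "double resolution":  dict(z_mm=1000, baseline_mm=63.5,  width_px=3840, hfov_deg=70),
    "half field of view": dict(z_mm=1000, baseline_mm=63.5,  width_px=1920, hfov_deg=35),
}

steps = {name: depth_step_mm(**params) for name, params in setups.items()}
for name, step in steps.items():
    print(f"{name:>18}: {step:5.2f} mm per pixel of disparity")
```

baseline and resolution help linearly, distance helps quadratically. like the numbers above, these are best-case values.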

maybe this gives a bit of intuition: