What Constitutes a Good Calibration and Acceptable Variance?

I’m currently studying calibration.
When I calibrate with different image sets, the calibration results slightly change, which I understand is natural.
My question is, is there a standard for what constitutes a good calibration?
When I obtain several calibration results, is there an acceptable range for the variance between them?
How can I determine if the calibration is good or not?
I know that, typically, the quality of a calibration is assessed using reprojection error, but all of my results have had good reprojection errors.

you’ve learned of repro error. that’s as “standard” as it gets, without being an actual standard.

good/low repro error, on its own, is necessary but not sufficient. repro error can only capture the quality of the calibration for the part of the view in which you have captured points. if your captured points don’t reach into the far corners of the view, then repro error cannot say anything about the quality of the calibration in those corners, yet that’s exactly where the calibration is needed most, because lens distortion is greatest there. repro error only becomes meaningful when the view is covered well with points.
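a minimal sketch (python + opencv) of both halves of that check. it assumes `objpoints`, `imgpoints`, `rvecs`, `tvecs`, `K`, `dist` came out of `cv2.calibrateCamera`; the grid size for the coverage check is an arbitrary choice:

```python
import numpy as np
import cv2

def per_view_error_and_coverage(objpoints, imgpoints, rvecs, tvecs, K, dist,
                                image_size, grid=(4, 4)):
    """Per-image RMS reprojection error, plus the fraction of coarse
    image cells that contain at least one detected point."""
    errors = []
    for obj, img, rvec, tvec in zip(objpoints, imgpoints, rvecs, tvecs):
        proj, _ = cv2.projectPoints(obj, rvec, tvec, K, dist)
        errors.append(float(np.sqrt(np.mean(np.sum((img - proj) ** 2, axis=2)))))
    pts = np.vstack([p.reshape(-1, 2) for p in imgpoints])
    w, h = image_size
    cx = np.clip((pts[:, 0] / w * grid[0]).astype(int), 0, grid[0] - 1)
    cy = np.clip((pts[:, 1] / h * grid[1]).astype(int), 0, grid[1] - 1)
    # empty cells (especially in the corners) mean the model is
    # extrapolating there, no matter how low the repro error is
    coverage = len(set(zip(cx.tolist(), cy.tolist()))) / (grid[0] * grid[1])
    return errors, coverage
```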

calibration is a curve fit. the curve fit is only valid for the samples you have, and reasonably good between the samples (interpolation), but go outside of them (extrapolation) and the results become increasingly arbitrary and wild.

also make sure the camera has a global shutter, or that the calibration target and camera were completely still when each picture was taken (rolling shutter plus motion skews the image and poisons the calibration).

also see Calibration Best Practices – calib.io, a good practical summary.


There are a lot of factors involved, so it’s not really possible to give a general answer. The quality of your calibration target, your lens, the lighting, etc. all play a role in your results. If your calibration target isn’t flat and rigid, you can expect varying results from trial to trial just because the target is changing. If you aren’t getting similar coverage from one trial to the next, you can’t meaningfully compare the scores between the two trials.

For example:

Trial 1: calibration points cover the central portion of the image, but don’t have points near the edge or corners. Calibration score is 0.1

Trial 2: calibration points cover the entire image, including the edges and corners. Calibration score is 0.2

You might be tempted to conclude that trial 1 is better because the error score is lower, but if you need to process points outside of the central area, you will find that the calibration results from trial 1 aren’t very accurate near the corners / edges. Trial 2 probably represents the better result in spite of the error score being twice as high.

Similarly, if you have one trial that only uses 5 input images, and another that uses 15, the second trial might produce a higher error score, but actually be a better result. Why? Without enough variation in your input images (angle / depth change with respect to the camera, etc.) you can end up with a calibration model that fits the data well, but doesn’t fully capture the physical reality of the camera. This often shows up as inconsistent focal length estimates - a sure sign that you need more and/or better quality input data.
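If it helps, here is a minimal sketch of that consistency check (Python + OpenCV). `image_sets` is a hypothetical list of per-trial (objpoints, imgpoints) pairs; everything else is standard cv2.calibrateCamera usage:

```python
import numpy as np
import cv2

def focal_length_spread(image_sets, image_size):
    """Calibrate once per image set and report the relative spread of
    the focal length estimates across trials."""
    fx, fy = [], []
    for objpoints, imgpoints in image_sets:
        rms, K, dist, _, _ = cv2.calibrateCamera(
            objpoints, imgpoints, image_size, None, None)
        fx.append(K[0, 0])
        fy.append(K[1, 1])
    # a large relative spread suggests the model is fitting noise rather
    # than the camera; gather more varied views (angle, depth, coverage)
    return np.std(fx) / np.mean(fx), np.std(fy) / np.mean(fy)
```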

When I’m trying to validate a calibration process, I will calibrate a camera a number of times (say 20), and then project a set of synthetic 3D points using the 20 calibration results*. I look for patterns in the data. For each group of points, are there outliers that consistently come from one or more of the trials? If so, what was different about the affected trials, and what can I do to reduce those errors? If there aren’t obvious outliers, I look at the spread of values for the various points and ask myself if it seems reasonable for the camera, lens, etc.

As much as I would like to make it as accurate as I can, there is no such thing as perfect. So at some point it comes down to whether or not the results are consistently good enough to achieve whatever it is you are trying to achieve.

*In practice I back-project a collection of image points to a world plane and do the comparisons in 3D, so the errors are in units that are meaningful to me. I’m typically looking for spreads of about 0.1mm, but that can vary greatly depending on the application.
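A hedged sketch of that validation loop (Python + OpenCV). `trials` is a hypothetical list of (K, dist, rvec, tvec) tuples, one per calibration run, where rvec/tvec describe the board plane (Z = 0 in world coordinates), e.g. from cv2.solvePnP:

```python
import numpy as np
import cv2

def backproject_to_board(pts, K, dist, rvec, tvec):
    """Back-project pixel points onto the Z=0 world plane (the board)."""
    rays = cv2.undistortPoints(pts.reshape(-1, 1, 2).astype(np.float32),
                               K, dist).reshape(-1, 2)
    R, _ = cv2.Rodrigues(rvec)
    out = []
    for x, y in rays:
        d = R.T @ np.array([x, y, 1.0])   # ray direction in world coords
        o = -R.T @ tvec.ravel()           # camera center in world coords
        s = -o[2] / d[2]                  # intersect ray with Z_world = 0
        out.append(o + s * d)
    return np.array(out)

def spread_across_trials(pts, trials):
    """Per-point standard deviation of the back-projected locations
    across all calibration runs."""
    world = np.stack([backproject_to_board(pts, K, dist, r, t)
                      for K, dist, r, t in trials])
    # consistent outliers here point back to specific runs worth
    # investigating (bad detections, target movement, etc.)
    return world.std(axis=0)
```

Because the spread is computed on the world plane, it comes out in the units of your board spacing (e.g. mm), which is what makes a target like the 0.1mm above directly interpretable.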

As for evaluating a given calibration run, I augment the input images with the locations of detected points, and draw a grid of lines on the undistorted images. You can do a quick visual assessment of the calibration and often see errors (invalid / inaccurate point detection) that are contributing error to the results.
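The grid overlay part of that is simple to reproduce. A rough sketch, assuming you already have K and dist from calibration (the detected-point augmentation is just cv2.drawChessboardCorners on the input image):

```python
import cv2

def draw_undistorted_grid(img, K, dist, step=50, color=(0, 255, 0)):
    """Undistort an image and overlay straight ideal lines. Residual
    curvature of the chessboard relative to these lines points to
    errors in the distortion model."""
    und = cv2.undistort(img, K, dist)
    h, w = und.shape[:2]
    for x in range(0, w, step):
        cv2.line(und, (x, 0), (x, h - 1), color, 1)
    for y in range(0, h, step):
        cv2.line(und, (0, y), (w - 1, y), color, 1)
    return und
```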

For example, here is one image (of 15 total) and the augmentations I make through the process.

Input: [raw calibration image]

Corners detected after multiple iterations and filtering (red = kept, green = predicted location but not used for calibration): [annotated image]

Undistorted image with grid drawn: [annotated image]. I look at how tight the intersections of the lines are, and how well the lines coincide with the chessboard corners. Errors in the distortion model show up as chessboard lines that curve relative to the straight ideal (green) lines.
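If you want to quantify that last check instead of eyeballing it, one option is to fit a straight line to each row of undistorted corners and measure the residuals. A sketch, assuming `corners` is the row-major (N, 1, 2) array from cv2.findChessboardCorners:

```python
import numpy as np
import cv2

def row_straightness(corners, K, dist, pattern_size):
    """RMS distance of undistorted chessboard corners from a straight
    line fitted to each row; near zero means the distortion model
    fits well along that row."""
    und = cv2.undistortPoints(corners, K, dist, P=K).reshape(-1, 2)
    cols, rows = pattern_size
    rms = []
    for r in range(rows):
        row = und[r * cols:(r + 1) * cols]
        vx, vy, x0, y0 = cv2.fitLine(row.astype(np.float32),
                                     cv2.DIST_L2, 0, 0.01, 0.01).ravel()
        # perpendicular distance of each corner to the fitted line
        d = np.abs((row[:, 0] - x0) * vy - (row[:, 1] - y0) * vx)
        rms.append(np.sqrt(np.mean(d ** 2)))
    return np.array(rms)
```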


Thank you for your response.
It helped me a lot in setting the direction of my work.
I have a few more questions.
You said you back-project to a world plane to verify calibration performance.
Did you fix the board in one place and move only the camera?
Is the error computed using the known spacing of the board pattern?
And if we want to calibrate images from another domain (e.g. RGB-thermal), would it be meaningful to back-project to the world plane for each domain’s images?
I’m sorry for asking so many questions.
If that is too much, you could instead recommend a book for studying calibration.
I’ve been struggling a lot with calibration.
I want to study it properly, but I couldn’t find any information other than blogs and the OpenCV documentation.

Very interesting approach!
Do you have a repo to replicate the example you just showed us? I am struggling a bit to trust my calibration results too, so this could be huge for me.
Also, couldn’t this process be automated by calculating the standard deviation of the intersections with the green lines? I am really not an expert, just a hunch.
Thanks in advance Steve.