Inconsistent camera calibration results


I am trying to calibrate a StereoLabs Zed 2 camera and I get inconsistent results.

Zed cameras are calibrated themselves, and here is the calibration parameters coming from their SDK:

I am trying to calibrate the camera myself (because I want to do a robot-camera calibration), and my problem is that I cannot get results similar to Zed SDK calibration result.
I have used 3 different checkerboards with 3 different sizes (link to the images of the checkerboards and my setup: IMG_8871.jpg - Google Drive)
Here are the links for the images I took for calibration:
For smaller pattern: calibration small pattern - Google Drive
For medium pattern:
calibration medium pattern - Google Drive
For the larger pattern:
calibration large pattern - Google Drive

The problem is that I get different intrinsic matrices for each, and the none of them are similar to Zed SDK result.
Smallest pattern:
array([[1.18941333e+03, 0.00000000e+00, 1.11280623e+03], [0.00000000e+00, 1.23577099e+03, 6.53167171e+02], [0.00000000e+00, 0.00000000e+00, 1.00000000e+00]])

medium pattern:
array([[1.07054440e+03, 0.00000000e+00, 1.11565267e+03], [0.00000000e+00, 1.07011710e+03, 6.28861026e+02], [0.00000000e+00, 0.00000000e+00, 1.00000000e+00]])
large pattern:
array([[1.01906390e+03, 0.00000000e+00, 1.12143074e+03], [0.00000000e+00, 1.01388780e+03, 5.92256064e+02], [0.00000000e+00, 0.00000000e+00, 1.00000000e+00]])

In all cases, reprojection error (the first output of cv2.calibrateCamera())is less than 0.6, and the output of the following code is also always less than 0.08.

ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None)

mean_error = 0
for i in range(len(objpoints)):
 imgpoints2, _ = cv2.projectPoints(objpoints[i], rvecs[i], tvecs[i], mtx, dist)
 error = cv2.norm(imgpoints[i], imgpoints2, cv2.NORM_L2)/len(imgpoints2)
 mean_error += error
print( "total error: {}".format(mean_error/len(objpoints)) )

So I think each function is doing its job very well, but overall the final result is inaccurate.

So here are my questions:

  1. These matrices are actually very different, and I expect that if I do a good job in calibration, I should have gotten consistent result. Yes?

  2. Even though the errors are very low, still the calibration is far from perfect, yes? And this means that for the images provided, the calibration algorithm is doing a good job, but the images are not enough to do the calibration robust enough? (Note: The robot has only 4 DOF, and it can only rotate along Z axis of the robot, that is why the images I can collect using the robot is limited, and that is why I have added several images without using the robot, and with very different angles)

  3. What should I do to have the calibration as robust as possible?

Note: in this post, I have discussed the big picture of what I want to do, which is robot-camera calibration. Eye-to-hand calibration for a 4 DOF robot arm

Thank you for your reply.

I realized that for the larger pattern, only 2 of my non-robot images are detected by cv2.findChessboardCorners(), so I am using medium pattern for the rest of this comment.
I read this document before posting my question. Here are my notes to each point:

Also, I have updated the link with visualization of detected corners: calibration medium pattern - Google Drive

So, I have done the calibration 3 times, one time with only 9 images using the robot, another one with only 6 images not using the robot, and the other one with all 15 images combined: here are the results:


import cv2
import numpy as np
import glob

# termination criteria
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
# Parameters
pattern_size = (6, 9) # Number of inner corners per a chessboard row and column (6x9 for example)
square_size = 0.0835/7  # Square size in meters

# Prepare object points (like (0,0,0), (1,0,0), (2,0,0) ....,(6,8,0))
objp = np.zeros((pattern_size[0]*pattern_size[1], 3), np.float32)
objp[:,:2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1,2)
objp *= square_size

# Arrays to store object points and image points
objpoints = [] # 3d point in real world space
imgpoints = [] # 2d points in image plane.

# Load your images and robot poses here
images = glob.glob('mid_pattern/all/calibration_image_*.png')
R_gripper2base = rotation_matrices # Robot base to end-effector rotation matrices
t_gripper2base = translation_vectors # Robot base to end-effector translation vectors
print('total number of images: ', len(images))
for fname in images:
    img = cv2.imread(fname)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Find the chess board corners
    ret, corners = cv2.findChessboardCorners(gray, pattern_size, None)

    # If found, add object points, image points (after refining them)
    if ret:

        corners2 = cv2.cornerSubPix(gray, corners, (11,11), (-1,-1), criteria)

        # Draw and display the corners
        cv2.drawChessboardCorners(img, pattern_size, corners2, ret)
        cv2.imshow('img', img)
        cv2.imwrite('{}.png'.format(ii), img)

print(ii , ' number of used images')
# Calibrate camera using the object and image points
rms_error, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None)

mean_error = 0
for i in range(len(objpoints)):
 imgpoints2, _ = cv2.projectPoints(objpoints[i], rvecs[i], tvecs[i], mtx, dist)
 error = cv2.norm(imgpoints[i], imgpoints2, cv2.NORM_L2)/len(imgpoints2)
 mean_error += error
print( "total error: {}".format(mean_error/len(objpoints)) )

case 1: only 9 robot images
rms_error: 0.2344
mean error: 0.030167432
intrinsic matrix: array([[1.16197151e+03, 0.00000000e+00, 1.14025087e+03],
[0.00000000e+00, 1.20612005e+03, 6.76111628e+02],
[0.00000000e+00, 0.00000000e+00, 1.00000000e+00]])
case 2: only 6 non-robot images:
rms_error: 0.7238
mean error: 0.0945285869945198
only using 6 non-robot images
intrinsic matrix: array([[1.07204959e+03, 0.00000000e+00, 1.09128501e+03],
[0.00000000e+00, 1.07310275e+03, 6.24612207e+02],
[0.00000000e+00, 0.00000000e+00, 1.00000000e+00]])
case 3: all 15 images:
rms error: 0.5081
mean error: 0.0585292
intrinsic matrix: array([[1.07054446e+03, 0.00000000e+00, 1.11565251e+03],
[0.00000000e+00, 1.07011719e+03, 6.28861017e+02],
[0.00000000e+00, 0.00000000e+00, 1.00000000e+00]])

These are my points for each of the 12 points in the article you shared:
1, 2, 3: The images I took with the medium calibration pattern, where each square is 11.92 mm seems to fit the criteria here. (Note: I have used 11.90-11.93 mm as square size in the code, and the result of cv2.calibrateCamera() is almost equal in these cases, so the precision of square size is not the issue.

4: the images on robot does not have a lot of variation, and it is going to cause problems for me even after I resolve this issue (details are here: Eye-to-hand calibration for a 4 DOF robot arm) That’s why I have added those 6 extra images with very different orientations.
5, 6: I think the images are good here as well, and having low reprojection error confirms that as well.
9: camera is fixed and not moving. The robot is also robust and precise at performing movements. For those images that are not connected on the robot, I did my best on my hand not moving, and images visually look good to me.
10, 12: the only thing I have not done, is per feature error analysis. With very low reprojection error, I don’t think the problem is here.

11: I was aware of this, that is why I asked the 2nd question at the end of my original post.

My questions are still:

  1. These are still far from original Zed camera calibrations params. What should I do to make it closer to it? Even though it seems like I am doing all the steps correctly. Images have variations, visualization is good, reprojection error is low, etc.

  2. Assuming I get perfect results here, what should I do for 4DOF robot camera calibration, knowing that I cannot have very different orientations because the robot only rotates around its Z axis.
    This is the link to this question: Eye-to-hand calibration for a 4 DOF robot arm

Update: after adding 6 more images with different orientation, I got closer to Zed parameters: (fx=fy=1041.515625, cx=1121.7393, cy=625.8975)

Now I have these (using all 21 images, 9 with robot and 12 without robot)
rms error: 0.6044
mean error: 0.06943075475785
intrinsic matrix: array([[1.04842952e+03, 0.00000000e+00, 1.14485657e+03],
[0.00000000e+00, 1.05021504e+03, 6.58337195e+02],
[0.00000000e+00, 0.00000000e+00, 1.00000000e+00]])

Eventhough the error values have increased, the parameters are closer to original Zed values. I am at peace with this observation, because I have added more diverse views and taht has contributed to overall error. Please correct me if I am missing something.

I still have the 2 questions asked in the previous comment. I will continue capturing more viewpoints and see how it goes. (Update: adding more images did not help, here is the result with 31 total images:
rms error: 0.5812
mean error: 0.068767015
intrinsic matrix: array([[1.05438837e+03, 0.00000000e+00, 1.15143517e+03],
[0.00000000e+00, 1.05770768e+03, 6.71093224e+02],
[0.00000000e+00, 0.00000000e+00, 1.00000000e+00]])

I am very stuck on what to do, any help is greatly appreciated.

That document is pretty comprehensive and quite good. Have you read over it and made changes as necessary?

The things that stick out to me:

  1. your calibration target isn’t very good. Look at calibration image 31 from the medium target data set. It’s clearly not adhered well to the substrate, and the substrate appears to be corrugated cardboard. I would suggest using glass as the substrate and a spray adhesive to get the printed target to adhere everywhere to the glass. I’d also want to convince myself that the target is being printed accurately.

  2. I would consider using a denser version of the large target. I like to get more samples and also fill a larger portion of the image with each input image.

  3. Us a Charuco board so you don’t need to see the full pattern - this makes it easier to get samples close to the corners.

  4. You appera to not be calibrating the distortion. I see that the provided calibration does not include distortion parameters either. If you are relying on the assumption that the lens has negligible distortion I would prove that to yourself / do some tests to get a sense of how much distortion there actually is. (Take a picture of something that is known to be straight, with it positioned so it runs along the top or bottom of the image. Inspect how straight the line is with Gimp or other image editing application)

  5. You appear to not be using CV_CALIB_FIX_ASPECT_RATIO as a parameter. This lets the optimizer use different values for FX and FY - with modern sensors the pixels are very close to square, so FX and FY should be the same. If you use the FIX_ASPECT_RATIO parameter, FX and FY are locked, which is most likely what you want. (Note that the calibration scores will likely get worse, but that’s better than thinking you have good calibration when you don’t)

the biggest issue I see with those pictures is that the board spans a tiny area of the view.

charuco needs to die. it’s better than a plain checkerboard, but only because existing board detection methods suck. charuco is a technically complicated solution and the big downside is that it limits the number of corner points (because the arucos have to remain detectable) while the whole idea of a calibration is to get lots of them.

one of these days, I’ll implement that algorithm that recovers the grid graph from a set of corner points.

Thank you for your reply!

  1. You are absolutely right. I ran 3 more experiments,
    1.a In the first one, I tried my absolute best to adhere the printed paper to the cardboard. Link to the images: calibration_medium_pattern_v2 - Google Drive
    RMS_error: 0.9885664083711292
    Calibration matrix: [[1.01752791e+03 0.00000000e+00 1.12353273e+03]
    [0.00000000e+00 1.01842619e+03 6.01850249e+02]
    [0.00000000e+00 0.00000000e+00 1.00000000e+00]]

1.b This time, I used glass instead of a cardboard. And by far I got the best results (comparing to calibration data from camera manufacturor). So it seems like the cardboard not being absolutely flat mattered!
Link to the images: calibration_medium_pattern_v3_glass - Google Drive
RMS_error: 0.17787248595003988
Matrix: [[1.04042749e+03 0.00000000e+00 1.11960696e+03]
[0.00000000e+00 1.04178099e+03 6.25928528e+02]
[0.00000000e+00 0.00000000e+00 1.00000000e+00]]

1.c In this experiemnt, I used exactly the same pattern from the glass experiment (same square size, so I ruled out square size difference) and did not use the robot at all.
link to the images: calibration_medium_pattern_v3_cardboard - Google Drive

rms_error: 0.5585031726381963
matrix: [[1.02952629e+03 0.00000000e+00 1.12392238e+03]
[0.00000000e+00 1.03046171e+03 6.21877250e+02]
[0.00000000e+00 0.00000000e+00 1.00000000e+00]]
Conclusion is that glass vs cardboard really matters.

Note: I can’t use the glass for robot camera calibration, because it is too heavy for the robot, so if we can make any other conclusion, I’d love to hear that!

  1. In these experiments, I did my best to cover the whole area of the image, which is @crackwitz main concern too.
  2. I am not an expert on this. I tried the following
rms_error, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None)
print(rms_error, mtx)
newcameramtx, roi = cv2.getOptimalNewCameraMatrix(mtx, dist, (gray.shape[1],gray.shape[0]), 1, (gray.shape[1],gray.shape[0]))

and in all of the experiments, I got a worse matrix after cv2.getOptimalNewCameraMatrix(). I’d appreciate a lot if you let me know if I am doing something wrong here.
I acknoledge that I see distortion, for example in this image, we can see that a straight line is not actually straight. 19.png - Google Drive
5. After adding flags=cv2.CALIB_FIX_ASPECT_RATIO, I get the following for 1.b, and you are right that I should use it, thank you!

rms_error: 0.17951353417738453
[[1.04101892e+03 0.00000000e+00 1.11823069e+03]
[0.00000000e+00 1.04101892e+03 6.25402928e+02]
[0.00000000e+00 0.00000000e+00 1.00000000e+00]]
average error: 0.02336407127412611
Original values from manufacturor:
cx= 1121.75
I think this just means that all the pixels would have 3 pixels offset in x dimension, and everything else is close to perfect.