Hello! I am trying to do eye-to-hand calibration with a LeRobot SO-101 and a stationary RealSense camera. Here is a Google Drive link with all of the relevant information.
Link: Opencv Post - Google Drive
The camera captures I use for camera calibration can be viewed in the Google Drive link.
I am trying to recover the transformation matrix T_base2camera. Measured with a tape measure, its translation component should be roughly [0.80, 0.15, 0.53] m from the robot's base, which I document in the manual_measurements folder.
My pipeline, with the code for each step, is as follows:
Calculate T_base2gripper for each image:
def map_joint_value(raw, raw_range, phys_range):
    # Linearly map a raw joint reading into its physical range (radians)
    return phys_range[0] + (raw - raw_range[0]) / (raw_range[1] - raw_range[0]) * (phys_range[1] - phys_range[0])

raw_ranges = {
    'shoulder_pan': [-100, 100],
    'shoulder_lift': [-100, 100],
    'elbow_flex': [-100, 100],
    'wrist_flex': [-100, 100],
    'wrist_roll': [-100, 100],
    'gripper': [0, 100]
}
phys_ranges = {
    'shoulder_pan': [-1.91986, 1.91986],
    'shoulder_lift': [-1.74533, 1.74533],
    'elbow_flex': [-1.69, 1.69],
    'wrist_flex': [-1.65806, 1.65806],
    'wrist_roll': [-2.74385, 2.84121],
    'gripper': [-0.174533, 1.74533]
}

# robot, teleop_device, robot_urdf and link_name are set up elsewhere
base_to_gripper_transforms = []
while True:
    action = teleop_device.get_action()
    robot.send_action(action)
    # Map the raw action values into joint angles for forward kinematics
    joint_positions = {
        'shoulder_pan': map_joint_value(action['shoulder_pan.pos'], raw_ranges['shoulder_pan'], phys_ranges['shoulder_pan']),
        'shoulder_lift': map_joint_value(action['shoulder_lift.pos'], raw_ranges['shoulder_lift'], phys_ranges['shoulder_lift']),
        'elbow_flex': map_joint_value(action['elbow_flex.pos'], raw_ranges['elbow_flex'], phys_ranges['elbow_flex']),
        'wrist_flex': map_joint_value(action['wrist_flex.pos'], raw_ranges['wrist_flex'], phys_ranges['wrist_flex']),
        'wrist_roll': map_joint_value(action['wrist_roll.pos'], raw_ranges['wrist_roll'], phys_ranges['wrist_roll']),
        'gripper': map_joint_value(action['gripper.pos'], raw_ranges['gripper'], phys_ranges['gripper'])
    }
    # Forward kinematics of the URDF: pose of the gripper link in the robot base frame
    T = robot_urdf.link_fk(cfg=joint_positions)[robot_urdf.link_map[link_name]]
    base_to_gripper_transforms.append(T)
I've validated that this works by comparing the computed translation components against ruler measurements.
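Since the last step below expects calibration_data/base_to_gripper_transform.npy, here is a minimal sketch of how I save the transforms (assuming I break out of the capture loop after the last image, so base_to_gripper_transforms holds one 4x4 matrix per calibration image):

import numpy as np

# Stack the per-image 4x4 FK transforms and save them for the hand-eye step
T_base2gripper = np.stack(base_to_gripper_transforms)  # shape (N, 4, 4)
np.save('calibration_data/base_to_gripper_transform.npy', T_base2gripper)

# Quick sanity check: print each translation so it can be compared to the ruler measurements
for i, T in enumerate(T_base2gripper):
    print(f"pose {i}: gripper position in base frame [m]: {T[:3, 3]}")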
Calculate T_camera2target for each image:
import cv2
import numpy as np

criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)

# Define chessboard parameters (3x3 inner corners)
square_size_mm = 15  # chessboard square size in mm
square_size_m = square_size_mm / 1000

# Capture or load calibration images
image_paths = [
    "calibration_data/images/img0.png",
    "calibration_data/images/img1.png",
    "calibration_data/images/img2.png",
    "calibration_data/images/img3.png",
    "calibration_data/images/img4.png",
    "calibration_data/images/img5.png",
    "calibration_data/images/img6.png",
    "calibration_data/images/img7.png",
    "calibration_data/images/img8.png",
    "calibration_data/images/img9.png",
    "calibration_data/images/img10.png",
    "calibration_data/images/img11.png",
    "calibration_data/images/img12.png",
    "calibration_data/images/img13.png",
    "calibration_data/images/img14.png",
    "calibration_data/images/img15.png",
    "calibration_data/images/img16.png",
    "calibration_data/images/img17.png",
    "calibration_data/images/img18.png",
    "calibration_data/images/img19.png",
    #"calibration_data/images/img20.png",
    #"calibration_data/images/img21.png",
    #"calibration_data/images/img22.png",
]

# Store detected image points and corresponding world points
obj_points = []  # 3D points in the board (world) coordinate system
img_points = []  # 2D points in the image plane

# Prepare object points: (0,0,0), (1,0,0), (2,0,0), ..., (2,2,0), scaled to meters
objp = np.zeros((3*3, 3), np.float32)
objp[:, :2] = np.mgrid[0:3, 0:3].T.reshape(-1, 2) * square_size_m

for image_path in image_paths:
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Find the chessboard corners
    ret, corners = cv2.findChessboardCorners(gray, (3, 3), None)
    # If found, add object points and refined image points
    if ret:
        obj_points.append(objp)
        corners2 = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)
        img_points.append(corners2)
        # Draw and display the corners
        cv2.drawChessboardCorners(img, (3, 3), corners2, ret)
        cv2.imshow('img', img)
        cv2.waitKey(100)
cv2.destroyAllWindows()

# Perform camera calibration
img_size = gray.shape[::-1]  # (width, height)
ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, img_size, None, None
)

camera_to_target_transforms = []
for rvec, tvec in zip(rvecs, tvecs):
    R, _ = cv2.Rodrigues(rvec)
    # rvec/tvec give the board pose in the camera frame;
    # build the 4x4 transform and invert it to get camera_to_target
    T_camera_to_target = np.eye(4)
    T_camera_to_target[:3, :3] = R.T
    T_camera_to_target[:3, 3] = -R.T @ tvec.flatten()
    camera_to_target_transforms.append(T_camera_to_target)

#print("Camera Matrix (Intrinsics):\n", mtx)
#print("Distortion Coefficients:\n", dist)
for tvec in tvecs:
    print("Position: \n", tvec)

# Mean reprojection error
mean_error = 0
for i in range(len(obj_points)):
    imgpoints2, _ = cv2.projectPoints(obj_points[i], rvecs[i], tvecs[i], mtx, dist)
    error = cv2.norm(img_points[i], imgpoints2, cv2.NORM_L2) / len(imgpoints2)
    mean_error += error
print("total error: {}".format(mean_error / len(obj_points)))

np.save('calibration_data/camera_to_target_transform.npy', np.array(camera_to_target_transforms))
np.save('calibration_data/camera_intrinsic.npy', np.array(mtx))
np.save('calibration_data/camera_distortion_coefficients.npy', np.array(dist))
The reprojection error is very small, as seen here:
total error: 0.024616457435904505
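As an extra per-image check (not shown in the drive link), the estimated board poses can be drawn back onto the images; a short sketch, reusing mtx, dist, rvecs and tvecs from above and assuming the board was detected in every image so the lists line up with image_paths:

# Overlay the estimated board frame on each image; the axes should sit on the
# board's origin corner if the per-image rvec/tvec are correct.
for image_path, rvec, tvec in zip(image_paths, rvecs, tvecs):
    img = cv2.imread(image_path)
    cv2.drawFrameAxes(img, mtx, dist, rvec, tvec, 2 * square_size_m)
    cv2.imshow('board pose', img)
    cv2.waitKey(300)
cv2.destroyAllWindows()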
Calculate T_base2camera
import cv2
import numpy as np

# taken from https://forum.opencv.org/t/eye-to-hand-calibration/5690
def calibrate_eye_hand(R_gripper2base, t_gripper2base, R_target2cam, t_target2cam, eye_to_hand=True):
    if eye_to_hand:
        # change coordinates from gripper2base to base2gripper
        R_base2gripper, t_base2gripper = [], []
        for R, t in zip(R_gripper2base, t_gripper2base):
            R_b2g = R.T
            t_b2g = -R_b2g @ t
            R_base2gripper.append(R_b2g)
            t_base2gripper.append(t_b2g)
        # change parameter values
        R_gripper2base = R_base2gripper
        t_gripper2base = t_base2gripper
    # calibrate
    R, t = cv2.calibrateHandEye(
        R_gripper2base=R_gripper2base,
        t_gripper2base=t_gripper2base,
        R_target2cam=R_target2cam,
        t_target2cam=t_target2cam,
    )
    return R, t

base_to_gripper_transform = np.load('calibration_data/base_to_gripper_transform.npy')
camera_to_target_transform = np.load('calibration_data/camera_to_target_transform.npy')

R_gripper2base = []
t_gripper2base = []
R_target2cam = []
t_target2cam = []
for T_bg, T_ct in zip(base_to_gripper_transform, camera_to_target_transform):
    # Invert transforms
    T_gb = np.linalg.inv(T_bg)  # gripper -> base
    T_tc = np.linalg.inv(T_ct)  # target -> camera
    # Extract R, t
    R_gripper2base.append(T_gb[:3, :3])
    t_gripper2base.append(T_gb[:3, 3])
    R_target2cam.append(T_tc[:3, :3])
    t_target2cam.append(T_tc[:3, 3])

R_cam2base, t_cam2base = calibrate_eye_hand(R_gripper2base, t_gripper2base, R_target2cam, t_target2cam, eye_to_hand=True)

T_cam2base = np.eye(4)
T_cam2base[:3, :3] = R_cam2base
T_cam2base[:3, 3] = t_cam2base.squeeze()
T_base2cam = np.linalg.inv(T_cam2base)
print(T_base2cam[:3, 3])
This gives me a translation component of [ 0.02724986 -0.30743259 0.82794188], which is obviously wrong compared to the tape-measured [0.80, 0.15, 0.53].
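For completeness, this is the kind of extra sanity check I have in mind for any candidate T_base2cam (a rough sketch, run right after the script above; it reloads the intrinsics saved earlier and assumes the i-th saved transform corresponds to calibration_data/images/img{i}.png):

# Transform each FK gripper position (base frame) into the camera frame with the
# estimated T_base2cam and project it into the corresponding image. The projected
# dot should land on or near the gripper if T_base2cam is right.
mtx = np.load('calibration_data/camera_intrinsic.npy')
dist = np.load('calibration_data/camera_distortion_coefficients.npy')

for i, T_bg in enumerate(base_to_gripper_transform):
    p_base = np.append(T_bg[:3, 3], 1.0)   # gripper position in the base frame, homogeneous
    p_cam = (T_base2cam @ p_base)[:3]      # same point expressed in the camera frame
    uv, _ = cv2.projectPoints(p_cam.reshape(1, 1, 3), np.zeros(3), np.zeros(3), mtx, dist)
    img = cv2.imread(f"calibration_data/images/img{i}.png")
    center = tuple(int(v) for v in uv.ravel())
    cv2.circle(img, center, 6, (0, 0, 255), -1)
    cv2.imshow('gripper reprojection', img)
    cv2.waitKey(300)
cv2.destroyAllWindows()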
Potential Problems
I'm not sure I am providing the correct inputs to cv2.calibrateHandEye; I am mostly just following the OpenCV forum post linked in the code above. One consistency check I have in mind is sketched below.
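This is only a sketch, meant to run after the script above with its R_*/t_* lists and T_cam2base still in scope: since the board is rigidly taped to the gripper, the board-in-gripper pose reconstructed from each capture should come out nearly constant if the inputs and output are in the right frames, and a large spread would point at a convention or ordering problem.

# Consistency check: inv(T_gripper2base_i) @ T_cam2base @ T_target2cam_i is the
# board pose in the gripper frame and should barely change across captures.
def to_T(R, t):
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = np.asarray(t).flatten()
    return T

board_in_gripper = [
    np.linalg.inv(to_T(Rg, tg)) @ T_cam2base @ to_T(Rt, tt)
    for Rg, tg, Rt, tt in zip(R_gripper2base, t_gripper2base, R_target2cam, t_target2cam)
]
translations = np.array([T[:3, 3] for T in board_in_gripper])
print("board-in-gripper translation std [m]:", translations.std(axis=0))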
Could it be that my chessboard is not large enough? I've seen posts recommending that the checkerboard take up more than half of the image, which mine clearly does not, and the Calibration Best Practices article from calib[dot]io makes the same recommendation; a rough way to quantify the coverage is sketched below. The issue is that this robot is not very powerful and has limited DOF, so mounting a larger board could be burdensome.
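A rough sketch (with a hypothetical helper name) that estimates how much of the image the detected board covers, using the convex hull of the inner corners, which slightly underestimates the full board area:

# Fraction of the image area covered by the convex hull of the detected corners.
def board_coverage(image_path, pattern_size=(3, 3)):
    gray = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2GRAY)
    ret, corners = cv2.findChessboardCorners(gray, pattern_size, None)
    if not ret:
        return None
    hull = cv2.convexHull(corners)
    return cv2.contourArea(hull) / (gray.shape[0] * gray.shape[1])

print(board_coverage("calibration_data/images/img0.png"))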
The chessboard is attached with cardboard and tape (visible in the Google Drive link), which could conceivably interfere with calibration. Does anyone have any insight? I'm already familiar with the related OpenCV posts in this domain.
I’m surprised that there is not much discussion around Eye-To-Hand camera calibration. My last resort is to migrate to a MoveIt/ROS platform, but I would rather not do that if I don’t have to. I hope this was informative enough to get some meaningful feedback. Thank you for your time!
