Hello! I am trying to do eye-to-hand calibration with a LeRobot SO-101 and a stationary RealSense camera. Here is a Google Drive link with all of the relevant information.
Link: Opencv Post - Google Drive
The camera captures I use for camera calibration can be viewed in the Google Drive link.
I am trying to recover the transformation matrix T_base2camera. Measured with a tape measure, its translation component should be roughly [0.80, 0.15, 0.53] m from the robot's base, which I document in the manual_measurements folder.
My pipeline, with the code for each step, is as follows:
Calculate T_base2gripper for each image:
def map_joint_value(raw, raw_range, phys_range):
    # Linearly map a raw joint reading into its physical range (radians)
    return phys_range[0] + (raw - raw_range[0]) / (raw_range[1] - raw_range[0]) * (phys_range[1] - phys_range[0])

raw_ranges = {
    'shoulder_pan': [-100, 100],
    'shoulder_lift': [-100, 100],
    'elbow_flex': [-100, 100],
    'wrist_flex': [-100, 100],
    'wrist_roll': [-100, 100],
    'gripper': [0, 100]
}
phys_ranges = {
    'shoulder_pan': [-1.91986, 1.91986],
    'shoulder_lift': [-1.74533, 1.74533],
    'elbow_flex': [-1.69, 1.69],
    'wrist_flex': [-1.65806, 1.65806],
    'wrist_roll': [-2.74385, 2.84121],
    'gripper': [-0.174533, 1.74533]
}

# robot, teleop_device, robot_urdf and link_name are set up elsewhere
base_to_gripper_transforms = []
while True:
    action = teleop_device.get_action()
    robot.send_action(action)
    # Map the raw action values into joint angles for forward kinematics
    joint_positions = {
        'shoulder_pan': map_joint_value(action['shoulder_pan.pos'], raw_ranges['shoulder_pan'], phys_ranges['shoulder_pan']),
        'shoulder_lift': map_joint_value(action['shoulder_lift.pos'], raw_ranges['shoulder_lift'], phys_ranges['shoulder_lift']),
        'elbow_flex': map_joint_value(action['elbow_flex.pos'], raw_ranges['elbow_flex'], phys_ranges['elbow_flex']),
        'wrist_flex': map_joint_value(action['wrist_flex.pos'], raw_ranges['wrist_flex'], phys_ranges['wrist_flex']),
        'wrist_roll': map_joint_value(action['wrist_roll.pos'], raw_ranges['wrist_roll'], phys_ranges['wrist_roll']),
        'gripper': map_joint_value(action['gripper.pos'], raw_ranges['gripper'], phys_ranges['gripper'])
    }
    # Forward kinematics of the URDF: pose of the gripper link in the robot base frame
    T = robot_urdf.link_fk(cfg=joint_positions)[robot_urdf.link_map[link_name]]
    base_to_gripper_transforms.append(T)
I've validated that this works by comparing the computed translation components against ruler measurements.
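Since the last step below expects calibration_data/base_to_gripper_transform.npy, here is a minimal sketch of how I save the transforms (assuming I break out of the capture loop after the last image, so base_to_gripper_transforms holds one 4x4 matrix per calibration image):

import numpy as np

# Stack the per-image 4x4 FK transforms and save them for the hand-eye step
T_base2gripper = np.stack(base_to_gripper_transforms)  # shape (N, 4, 4)
np.save('calibration_data/base_to_gripper_transform.npy', T_base2gripper)

# Quick sanity check: print each translation so it can be compared to the ruler measurements
for i, T in enumerate(T_base2gripper):
    print(f"pose {i}: gripper position in base frame [m]: {T[:3, 3]}")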
Calculate T_camera2target for each image:
import cv2
import numpy as np

criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)

# Define chessboard parameters (3x3 inner corners)
square_size_mm = 15  # chessboard square size in mm
square_size_m = square_size_mm / 1000

# Capture or load calibration images
image_paths = [
    "calibration_data/images/img0.png",
    "calibration_data/images/img1.png",
    "calibration_data/images/img2.png",
    "calibration_data/images/img3.png",
    "calibration_data/images/img4.png",
    "calibration_data/images/img5.png",
    "calibration_data/images/img6.png",
    "calibration_data/images/img7.png",
    "calibration_data/images/img8.png",
    "calibration_data/images/img9.png",
    "calibration_data/images/img10.png",
    "calibration_data/images/img11.png",
    "calibration_data/images/img12.png",
    "calibration_data/images/img13.png",
    "calibration_data/images/img14.png",
    "calibration_data/images/img15.png",
    "calibration_data/images/img16.png",
    "calibration_data/images/img17.png",
    "calibration_data/images/img18.png",
    "calibration_data/images/img19.png",
    #"calibration_data/images/img20.png",
    #"calibration_data/images/img21.png",
    #"calibration_data/images/img22.png",
]

# Store detected image points and corresponding world points
obj_points = []  # 3D points in the board (world) coordinate system
img_points = []  # 2D points in the image plane

# Prepare object points: (0,0,0), (1,0,0), (2,0,0), ..., (2,2,0), scaled to meters
objp = np.zeros((3*3, 3), np.float32)
objp[:, :2] = np.mgrid[0:3, 0:3].T.reshape(-1, 2) * square_size_m

for image_path in image_paths:
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Find the chessboard corners
    ret, corners = cv2.findChessboardCorners(gray, (3, 3), None)
    # If found, add object points and refined image points
    if ret:
        obj_points.append(objp)
        corners2 = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)
        img_points.append(corners2)
        # Draw and display the corners
        cv2.drawChessboardCorners(img, (3, 3), corners2, ret)
        cv2.imshow('img', img)
        cv2.waitKey(100)
cv2.destroyAllWindows()

# Perform camera calibration
img_size = gray.shape[::-1]  # (width, height)
ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, img_size, None, None
)

camera_to_target_transforms = []
for rvec, tvec in zip(rvecs, tvecs):
    R, _ = cv2.Rodrigues(rvec)
    # rvec/tvec give the board pose in the camera frame;
    # build the 4x4 transform and invert it to get camera_to_target
    T_camera_to_target = np.eye(4)
    T_camera_to_target[:3, :3] = R.T
    T_camera_to_target[:3, 3] = -R.T @ tvec.flatten()
    camera_to_target_transforms.append(T_camera_to_target)

#print("Camera Matrix (Intrinsics):\n", mtx)
#print("Distortion Coefficients:\n", dist)
for tvec in tvecs:
    print("Position: \n", tvec)

# Mean reprojection error
mean_error = 0
for i in range(len(obj_points)):
    imgpoints2, _ = cv2.projectPoints(obj_points[i], rvecs[i], tvecs[i], mtx, dist)
    error = cv2.norm(img_points[i], imgpoints2, cv2.NORM_L2) / len(imgpoints2)
    mean_error += error
print("total error: {}".format(mean_error / len(obj_points)))

np.save('calibration_data/camera_to_target_transform.npy', np.array(camera_to_target_transforms))
np.save('calibration_data/camera_intrinsic.npy', np.array(mtx))
np.save('calibration_data/camera_distortion_coefficients.npy', np.array(dist))
The reprojection error is very small, as seen here:
total error: 0.024616457435904505
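As an extra per-image check (not shown in the drive link), the estimated board poses can be drawn back onto the images; a short sketch, reusing mtx, dist, rvecs and tvecs from above and assuming the board was detected in every image so the lists line up with image_paths:

# Overlay the estimated board frame on each image; the axes should sit on the
# board's origin corner if the per-image rvec/tvec are correct.
for image_path, rvec, tvec in zip(image_paths, rvecs, tvecs):
    img = cv2.imread(image_path)
    cv2.drawFrameAxes(img, mtx, dist, rvec, tvec, 2 * square_size_m)
    cv2.imshow('board pose', img)
    cv2.waitKey(300)
cv2.destroyAllWindows()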
Calculate T_base2camera
import cv2
import numpy as np

# taken from https://forum.opencv.org/t/eye-to-hand-calibration/5690
def calibrate_eye_hand(R_gripper2base, t_gripper2base, R_target2cam, t_target2cam, eye_to_hand=True):
    if eye_to_hand:
        # change coordinates from gripper2base to base2gripper
        R_base2gripper, t_base2gripper = [], []
        for R, t in zip(R_gripper2base, t_gripper2base):
            R_b2g = R.T
            t_b2g = -R_b2g @ t
            R_base2gripper.append(R_b2g)
            t_base2gripper.append(t_b2g)
        # change parameter values
        R_gripper2base = R_base2gripper
        t_gripper2base = t_base2gripper
    # calibrate
    R, t = cv2.calibrateHandEye(
        R_gripper2base=R_gripper2base,
        t_gripper2base=t_gripper2base,
        R_target2cam=R_target2cam,
        t_target2cam=t_target2cam,
    )
    return R, t

base_to_gripper_transform = np.load('calibration_data/base_to_gripper_transform.npy')
camera_to_target_transform = np.load('calibration_data/camera_to_target_transform.npy')

R_gripper2base = []
t_gripper2base = []
R_target2cam = []
t_target2cam = []
for T_bg, T_ct in zip(base_to_gripper_transform, camera_to_target_transform):
    # Invert transforms
    T_gb = np.linalg.inv(T_bg)  # gripper -> base
    T_tc = np.linalg.inv(T_ct)  # target -> camera
    # Extract R, t
    R_gripper2base.append(T_gb[:3, :3])
    t_gripper2base.append(T_gb[:3, 3])
    R_target2cam.append(T_tc[:3, :3])
    t_target2cam.append(T_tc[:3, 3])

R_cam2base, t_cam2base = calibrate_eye_hand(R_gripper2base, t_gripper2base, R_target2cam, t_target2cam, eye_to_hand=True)

T_cam2base = np.eye(4)
T_cam2base[:3, :3] = R_cam2base
T_cam2base[:3, 3] = t_cam2base.squeeze()
T_base2cam = np.linalg.inv(T_cam2base)
print(T_base2cam[:3, 3])
This gives me a translation component of [ 0.02724986 -0.30743259 0.82794188], which is obviously wrong compared to the tape-measured [0.80, 0.15, 0.53].
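For completeness, this is the kind of extra sanity check I have in mind for any candidate T_base2cam (a rough sketch, run right after the script above; it reloads the intrinsics saved earlier and assumes the i-th saved transform corresponds to calibration_data/images/img{i}.png):

# Transform each FK gripper position (base frame) into the camera frame with the
# estimated T_base2cam and project it into the corresponding image. The projected
# dot should land on or near the gripper if T_base2cam is right.
mtx = np.load('calibration_data/camera_intrinsic.npy')
dist = np.load('calibration_data/camera_distortion_coefficients.npy')

for i, T_bg in enumerate(base_to_gripper_transform):
    p_base = np.append(T_bg[:3, 3], 1.0)   # gripper position in the base frame, homogeneous
    p_cam = (T_base2cam @ p_base)[:3]      # same point expressed in the camera frame
    uv, _ = cv2.projectPoints(p_cam.reshape(1, 1, 3), np.zeros(3), np.zeros(3), mtx, dist)
    img = cv2.imread(f"calibration_data/images/img{i}.png")
    center = tuple(int(v) for v in uv.ravel())
    cv2.circle(img, center, 6, (0, 0, 255), -1)
    cv2.imshow('gripper reprojection', img)
    cv2.waitKey(300)
cv2.destroyAllWindows()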
Potential Problems
I'm not sure I am providing the correct inputs to cv2.calibrateHandEye; I am mostly just following the OpenCV forum post linked in the code above. One consistency check I have in mind is sketched below.
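This is only a sketch, meant to run after the script above with its R_*/t_* lists and T_cam2base still in scope: since the board is rigidly taped to the gripper, the board-in-gripper pose reconstructed from each capture should come out nearly constant if the inputs and output are in the right frames, and a large spread would point at a convention or ordering problem.

# Consistency check: inv(T_gripper2base_i) @ T_cam2base @ T_target2cam_i is the
# board pose in the gripper frame and should barely change across captures.
def to_T(R, t):
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = np.asarray(t).flatten()
    return T

board_in_gripper = [
    np.linalg.inv(to_T(Rg, tg)) @ T_cam2base @ to_T(Rt, tt)
    for Rg, tg, Rt, tt in zip(R_gripper2base, t_gripper2base, R_target2cam, t_target2cam)
]
translations = np.array([T[:3, 3] for T in board_in_gripper])
print("board-in-gripper translation std [m]:", translations.std(axis=0))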
Could it be that my chessboard is not large enough? I've seen posts recommending that the checkerboard take up more than half of the image, which mine clearly does not, and the Calibration Best Practices article from calib[dot]io makes the same recommendation; a rough way to quantify the coverage is sketched below. The issue is that this robot is not very powerful and has limited DOF, so mounting a larger board could be burdensome.
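A rough sketch (with a hypothetical helper name) that estimates how much of the image the detected board covers, using the convex hull of the inner corners, which slightly underestimates the full board area:

# Fraction of the image area covered by the convex hull of the detected corners.
def board_coverage(image_path, pattern_size=(3, 3)):
    gray = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2GRAY)
    ret, corners = cv2.findChessboardCorners(gray, pattern_size, None)
    if not ret:
        return None
    hull = cv2.convexHull(corners)
    return cv2.contourArea(hull) / (gray.shape[0] * gray.shape[1])

print(board_coverage("calibration_data/images/img0.png"))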
The chessboard is attached with cardboard and tape (visible in the Google Drive link), which could conceivably interfere with calibration. Does anyone have any insight? I'm already familiar with the related OpenCV posts in this domain.
I’m surprised that there is not much discussion around Eye-To-Hand camera calibration. My last resort is to migrate to a MoveIt/ROS platform, but I would rather not do that if I don’t have to. I hope this was informative enough to get some meaningful feedback. Thank you for your time!
