Eye-to-hand calibration

The cv.calibrateHandEye documentation describes an eye-in-hand calibration process, where the camera is attached to a robot arm. I want to do an eye-to-hand calibration, where the camera is fixed to the ceiling. My process is the following:

  1. Fix calibration pattern on the robot gripper.
  2. Collect checkerboard poses and robot poses. A robot pose is expressed as [x, y, z, w, p, r] (translation plus w, p, r Euler angles).
  3. R_target2cam, t_target2cam are obtained from the checkerboard poses. These are converted to H_target2cam.
  4. R_gripper2base, t_gripper2base are obtained from the robot poses collected in step 2.
  5. Call cv.calibrateHandEye(R_gripper2base, t_gripper2base, R_target2cam, t_target2cam).
  6. The above step returns R_cam2gripper, t_cam2gripper. Convert these to H_cam2gripper (a 4x4 homogeneous matrix).
  7. Get the pose of the target by calculating H_target2gripper = np.matmul(H_cam2gripper, H_target2cam)
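Steps 3, 6 and 7 amount to stacking each (R, t) into a 4x4 homogeneous matrix and multiplying. A minimal numpy-only sketch (the `rodrigues` helper is a stand-in for cv2.Rodrigues, and an identity matrix stands in for the calibrateHandEye result):

```python
import numpy as np

def rodrigues(rvec):
    # numpy stand-in for cv2.Rodrigues: rotation vector -> rotation matrix
    rvec = np.asarray(rvec, dtype=float)
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def to_homogeneous(R, t):
    # stack a 3x3 rotation and a 3-vector into a 4x4 homogeneous matrix
    H = np.eye(4)
    H[:3, :3] = R
    H[:3, 3] = np.ravel(t)
    return H

# hypothetical checkerboard pose from one view (rvec/tvec, as solvePnP returns)
rvec = np.array([0.1, -0.2, 0.05])
tvec = np.array([10.0, 20.0, 500.0])
H_target2cam = to_homogeneous(rodrigues(rvec), tvec)   # step 3

# step 7, with an identity placeholder for the calibrateHandEye result (step 6)
H_cam2gripper = np.eye(4)
H_target2gripper = H_cam2gripper @ H_target2cam
```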

However, my results are very far away from the desired solution. For example,
Instead of getting this [x, y, z, w, p, r]:
[140.338, 141.06, -109.266, 0.0, -0.002, 0.003] - desired

I get this:
[-320.61373838, 66.44008189, -51.46849641, -1.57738728, 0.02311303, -1.5894054] - what I get from step 7

What might I be doing wrong?

From the doc:


For an eye-to-hand configuration, you have to supply:

  • base2gripper (R_target2cam and t_target2cam parameters)
  • target2cam (R_gripper2base and t_gripper2base parameters)

and the method would estimate:

  • cam2base (R_cam2gripper and t_cam2gripper)

You can also have a look at this unit test to compare with your code.

Be careful to properly convert from Euler angles to the rotation matrix. Euler angles are tricky due to numerous conventions.
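As an illustration of how much the convention matters (using scipy, which the code below already relies on): upper-case axis sequences in scipy mean intrinsic rotations, lower-case mean extrinsic, and the same angle triplet yields different matrices.

```python
import numpy as np
from scipy.spatial.transform import Rotation

wpr = [10.0, 20.0, 30.0]  # degrees

# upper-case axes = intrinsic rotations (about the moving frame)
R_intrinsic = Rotation.from_euler("XYZ", wpr, degrees=True).as_matrix()
# lower-case axes = extrinsic rotations (about the fixed frame)
R_extrinsic = Rotation.from_euler("xyz", wpr, degrees=True).as_matrix()

# the two conventions give different matrices for the same angle triplet
print(np.allclose(R_intrinsic, R_extrinsic))  # False
# extrinsic xyz equals intrinsic ZYX with the angles reversed
print(np.allclose(R_extrinsic,
                  Rotation.from_euler("ZYX", wpr[::-1], degrees=True).as_matrix()))  # True
```

So you need to know which convention your robot controller uses before picking a sequence string.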

Many thanks, @Eduardo. Is it correct that I need to provide base2gripper in place of target2cam in the function? Here is my code following your suggestion; however, I still get wrong values:

def eye_to_hand_calib(target_poses, robot_poses):
    """
    target_poses (target2cam) are provided as a list of tuples of (rvec, tvec).
    robot_poses (gripper2base) are provided as a list of lists [x, y, z, w, p, r].
    """
    # 1. Collect target poses
    R_target2cam = []
    t_target2cam = []
    for pose in target_poses:
        rvec, tvec = pose
        R, _ = cv2.Rodrigues(rvec)
        R_target2cam.append(R)
        t_target2cam.append(tvec)

    # 2. Collect robot poses
    R_gripper2base = []
    t_gripper2base = []
    for pose in robot_poses:
        # gripper2base translation
        t = np.array(pose[0:3], dtype=float)
        # gripper2base rotation (w, p, r Euler angles in degrees)
        wpr = pose[3:]
        R = scipy.spatial.transform.Rotation.from_euler(
            "XYZ", wpr, degrees=True
        ).as_matrix()
        R_gripper2base.append(R)
        t_gripper2base.append(t)

    # 3. Transform from gripper2base to base2gripper
    R_base2gripper = []
    t_base2gripper = []
    for R_g2b, t_g2b in zip(R_gripper2base, t_gripper2base):
        R_b2g = R_g2b.T
        t_b2g = -R_b2g @ t_g2b
        R_base2gripper.append(R_b2g)
        t_base2gripper.append(t_b2g)

    # 4. Call calibration, passing base2gripper in the gripper2base parameters
    R_calib, t_calib = cv2.calibrateHandEye(
        R_gripper2base=R_base2gripper,
        t_gripper2base=t_base2gripper,
        R_target2cam=R_target2cam,
        t_target2cam=t_target2cam,
    )
    print("R_cam2base: \n", R_calib)
    print("t_cam2base: \n", t_calib)

    H_base_camera = np.r_[np.c_[R_calib, t_calib.flatten()], [[0, 0, 0, 1]]]
    return H_base_camera

    R_b2g = -R_g2b.T

It should instead be:

    R_b2g = R_g2b.T

Thanks, I forgot to fix it here. But, I still get wrong values.

Hi @Eduardo, I still cannot understand why we need to provide R_base2gripper and t_base2gripper in place of R_target2cam and t_target2cam instead of providing them in place of R_gripper2base and t_gripper2base. From the docs, R_target2cam and t_target2cam should stay in their own places.

NB 1: I find the notation foo2bar inconvenient because it’s opposite to how matrices are written and how they compose. I would recommend

^\text{output frame} T_\text{input frame} = ~^\text{bar} T_\text{foo}

or T_bar_foo

because that composes like so:

^\text{baz} T_\text{foo} = ^\text{baz} T_\text{bar} ~\cdot~ ^\text{bar} T_\text{foo}


T_baz_foo = T_baz_bar @ T_bar_foo
# @ being numpy's matrix multiplication operator

and if you need to invert a matrix, you can use np.linalg.inv (example assuming two aruco markers and their poses T_cam_marker, and a relative pose: pose of m1 relative to m2’s frame):

\begin{align*} ^{m_2} T_{m_1} &= ~^{m_2} T_C \cdot ~ ^C T_{m_1} \\ ^{m_2} T_{m_1} &= (^C T_{m_2})^{-1} \cdot ~ ^C T_{m_1} \end{align*}

(notice how the frames match up like dominos, and you can consider them to “cancel”)
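In numpy that domino-cancelling looks like this (hypothetical marker poses, identity rotations for simplicity):

```python
import numpy as np

# hypothetical poses of two markers in the camera frame (identity rotations
# for simplicity, translations in metres)
T_cam_m1 = np.eye(4)
T_cam_m1[:3, 3] = [0.1, 0.0, 0.5]
T_cam_m2 = np.eye(4)
T_cam_m2[:3, 3] = [-0.1, 0.0, 0.5]

# pose of m1 in m2's frame: the camera frames "cancel" like dominos
T_m2_m1 = np.linalg.inv(T_cam_m2) @ T_cam_m1
print(T_m2_m1[:3, 3])  # approximately [0.2, 0, 0]: m1 sits 0.2 along m2's x axis
```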

the ways to read this are:

  • in the baz frame: pose of foo
  • transformation that moves into baz frame from foo frame

T or M, or A, all the same.

NB 2: chuck rvec and tvec. they are nearly impossible to calculate with. there is a numerical argument to be made for them but you aren’t trying to write obfuscated code, you are trying to write code you can understand tomorrow, in a week, in a month, in a year.

you need a convenience function that takes (rvec, tvec) and spits out a 4x4 matrix. those are trivial to work with.

def matrix_from_rtvec(rvec, tvec):
    (R, jac) = cv.Rodrigues(rvec) # ignore the jacobian
    M = np.eye(4)
    M[0:3, 0:3] = R
    M[0:3, 3] = tvec.squeeze() # 1-D vector, row vector, column vector, whatever
    return M

if anything needs rvec and tvec, make another convenience function that decomposes a 4x4 matrix. both operations involve cv.Rodrigues

def rtvec_from_matrix(M):
    (rvec, jac) = cv.Rodrigues(M[0:3, 0:3]) # ignore the jacobian
    tvec = M[0:3, 3]
    assert np.allclose(M[3], [0, 0, 0, 1]), M # sanity check
    return (rvec, tvec)

NB 3: dumping some more convenience functions

def vec(vec):
    return np.asarray(vec)

def normalize(vec):
    return np.asarray(vec) / np.linalg.norm(vec)

def translate(dx=0, dy=0, dz=0):
    M = np.eye(4)
    M[0:3, 3] = (dx, dy, dz)
    return M

def rotate(axis, angle=None):
    if angle is None:
        rvec = vec(axis)
    else:
        rvec = normalize(axis) * angle

    (R, jac) = cv.Rodrigues(rvec)
    M = np.eye(4)
    M[0:3, 0:3] = R
    return M

def scale(s=1, sx=1, sy=1, sz=1):
    M = np.diag([s*sx, s*sy, s*sz, 1])
    return M

def apply_to_rowvec(T, data):
    (n,k) = data.shape

    if k == 4:
        pass
    elif k == 3:
        data = np.hstack([data, np.ones((n,1))])
    else:
        assert False, k

    data = data @ T.T # yes that's weird. one could transpose the data just as well. makes no difference really

    # TODO: check that the 4th values are still all 1, or else we have a 4D projective transformation, which is bad in this instance

    return data[:,:k]
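For instance (repeating `translate` and `apply_to_rowvec` so this snippet runs standalone), translating a couple of 3-D points:

```python
import numpy as np

def translate(dx=0, dy=0, dz=0):
    M = np.eye(4)
    M[0:3, 3] = (dx, dy, dz)
    return M

def apply_to_rowvec(T, data):
    (n, k) = data.shape
    if k == 4:
        pass
    elif k == 3:
        # lift 3-D points to homogeneous coordinates
        data = np.hstack([data, np.ones((n, 1))])
    else:
        assert False, k
    data = data @ T.T
    return data[:, :k]

points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
moved = apply_to_rowvec(translate(dx=5), points)
print(moved)  # [[5. 0. 0.], [6. 0. 0.]]
```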


Indeed, my mistake.

Use the doc and the unit test as references.


Thank you all (@Eduardo and @crackwitz) for your help. I have solved my problem: the main issue was in my data collection. The hand-eye calibration method is very sensitive to the target and gripper poses. For anyone interested in this topic in the future, here is my final code for eye-to-hand calibration:

def calibrate_eye_hand(R_gripper2base, t_gripper2base, R_target2cam, t_target2cam, eye_to_hand=True):

    if eye_to_hand:
        # change coordinates from gripper2base to base2gripper
        R_base2gripper, t_base2gripper = [], []
        for R, t in zip(R_gripper2base, t_gripper2base):
            R_b2g = R.T
            t_b2g = -R_b2g @ t
            R_base2gripper.append(R_b2g)
            t_base2gripper.append(t_b2g)

        # change parameter values
        R_gripper2base = R_base2gripper
        t_gripper2base = t_base2gripper

    # calibrate
    R, t = cv2.calibrateHandEye(
        R_gripper2base=R_gripper2base,
        t_gripper2base=t_gripper2base,
        R_target2cam=R_target2cam,
        t_target2cam=t_target2cam,
    )

    return R, t
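As a quick sanity check on the gripper2base → base2gripper inversion above, it should agree with inverting the full 4x4 homogeneous matrix (a sketch assuming scipy for the rotation, with an arbitrary test pose):

```python
import numpy as np
from scipy.spatial.transform import Rotation

# an arbitrary gripper pose in the base frame
R_g2b = Rotation.from_euler("XYZ", [10, 20, 30], degrees=True).as_matrix()
t_g2b = np.array([100.0, -50.0, 200.0])

# invert it the same way calibrate_eye_hand does
R_b2g = R_g2b.T
t_b2g = -R_b2g @ t_g2b

# compare against inverting the full 4x4 homogeneous matrix
H = np.eye(4)
H[:3, :3] = R_g2b
H[:3, 3] = t_g2b
H_inv = np.linalg.inv(H)
print(np.allclose(H_inv[:3, :3], R_b2g), np.allclose(H_inv[:3, 3], t_b2g))  # True True
```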