Is it possible to stereoRectify with externally calibrated cameras?

I have been trying to rectify two images with stereoRectify. The cameras are calibrated, both intrinsically and extrinsically. However, these calibrations are external; they do not come from stereoCalibrate. Even with all of this known, I do not get a good rectification result. Am I doing anything wrong (see code), or is it simply not possible? I have been stuck on this for a while now, so any help would be greatly appreciated :slight_smile:

import os

import numpy as np
import cv2 as cv

# Define focal length (pixels)
fx = 1920 / 2
fy = 1080 / 2
# Define principal point
px = 1920 / 2
py = 1080 / 2
# Create calibration matrix
K = np.array([[fx, 0, px],
              [0, fx, py],
              [0, 0, 1]])
filename1 = os.path.join(carla_save_path, f'SSLidar_and_camera/sim_{sim_nmber}/transformation_matrices/frame_{left_indx}/transformation_m_actor_3')
t_m_1 = np.loadtxt(filename1, delimiter=",")  # transformation matrix 1: R|T

filename2 = os.path.join(carla_save_path, f'SSLidar_and_camera/sim_{sim_nmber}/transformation_matrices/frame_{right_indx}/transformation_m_actor_3')
t_m_2 = np.loadtxt(filename2, delimiter=",")  # transformation matrix 2: R|T

T_1 = t_m_1[0:3, 3]
T_2 = t_m_2[0:3, 3]

R1 = t_m_1[0:3, 0:3]  # rotation matrix 1
R2 = t_m_2[0:3, 0:3]  # rotation matrix 2

R = R1.T @ R2           # rotation from coordinate system 2 to 1
T = R1.T @ (T_2 - T_1)  # translation from coordinate system 2 to 1

img_path_l = os.path.join(carla_save_path, f'SSLidar_and_camera/sim_{sim_nmber}/camera_frames/{num_c}_{left_indx}.png')
img_path_r = os.path.join(carla_save_path, f'SSLidar_and_camera/sim_{sim_nmber}/camera_frames/{num_c}_{right_indx}.png')

imgL = cv.imread(img_path_l, 0)
imgR = cv.imread(img_path_r, 0)

K1 = K
K2 = K1

dist1 = np.zeros(5)  # no lens distortion in the simulated cameras
dist2 = np.zeros(5)

img_size = (1920,1080)

# Note: R1 and R2 are reassigned here to the rectification rotations returned by stereoRectify
R1, R2, P1, P2, Q, roi_left, roi_right = cv.stereoRectify(K1, dist1, K2, dist2, img_size, R, T)

map_1_x, map_1_y = cv.initUndistortRectifyMap(K1, dist1, R1, P1, img_size, cv.CV_32FC1)
map_2_x, map_2_y = cv.initUndistortRectifyMap(K2, dist2, R2, P2, img_size, cv.CV_32FC1)

imgR = cv.remap(imgR, map_2_x, map_2_y, cv.INTER_LANCZOS4, borderMode=cv.BORDER_CONSTANT, borderValue=0)
imgL = cv.remap(imgL, map_1_x, map_1_y, cv.INTER_LANCZOS4, borderMode=cv.BORDER_CONSTANT, borderValue=0)
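
To judge the result, a minimal sanity check (a sketch, using only the variables defined above) is to stack the rectified pair and draw horizontal lines; if rectification worked, corresponding features should end up on the same row:

# Sanity check: matching points in a rectified pair must lie on the same image row.
vis = np.hstack((imgL, imgR))
vis = cv.cvtColor(vis, cv.COLOR_GRAY2BGR)
for y in range(0, vis.shape[0], 60):
    cv.line(vis, (0, y), (vis.shape[1] - 1, y), (0, 255, 0), 1)
cv.imwrite('rectification_check.png', vis)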

what do the rectified pictures look like? what do all the matrices look like?

intrinsics: fx,fy are wrong. focal length does not depend on width or height. it’s a scale factor. also fx = fy.

R,T stuff: can’t comment. I’d recommend working with 4x4 matrices containing both rotation and translation. those compose and invert trivially. you can always take them apart for whatever needs the info separated.
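
for example, a minimal numpy sketch (the helper names make_T and invert_T are mine, not from the code above):

import numpy as np

def make_T(R, t):
    # Build a 4x4 homogeneous transform from a 3x3 rotation and a translation vector.
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def invert_T(T):
    # Invert a rigid transform: R^T and -R^T @ t, cheaper than a generic matrix inverse.
    R, t = T[:3, :3], T[:3, 3]
    return make_T(R.T, -R.T @ t)

# Example with placeholder values: compose two transforms, then take them apart again.
T_a = make_T(np.eye(3), [1.0, 0.0, 0.0])
T_b = make_T(np.eye(3), [0.0, 2.0, 0.0])
T_ab = T_b @ T_a                   # apply T_a first, then T_b
R, t = T_ab[:3, :3], T_ab[:3, 3]   # split back out when an API wants them separately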


Here is the result.
The focal length is just a simplification. The cameras come from a simulation, and from an equation I found that with my camera settings it always comes out to width/2.
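
For reference, a small sketch of the usual pinhole relation between image width and horizontal field of view; the 90° FOV is my assumption, and it is the value for which the focal length reduces to width/2:

import math

width = 1920
fov_deg = 90.0  # assumed horizontal field of view of the simulated camera
f = width / (2.0 * math.tan(math.radians(fov_deg) / 2.0))  # 960.0, i.e. width / 2 for a 90 degree FOV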

Do you know how to map from coordinate system 2 to 1 with the 4x4 matrices? I do get them from the cameras; they are t_m_1 and t_m_2.

Here is what's in the matrices:

# t_m_1
[[ -0.11152495,   0.99376166,   0.       ,  -22.76841736],
 [ -0.99376166,  -0.11152495,   0.       ,  -69.82017517],
 [  0.        ,  -0.        ,   1.       ,    1.23000002],
 [  0.        ,   0.        ,   0.       ,    1.        ]]

# t_m_2
[[ 4.53061461e-02,  9.98973131e-01,  0.00000000e+00, -2.20175171e+01],
 [-9.98973131e-01,  4.53061461e-02,  0.00000000e+00, -6.98233109e+01],
 [ 0.00000000e+00, -0.00000000e+00,  1.00000000e+00,  1.23000002e+00],
 [ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  1.00000000e+00]]

# T_1
[-22.76841736, -69.82017517,   1.23000002]

# T_2
[-22.01751709, -69.82331085,   1.23000002]

# R1
[[-0.11152495,  0.99376166,  0.        ],
 [-0.99376166, -0.11152495,  0.        ],
 [ 0.        , -0.        ,  1.        ]]

# R2
[[ 0.04530615,  0.99897313,  0.        ],
 [-0.99897313,  0.04530615,  0.        ],
 [ 0.        , -0.        ,  1.        ]]

# R
[[ 0.98768843, -0.15643394,  0.        ],
 [ 0.15643394,  0.98768843,  0.        ],
 [ 0.        ,  0.        ,  1.        ]]

# T
[-0.08062799,  0.7465656,   0.        ]

those pictures look like the sources were mixed up.

be sure you know which picture is the “left eye” and “right eye” one, and make sure all the code, data, … doesn’t get mixed up.

what units are your t_m_*?

>>> tm2[:3,3] - tm1[:3,3]
array([ 0.7509 , -0.00314,  0.     ])

world translation of 0.75 in X.

>>> np.linalg.inv(tm2) @ tm1
array([[ 0.98769,  0.15643,  0.     , -0.03715],
       [-0.15643,  0.98769,  0.     , -0.74999],
       [ 0.     ,  0.     ,  1.     ,  0.     ],
       [ 0.     ,  0.     ,  0.     ,  1.     ]])

>>> (rvec, jac) = cv.Rodrigues(_[:3,:3]); rvec
array([[ 0.     ],
       [ 0.     ],
       [-0.15708]])

the transformation from “1”-space to “2”-space. not sure how you label your matrices. exactly (?!) 9 degrees of rotation around the Z axis?
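
for completeness, the same rotation in degrees (continuing the session above):

>>> np.degrees(rvec)   # roughly [0, 0, -9]: the 9 degree rotation about Z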

t_m_* is in meters and degrees. 0.75 meters and 9 degrees around the Z-axis seems correct! How do I use that in the stereoRectify function? What are R and T supposed to be?

I know that the labels are not the best, but t_m_1 stands for transformation_matrix_1, with 1 being the left image. A little confusing, I know…

documentation for that, last I checked, left me unsatisfied, and the APIs do magic stuff that’s explained only in books, because someone once implemented what was in a book, with at most a reference to that book.

direction of transformations matters, and notation too. notation helps you not confuse yourself. I don’t know which way these APIs want the transformations. check docs, check the tutorials section for worked examples, check any references.

that’s exactly what I mean. that name doesn’t express which frames are involved and in which direction the transformation happens.

if you have an aruco marker or chessboard, and you get its pose in each camera’s frame, you have ^{c_1}T_{m} and ^{c_2}T_{m}, so the marker represents your world frame, but you only have its pose in each camera’s frame.

your matrices look like they contain world-frame poses of each camera, with nominal values (they're round, they must have been read from a yardstick or something). those would be ^{m}T_{c_1} and ^{m}T_{c_2}. I'll stick with m for "world"; the letter is arbitrary.

Then you can calculate:

\begin{align*}
~^{m}T_{c_1} &= (^{c_1}T_{m})^{-1} \\
~^{m}T_{c_2} &= (^{c_2}T_{m})^{-1} \\
~^{c_2}T_{c_1} &= ~^{c_2}T_{m} \cdot ~^{m}T_{c_1} \\
~^{c_1}T_{c_2} &= ~^{c_1}T_{m} \cdot ~^{m}T_{c_2}
\end{align*}

and those I would map to identifiers in code like T_m_c1, T_m_c2, T_c2_c1, T_c1_c2, …
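
a minimal numpy sketch of those relations (placeholder identity matrices stand in for the measured poses ^{c_1}T_{m} and ^{c_2}T_{m}):

import numpy as np

# placeholder inputs: pose of the marker/world frame in each camera frame
T_c1_m = np.eye(4)
T_c2_m = np.eye(4)

T_m_c1 = np.linalg.inv(T_c1_m)   # camera 1 pose in the world frame
T_m_c2 = np.linalg.inv(T_c2_m)   # camera 2 pose in the world frame
T_c2_c1 = T_c2_m @ T_m_c1        # maps points from camera 1 coordinates to camera 2 coordinates
T_c1_c2 = T_c1_m @ T_m_c2        # maps points from camera 2 coordinates to camera 1 coordinates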

I realize now that my calculations earlier might not get this right. I don’t know the situation. I’m guessing.

fortunately, docs say.

R: Rotation matrix from the coordinate system of the first camera to the second camera, see stereoCalibrate.
T: Translation vector from the coordinate system of the first camera to the second camera, see stereoCalibrate.

so, given all the assumptions I can make or will anyway, I believe these arguments should look like so:

T_w_c1 = t_m_1                    # world-frame pose of camera 1 (your t_m_1)
T_w_c2 = t_m_2                    # world-frame pose of camera 2 (your t_m_2)
T_c2_w = np.linalg.inv(T_w_c2)    # world -> camera 2
T_c2_c1 = T_c2_w @ T_w_c1         # camera 1 -> camera 2
R = T_c2_c1[:3, :3]               # rotation argument for stereoRectify
t = T_c2_c1[:3, 3]                # translation argument for stereoRectify

no guarantees though. can’t run this.

That is how I interpreted the documentation as well. However, it still does not work. I don't know what is wrong, but I think it is time to try other methods instead. Thank you for your help!