I am building a stereo vision setup based on this playlist https://www.youtube.com/playlist?list=PL2zRqk16wsdoCCLpou-dGo7QQNks1Ppzo and a few scripts from GitHub.
The setup is two Raspberry Pi cameras mounted 15-20 cm apart (their viewing directions are similar but not exactly aligned), with 14 ArUco markers in view.
Test description:
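- both cameras take a photo
- I detect 14x4=56 marker corner points in both photos and match them
- compute the fundamental matrix with cv.findFundamentalMat() and derive the essential matrix from it using the camera matrix K
- recover the relative camera pose with cv.recoverPose() and triangulate the 3D positions with cv.triangulatePoints()
- calculate the reprojection error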
I ran 20 tests like this. Between tests, the detected coordinates of the ArUco markers shift by 1-3 pixels, and this affects the result significantly: the reprojection error can reach 25 pixels.
Even in the best tests, where the reprojection error is small, the 3D coordinates can differ by around 30% (probably just scale), and the fundamental matrix is very different (some entries match, but for others even the sign differs).
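On the scale difference: cv.recoverPose() returns a unit-length translation, so the reconstruction is only defined up to scale and the absolute depth of a point is not expected to be repeatable between runs. One possible way to pin the scale, assuming the physical side length of the markers is known, is to rescale the triangulated points so that the mean marker side matches it. This is only a sketch: the helper `rescale_to_marker_size`, the 0.05 m side length in the usage note, and the assumption that the points are stored as 4 consecutive corners per marker, ordered around the square, are illustrative and not taken from the code below.

```python
import numpy as np


def rescale_to_marker_size(points_3d, marker_side):
    """Scale an up-to-scale reconstruction to metric units.

    points_3d: (4*M, 3) array, assumed to hold 4 consecutive corners per
               marker, ordered around the square (hypothetical layout).
    marker_side: known physical side length of one marker.
    Returns the rescaled points and the scale factor that was applied.
    """
    pts = np.asarray(points_3d, dtype=float)
    corners = pts.reshape(-1, 4, 3)
    # Side i connects corner i to corner (i + 1) % 4 of the same marker.
    sides = np.linalg.norm(corners - np.roll(corners, -1, axis=1), axis=2)
    scale = marker_side / sides.mean()
    return pts * scale, scale
```

With the 4xN homogeneous output of cv.triangulatePoints(), this could be called as `rescale_to_marker_size(points_3d[:3].T, 0.05)`. The translation column of the right camera pose would need to be multiplied by the same factor to stay consistent with the rescaled points.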
Fundamental:
[[ 1.02404361e-08 3.94424497e-07 4.19287599e-04]
[-6.06709285e-07 1.48665703e-07 1.10003616e-02]
[-9.95485243e-04 -1.09427077e-02 1.00000000e+00]]
Right camera rotate/transform:
[[ 0.99724478 -0.06396331 0.03757064 -0.9957763 ]
[ 0.06468712 0.9977364 -0.01837518 0.06679281]
[-0.03631026 0.02075489 0.99912502 -0.06299437]]
1st point coords:
[ 1.5546314 1.4780324 14.06956527]
Left reproj error: avg=0.41166796925350146, max=1.2600882536660383
Right reproj error: avg=0.41589630606977707, max=1.270719461314834
============================
Fundamental:
[[-1.34937569e-08 1.43070416e-07 5.18306040e-04]
[-6.07612283e-07 1.61790904e-07 1.35069458e-02]
[-9.94700681e-04 -1.32211279e-02 1.00000000e+00]]
Right camera rotate/transform:
[[ 0.99615666 -0.06037661 0.06345532 -0.99876592]
[ 0.06158883 0.99795128 -0.01732246 0.04633638]
[-0.06227945 0.02116402 0.99783433 -0.01787665]]
1st point coords:
[ 1.14391187 1.08137137 10.35207307]
Left reproj error: avg=0.5628295566779377, max=1.888344504666136
Right reproj error: avg=0.5669055402134264, max=1.8905979485852762
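The entries of F are hard to compare directly between runs, because F is only defined up to scale and its smallest entries (around 1e-8 here) can easily change sign under noise. A more interpretable check may be to compare the recovered poses: the angle of the relative rotation between two runs, and the angle between the two translation directions. The helpers below are a minimal sketch of that comparison; their names are hypothetical and they are not part of the original code.

```python
import numpy as np


def rotation_angle_deg(R_a, R_b):
    """Angle (degrees) of the relative rotation R_a^T R_b."""
    R_a = np.asarray(R_a, dtype=float)
    R_b = np.asarray(R_b, dtype=float)
    cos_angle = (np.trace(R_a.T @ R_b) - 1.0) / 2.0
    # Clip to guard against values slightly outside [-1, 1] due to rounding.
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))


def direction_angle_deg(t_a, t_b):
    """Angle (degrees) between two translation directions; length is ignored."""
    t_a = np.ravel(t_a).astype(float)
    t_b = np.ravel(t_b).astype(float)
    t_a /= np.linalg.norm(t_a)
    t_b /= np.linalg.norm(t_b)
    return float(np.degrees(np.arccos(np.clip(t_a @ t_b, -1.0, 1.0))))
```

For two runs, the inputs would be `Rt_right[:, :3]` and `Rt_right[:, 3]` from each run. Angles of a few degrees between runs would indicate that the pose itself is unstable, not just the scale.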
How can I make the 3D positions more accurate and stable? I know I can improve the ArUco detection, e.g. with subpixel refinement. But I find it strange that a 1-3 pixel shift in the 2D coordinates causes a 25 pixel reprojection error. Output of one such bad test:
Fundamental:
[[-3.42932607e-07 -6.04614005e-06 3.68265220e-03]
[ 5.10704843e-06 -9.20612123e-07 1.93773428e-02]
[-3.68138030e-03 -1.76805591e-02 1.00000000e+00]]
Right camera rotate/transform:
[[ 0.9964071 -0.06994167 0.04776044 -0.92437783]
[ 0.06946588 0.99751744 0.01155218 -0.02865833]
[-0.04844985 -0.00819295 0.99879201 0.38040022]]
1st point coords:
[ 1.29595325 1.10718262 11.72617239]
Left reproj error: avg=15.227097016132225, max=28.05660685973718
Right reproj error: avg=14.848556273144384, max=27.03773318546886
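One way to check whether a bad run like this is driven by a few badly matched points or by a poor F overall is to look at the per-point epipolar residual. The sketch below computes the squared Sampson distance for each correspondence, assuming the OpenCV convention that x_right^T F x_left = 0 for the F returned by cv.findFundamentalMat(points_left, points_right). The function name is hypothetical and the code is an illustration, not part of the original script.

```python
import numpy as np


def sampson_distance_sq(F, pts_left, pts_right):
    """Squared Sampson distance (pixels^2) of each correspondence to F.

    Assumes x_right^T F x_left = 0, with points given as (N, 2) pixel arrays.
    """
    pts_left = np.asarray(pts_left, dtype=float)
    pts_right = np.asarray(pts_right, dtype=float)
    ones = np.ones((len(pts_left), 1))
    x1 = np.hstack([pts_left, ones])    # homogeneous left points, (N, 3)
    x2 = np.hstack([pts_right, ones])   # homogeneous right points, (N, 3)
    Fx1 = x1 @ F.T                      # row i is F @ x1[i]
    Ftx2 = x2 @ F                       # row i is F.T @ x2[i]
    numerator = np.sum(x2 * Fx1, axis=1) ** 2
    denominator = Fx1[:, 0] ** 2 + Fx1[:, 1] ** 2 + Ftx2[:, 0] ** 2 + Ftx2[:, 1] ** 2
    return numerator / denominator
```

Calling `sampson_distance_sq(F, points_left, points_right)` inside the loop and printing the largest values would show whether specific marker corners stand out. If all residuals are small but the pose still changes between runs, the instability is more likely in the geometry of the setup than in individual detections.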
Code:
import numpy as np
import cv2 as cv


def get_3d_points(proj_left, proj_right, pts_left, pts_right):
    # triangulatePoints takes 2xN point arrays and returns 4xN homogeneous points
    p3d = cv.triangulatePoints(proj_left, proj_right, pts_left.T, pts_right.T)
    p3d /= p3d[3]
    return p3d


def calc_reproj_error(p3d, pts, proj_matrix):
    # Project the 3D points back into the image and compare with the detections
    pts = np.transpose(pts)
    reprojected_pt = np.matmul(proj_matrix, p3d)
    reprojected_pt /= reprojected_pt[2]
    reprojected_pt = reprojected_pt[:2, :]
    error = np.linalg.norm(reprojected_pt - pts, axis=0)
    return np.average(error), np.max(error)


# Camera matrix used for both cameras (focal length and principal point in pixels)
K = np.array([[3420 / 2, 0, 2304 / 2], [0, 3420 / 2, 1296 / 2], [0, 0, 1]], dtype=float)
COUNT_DATAPOINTS = 20


def main():
    for t in range(COUNT_DATAPOINTS):
        print('=' * 100)
        points_left = np.loadtxt(f"data/pts_{t}_left.txt", delimiter=",")
        points_right = np.loadtxt(f"data/pts_{t}_right.txt", delimiter=",")

        F, _ = cv.findFundamentalMat(points_left, points_right, cv.FM_RANSAC, 1, 0.99999)
        print("Fundamental:", F, sep='\n')
        # Essential matrix from F, with the same K for both cameras
        E = K.T @ F @ K
        print()

        # The left camera is the reference frame: [I | 0]
        Rt_left = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]], dtype=float)
        Rt_right = np.empty((3, 4), dtype=float)
        retval, R, t, mask = cv.recoverPose(E, points_left, points_right, K)
        Rt_right[:3, :3] = R
        Rt_right[:3, 3] = t.ravel()
        P_left = np.matmul(K, Rt_left)
        P_right = np.matmul(K, Rt_right)
        print("Right camera rotate/transform:", Rt_right, sep='\n')
        print()

        # Note: all points are triangulated, including any RANSAC outliers
        points_3d = get_3d_points(P_left, P_right, points_left, points_right)
        print("1st point coords:", points_3d.T[0, :3], sep='\n')
        print()

        left_avg, left_max = calc_reproj_error(points_3d, points_left, P_left)
        print(f"Left reproj error: avg={left_avg}, max={left_max}")
        right_avg, right_max = calc_reproj_error(points_3d, points_right, P_right)
        print(f"Right reproj error: avg={right_avg}, max={right_max}")


if __name__ == "__main__":
    main()
Full code and images (150 MB): https://www.dropbox.com/scl/fi/3c5k1f5zcwh0ga9reqk0b/stereo_vision.zip?rlkey=n5ohswgs3wzh07w73zp83fclb&st=g50mlbek&dl=0