Getting metric scale around object location?

Hi, I'm using OpenCV to try to retrieve the depth to an object I've already detected, so that I can get the pixels-per-cm scale at the object's location.
So far, the constraints I have are:

  1. I can't use a depth camera or a depth sensor
  2. I can't place a reference object at the object location

My current approach is to use my phone camera to take two pictures a known distance (baseline) apart to simulate stereoscopy.
I have calibrated my camera using the standard chessboard pattern. Would using other patterns help?

I also tried to turn this pattern into metric units by adding a ‘square_size’ param, like:

    square_size = 0.024
    termCriteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
    worldPtsCur = np.zeros((nRows*nCols, 3), np.float32)
    worldPtsCur[:, :2] = np.mgrid[0:nCols, 0:nRows].T.reshape(-1, 2) * square_size
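
For context, here is a minimal sketch of the object-point construction as a reusable function. The helper name `make_object_points` and the 9 x 6 inner-corner board size are placeholders for illustration, not values from my actual script. As far as I understand, the square size only scales the extrinsic translations that `cv2.calibrateCamera` returns; the intrinsics (fx, fy, cx, cy) come out in pixels either way.

```python
import numpy as np

def make_object_points(n_cols, n_rows, square_size):
    """Chessboard inner-corner coordinates on the Z = 0 plane.

    Returns an (n_cols * n_rows, 3) float32 array expressed in the same
    unit as square_size, with x varying fastest.
    """
    pts = np.zeros((n_rows * n_cols, 3), np.float32)
    pts[:, :2] = np.mgrid[0:n_cols, 0:n_rows].T.reshape(-1, 2) * square_size
    return pts

# Hypothetical board: 9 x 6 inner corners, 0.024-unit squares.
obj = make_object_points(9, 6, 0.024)
print(obj.shape)   # one row per corner
print(obj[:3])     # first three corners along the x axis
```

The same array is appended once per successfully detected calibration image before calling the calibration routine.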

Then I do feature matching and use the matched points to get the essential matrix E, and from it R and t, like:

    # Calculate the essential matrix
    E, mask = cv2.findEssentialMat(pts1, pts2, cameraMatrix, method=cv2.RANSAC, prob=0.999, threshold=1.0)
    # Decompose the essential matrix
    _, R, t, mask = cv2.recoverPose(E, pts1, pts2, cameraMatrix)

I've noticed here that the x element of t is negative. I expected it to be positive since my images go left to right; am I correct in that assumption?
Lastly, I tried both using just the x-axis disparity with this equation:

    depth = f * baseline / disparity

and triangulating using cv2 like:

    # Triangulate the 3D point
    points_4D = cv2.triangulatePoints(P1, P2, normalized_point1[:2].reshape(2, 1),
                                      normalized_point2[:2].reshape(2, 1))

    # Convert from homogeneous coordinates to 3D
    points_3D = points_4D[:3] / points_4D[3]  # Normalize by the fourth coordinate

    # Extract the Z-coordinate as the metric depth
    depth = points_3D[2][0]  # Metric depth

    print("Metric Depth (Z-coordinate):", depth)

but the depth from both methods does not match the ground truth on my test set.
Is my approach generally correct?
What sources of inaccuracy could be harming my depth calculation?
What can I do better overall?
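
To make the goal concrete, here is a minimal sketch of the disparity relation and the pixels-per-cm figure I am after. It assumes a rectified pair (parallel optical axes, matching rows), which a hand-held two-shot capture only approximates. The baseline and disparity values are made up for illustration, and the function names are hypothetical; only fx comes from my calibration.

```python
def depth_from_disparity(fx_px, baseline_m, disparity_px):
    """Depth in metres for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return fx_px * baseline_m / disparity_px

def pixels_per_cm(fx_px, depth_m):
    """Pixels spanned by a 1 cm fronto-parallel segment at depth Z."""
    return fx_px * 0.01 / depth_m

def depth_error_per_pixel(fx_px, baseline_m, depth_m):
    """Approximate depth change caused by a 1 px disparity error: Z^2 / (f * B)."""
    return depth_m ** 2 / (fx_px * baseline_m)

fx = 3043.0          # from my calibration
baseline = 0.10      # assumed baseline in metres (illustrative)
disparity = 152.15   # assumed disparity in pixels (illustrative)

Z = depth_from_disparity(fx, baseline, disparity)
print("depth [m]:", Z)
print("pixels per cm:", pixels_per_cm(fx, Z))
print("depth error per px of disparity [m]:", depth_error_per_pixel(fx, baseline, Z))
```

The last function shows why a short baseline hurts: the depth error grows with the square of the distance and shrinks only linearly with the baseline.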

Update:

  1. Calibration images: the furthest 2 were removed because the edges could not be found
  2. Sample image: finding matches
  3. Selecting a point manually in the left image and searching the right image along the epipolar line for the best match
  4. Then using cv2.triangulatePoints to triangulate the undistorted points to get depth,
    but my depth results are off from my ground truth.
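
Two things that may be worth checking in the triangulation step are sketched below on synthetic data. The sketch uses a plain NumPy linear (DLT) triangulation as a stand-in for `cv2.triangulatePoints`, so it is not the OpenCV implementation; the principal point, baseline and test point are invented for illustration. First, the t returned by `cv2.recoverPose` has unit length, so it has to be multiplied by the measured baseline before building P2, otherwise depth comes out in "baseline units". Second, the points and the projection matrices have to agree: pixel coordinates go with K[R|t], normalized coordinates go with [R|t] without K.

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear triangulation of one point from two 3x4 projection matrices."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                      # null-space vector, homogeneous 3D point
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D point to pixel coordinates with projection matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# fx, fy from my calibration; cx, cy are assumed values.
K = np.array([[3043.0, 0.0, 2000.0],
              [0.0, 3043.0, 1500.0],
              [0.0, 0.0, 1.0]])
baseline_m = 0.10                       # assumed measured baseline
R = np.eye(3)
t_unit = np.array([-1.0, 0.0, 0.0])     # unit-length direction, as recoverPose returns
t_metric = t_unit * baseline_m          # scaled to metres

P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2_metric = K @ np.hstack([R, t_metric.reshape(3, 1)])
P2_unit = K @ np.hstack([R, t_unit.reshape(3, 1)])

X_true = np.array([0.3, -0.1, 2.0])     # synthetic point, 2 m away
x1, x2 = project(P1, X_true), project(P2_metric, X_true)

X_metric = triangulate_dlt(P1, P2_metric, x1, x2)
X_unit = triangulate_dlt(P1, P2_unit, x1, x2)
print("depth with scaled t:", X_metric[2])
print("depth with unit t:  ", X_unit[2])
```

With the unit-length t the recovered depth is the true depth divided by the baseline, which is one way a result can be consistently off from ground truth.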

Things to note also:

  1. I think my calibration results are quite decent: I got fx = 3043, and the manual calculation f = focal length / pixel size,
    4.25 mm / 0.0014 mm (1.4 µm), gives ≈ 3036.
    Is that a very big difference?
  2. For some reason the x component of the translation vector t from cv2.recoverPose is negative. The images are passed left to right, so if I'm understanding correctly it should be positive, correct? I took the images right to left but then passed them in left to right. I don't understand where I went wrong here.
    Any advice is appreciated.
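
On the sign of t, here is a small sketch of the convention as I read it from the OpenCV documentation: `cv2.recoverPose` returns R and t that map a point from the first camera's frame into the second (x2 = R x1 + t), with t normalized to unit length. If that reading is right, the second camera's centre expressed in the first camera's frame is -Rᵀt, so a second camera sitting to the right (+x) of the first gives a negative x in t. The helper name is hypothetical and the numbers are synthetic.

```python
import numpy as np

def camera2_center_in_cam1(R, t):
    """Centre of camera 2 in camera 1 coordinates, assuming x2 = R @ x1 + t."""
    return -R.T @ t

# Pure sideways motion: t points in -x, yet camera 2 sits at +x.
R = np.eye(3)
t = np.array([-1.0, 0.0, 0.0])
C2 = camera2_center_in_cam1(R, t)
print("camera 2 centre:", C2)

# Same check with a small rotation about the y axis (10 degrees).
a = np.deg2rad(10.0)
R_rot = np.array([[np.cos(a), 0.0, np.sin(a)],
                  [0.0, 1.0, 0.0],
                  [-np.sin(a), 0.0, np.cos(a)]])
C_true = np.array([1.0, 0.0, 0.0])      # camera 2 one unit to the right
t_rot = -R_rot @ C_true                 # translation implied by that placement
print("t for rotated case:", t_rot)
print("recovered centre:  ", camera2_center_in_cam1(R_rot, t_rot))
```

Under this convention a negative x in t with images passed left to right would be expected rather than a sign of an error.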

not a single answer :frowning_face: