Estimating Camera position with solvePnP

Hello, this is my first post here; I will try to explain my issue as best as I can.

I want to validate an idea for camera->world calibration.

Initially, I set up a simulation and added a camera to it. The camera parameters are as follows:

focal_distance_mm = 8 # mm
pixel_resolution_near_plane_mm = 0.067 # mm
pixel_resolution_far_plane_mm = 0.904 # mm
camera_position = [0,250,610] # mm
camera_rotation = [0,180,90] # degrees
image_size = [2369, 1309] # pixels
# calculate camera matrix with function:

import numpy as np

def simulateCamera(focal_length_mm, pixel_size_mm, w, h):
    focal_length_px = focal_length_mm / pixel_size_mm
    K = np.array([[focal_length_px, 0.0, w / 2],
                  [0.0, focal_length_px, h / 2],
                  [0.0, 0.0, 1.0]], dtype=np.float64)
    dist_coef = np.zeros(4)
    print(f"K: {K}")
    return focal_length_px, h, w, K, dist_coef

camera_intrinsics_matrix = [[1.19402985e+02 0.00000000e+00 1.18450000e+03]
                            [0.00000000e+00 1.19402985e+02 6.54500000e+02]
                            [0.00000000e+00 0.00000000e+00 1.00000000e+00]]
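For reference, the printed matrix can be reproduced directly from the parameters above (a minimal sanity check I added, not part of the original script):

```python
import numpy as np

# Rebuild K from the simulation parameters listed above and confirm
# it matches the printed intrinsics.
focal_length_mm = 8.0
pixel_size_mm = 0.067          # near-plane pixel resolution
w, h = 2369, 1309

focal_length_px = focal_length_mm / pixel_size_mm   # ~119.4 px
K = np.array([[focal_length_px, 0.0, w / 2],
              [0.0, focal_length_px, h / 2],
              [0.0, 0.0, 1.0]])
print(K)
```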

Since I know the camera's position and rotation in the simulated environment, I want to verify that I can estimate them using solvePnP. For this I need a set of corresponding points in the image plane and in the world frame.

image_points_dict = {
        "food_dispenser": [
            1183,
            685
        ],
        "calib0": [
            729,
            409
        ],
        "calib1": [
            720,
            775
        ],
        "calib2": [
            1655,
            400
        ],
        "calib3": [
            1664,
            779
        ],
        "testpoint0": [
            591,
            220
        ],
        "testpoint1": [
            579,
            626
        ],
        "testpoint2": [
            567,
            1068
        ],
        "testpoint3": [
            1183,
            1095
        ],
        "testpoint4": [
            1869,
            1116
        ],
        "testpoint5": [
            1885,
            621
        ],
        "testpoint6": [
            1901,
            126
        ],
        "testpoint8": [
            1183,
            685
        ]
    } # pixel

robot_points_dict = {
        "food_dispenser": [0, 250, -10],
        "calib0": [250, 100, 0],
        "calib1": [250, 300, 10],
        "calib2": [-250, 100, 20],
        "calib3": [-250, 300, 30],
        "testpoint0": [320, 0, 10],
        "testpoint1": [320, 220, 20],
        "testpoint2": [320, 450, 30],
        "testpoint3": [0, 450, 60],
        "testpoint4": [-320, 450, 80],
        "testpoint5": [-320, 220, 90],
        "testpoint6": [-320, 0, 100],
        "testpoint8": [0, 250, -10],
    } # mm
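The two dicts have to be flattened into an N×2 and an N×3 float array with matching row order before calling solvePnP. A minimal sketch of that step (using a subset of the points above):

```python
import numpy as np

# Build aligned image/object point arrays by iterating over a shared
# key order; any mismatch in ordering silently corrupts the PnP result.
image_points_dict = {"calib0": [729, 409], "calib1": [720, 775],
                     "calib2": [1655, 400], "calib3": [1664, 779]}
robot_points_dict = {"calib0": [250, 100, 0], "calib1": [250, 300, 10],
                     "calib2": [-250, 100, 20], "calib3": [-250, 300, 30]}

keys = list(image_points_dict.keys())
camera_points = np.array([image_points_dict[k] for k in keys], dtype=np.float64)
robot_points = np.array([robot_points_dict[k] for k in keys], dtype=np.float64)

print(camera_points.shape, robot_points.shape)  # (4, 2) (4, 3)
```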

I use solvePnP to estimate the world → camera rotation vector and translation vector:

# robot_points: N x 3 world coordinates, camera_points: N x 2 pixel
# coordinates, built from the dicts above in matching key order
_, rvec, tvec = cv2.solvePnP(robot_points, camera_points, camera_matrix, dist_coeffs)
R, _ = cv2.Rodrigues(rvec)

However, when I try to estimate the camera position and rotation, it seems off:

camera_position = -R.T @ tvec            # invert the world->camera translation
print(f"Camera_position: {camera_position}")

# from scipy.spatial.transform import Rotation
r = Rotation.from_rotvec(rvec.flatten())
rot = r.as_euler('xyz', degrees=True)

tx, ty, tz = camera_position.ravel()

rx = round(180 - rot[0], 5)
ry = round(rot[1], 5)
rz = round(rot[2], 5)

print(f"Camera rotation angles: {rx}, {ry}, {rz}")

After running the last piece of code, I expected to recover the translation and rotation stated above, but I got very different values:

Expected:
camera_position = [0, 250, 610]
camera_rotation = [0, 180, 90]

Got:
Camera_position: [[  1.88627639]
 [232.25790566]
 [ 80.38496172]]
Camera rotation angles: 2.50999, 2.02833, 179.90588
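(One note I am aware of: the rvec returned by solvePnP encodes the world → camera rotation, so the camera's orientation in the world frame is its inverse. A minimal sketch with SciPy, using an illustrative rvec rather than the script's output:)

```python
import numpy as np
from scipy.spatial.transform import Rotation

# Illustrative Rodrigues vector: 90 degrees about z, world -> camera.
rvec = np.array([0.0, 0.0, np.pi / 2])

r_world_to_cam = Rotation.from_rotvec(rvec)
r_cam_to_world = r_world_to_cam.inv()          # camera orientation in world frame
euler = r_cam_to_world.as_euler('xyz', degrees=True)
print(euler)   # [0, 0, -90]
```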

Additionally, what is really strange is that if I use cv2.projectPoints with the same R and tvec, the world → camera reprojection is pretty close:

estimated_camera_point = cv2.projectPoints(
    np.array(robot_points_dict[key], dtype=np.float64),
    R, tvec, camera_matrix, dist_coeffs)[0].reshape(-1, 2)
print(f"Estimated point value: {estimated_camera_point}, expected value: {image_points_dict[key]}")

Estimated point value: [[1182.77515506  672.5497037 ]], expected value: [1183, 685]
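For reference, with zero distortion cv2.projectPoints reduces to the plain pinhole model x = K [R|t] X, which can be checked with NumPy alone (placeholder pose, not the script's values):

```python
import numpy as np

# Pinhole projection by hand, zero distortion assumed.
K = np.array([[119.402985, 0.0, 1184.5],
              [0.0, 119.402985, 654.5],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                           # identity rotation (placeholder)
t = np.array([[0.0], [0.0], [500.0]])   # origin 500 mm in front of the camera

X = np.array([[0.0, 0.0, 0.0],          # object points in world frame (mm)
              [100.0, 50.0, 0.0]])

Xc = R @ X.T + t                        # world -> camera coordinates
uv = (K @ Xc).T
uv = uv[:, :2] / uv[:, 2:3]             # perspective divide
print(uv)                               # world origin lands at the principal point
```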

Can someone help me pinpoint what I am doing wrong?