Large reprojection error using solvePnP to estimate OpenGL view matrix

Hi,

I’ve been trying to estimate, via OpenCV’s solvePnP, the view matrix that OpenGL uses to project points. Unfortunately, my reprojection error is rather high (50-100+ pixels) and I cannot figure out why.

Here is my demonstration setup:

import numpy as np
import cv2
import pyrr

# rough camera matrix estimation (fov interpreted as the horizontal field of view)
def camera_matrix(fov, width, height):
    fov_rad = np.radians(fov)
    f = 1.0 / np.tan(fov_rad / 2)

    hw = width / 2
    hh = height / 2
    # square pixels: fx = fy = f * hw
    return np.array([[f * hw, 0,      hw],
                     [0,      f * hw, hh],
                     [0,      0,       1]])

# based on https://fruty.io/2019/08/29/augmented-reality-with-opencv-and-opengl-the-tricky-projection-matrix/
def projection_matrix(K, width, height, near, far):
    w = width
    h = height
    return np.array([[2*K[0][0]/w,    -2*K[0][1],     (w - 2*K[0][2])/w,      0],
                     [0,               2*K[1][1]/h,   (-h + 2*K[1][2])/h,     0],
                     [0,               0,             (-far-near)/(far-near), -2*far*near/(far-near)],
                     [0,               0,             -1,                     0]])

fov = 60
width, height = 860, 640
near, far = 1, 1000
K = camera_matrix(fov, width, height)
proj = projection_matrix(K, width, height, near, far)
eye = [0, 0, 60]
target = [0, 0, 0]
up = [0, 1, 0]
view = pyrr.matrix44.create_look_at(eye, target, up)
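As written, projection_matrix returns the matrix in the usual column-vector layout (perspective row at the bottom, so clip = proj @ p), which can be sanity-checked against the near/far plane mapping. A standalone numpy-only check, repeating the helpers from above:

```python
import numpy as np

# same helpers as above, repeated so the snippet runs standalone
def camera_matrix(fov, width, height):
    f = 1.0 / np.tan(np.radians(fov) / 2)
    hw, hh = width / 2, height / 2
    return np.array([[f * hw, 0, hw],
                     [0, f * hw, hh],
                     [0, 0, 1]])

def projection_matrix(K, width, height, near, far):
    w, h = width, height
    return np.array([[2*K[0][0]/w, -2*K[0][1]/w, (w - 2*K[0][2])/w, 0],
                     [0, 2*K[1][1]/h, (-h + 2*K[1][2])/h, 0],
                     [0, 0, (-far-near)/(far-near), -2*far*near/(far-near)],
                     [0, 0, -1, 0]])

K = camera_matrix(60, 860, 640)
proj = projection_matrix(K, 860, 640, 1, 1000)

# a point on the near plane should map to ndc z = -1, on the far plane to ndc z = +1
near_clip = proj @ np.array([0, 0, -1, 1])      # camera-space z = -near
far_clip = proj @ np.array([0, 0, -1000, 1])    # camera-space z = -far
assert np.isclose(near_clip[2] / near_clip[3], -1.0)
assert np.isclose(far_clip[2] / far_clip[3], 1.0)
```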

The resulting matrices:

# camera matrix
array([[744.78185,   0.     , 430.     ],
       [  0.     , 744.78185, 320.     ],
       [  0.     ,   0.     ,   1.     ]])

# projection matrix
array([[ 1.73205, -0.     ,  0.     ,  0.     ],
       [ 0.     ,  2.32744,  0.     ,  0.     ],
       [ 0.     ,  0.     , -1.002  , -2.002  ],
       [ 0.     ,  0.     , -1.     ,  0.     ]])

# view matrix
array([[  1.,   0.,  -0.,   0.],
       [ -0.,   1.,  -0.,   0.],
       [  0.,   0.,   1.,   0.],
       [ -0.,  -0., -60.,   1.]])
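Note that the printed view matrix carries the translation in the last row: pyrr stores matrices transposed relative to the conventional column-vector layout, which is why the projection code below multiplies with view.T. A numpy-only illustration for this particular eye position:

```python
import numpy as np

eye = np.array([0.0, 0.0, 60.0])

# conventional column-vector view matrix for this look-at: a pure translation by -eye
view_col = np.eye(4)
view_col[:3, 3] = -eye              # p_cam = view_col @ p_world

# pyrr's create_look_at returns the transpose (translation in the last *row*)
view_pyrr = view_col.T

p = np.array([-50.0, -50.0, 0.0, 1.0])
# row-vector (p @ M) and column-vector (M @ p) forms give the same camera-space point
assert np.allclose(view_col @ p, p @ view_pyrr)
```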

Model points:

# x y z w
# four non-coplanar points
gl_points = np.array([
    [-25.0, 50.0, -10.0, 1.0],
    [-50.0, -50.0, 0.0, 1.0],
    [50.0, -50.0, 0.0, 1.0],
    [25.0, 25.0, 10.0, 1.0],
]) 

Projected points with defined matrices:

# clip space: proj.T · (view.T · p) for each homogeneous point p
viewproj_points = np.array([proj.T.dot(view.T.dot(p)) for p in gl_points])
# perspective divide -> NDC
ndc_points = np.array([p / p[3] for p in viewproj_points])
# NDC [-1, 1] -> viewport [0, width] x [0, height]
viewport_points = np.array([(p + 1.0)/2.0 * np.array([width, height, 1.0, 1.0]) for p in ndc_points])
# flip y: OpenGL viewport origin is bottom-left, image origin is top-left
viewport_points[:, 1:2] = height - viewport_points[:, 1:2]

# result:
array([[297.13624,  54.27248,   0.74668,   1.     ],
       [119.98456, 630.01544,   0.74609,   1.     ],
       [740.01544, 630.01544,   0.74609,   1.     ],
       [616.00927, 133.99073,   0.74526,   1.     ]])
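For reference, the same per-point loop can be written as one vectorized expression: p @ view @ proj is the row-vector form of proj.T · (view.T · p). A standalone sketch repeating the setup, which reproduces the numbers above:

```python
import numpy as np

# same helpers as above, repeated so the snippet runs standalone
def camera_matrix(fov, width, height):
    f = 1.0 / np.tan(np.radians(fov) / 2)
    hw, hh = width / 2, height / 2
    return np.array([[f * hw, 0, hw], [0, f * hw, hh], [0, 0, 1]])

def projection_matrix(K, width, height, near, far):
    w, h = width, height
    return np.array([[2*K[0][0]/w, -2*K[0][1]/w, (w - 2*K[0][2])/w, 0],
                     [0, 2*K[1][1]/h, (-h + 2*K[1][2])/h, 0],
                     [0, 0, (-far-near)/(far-near), -2*far*near/(far-near)],
                     [0, 0, -1, 0]])

width, height = 860, 640
proj = projection_matrix(camera_matrix(60, width, height), width, height, 1, 1000)

# pyrr-style look-at from (0, 0, 60): pure translation, stored in the last row
view = np.eye(4)
view[3, 2] = -60.0

gl_points = np.array([[-25.0, 50.0, -10.0, 1.0],
                      [-50.0, -50.0, 0.0, 1.0],
                      [50.0, -50.0, 0.0, 1.0],
                      [25.0, 25.0, 10.0, 1.0]])

clip = gl_points @ view @ proj                     # == proj.T @ (view.T @ p) per point
ndc = clip / clip[:, 3:4]                          # perspective divide
viewport = (ndc + 1.0) / 2.0 * np.array([width, height, 1.0, 1.0])
viewport[:, 1] = height - viewport[:, 1]           # flip y to image convention
```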

View matrix estimation:

object_points = np.array(gl_points[:, :3], dtype='float64').reshape(4, 3, 1)
image_points = np.array(viewport_points[:, :2], dtype='float64').reshape(4, 2, 1)

# using AP3P since exactly 4 points are given
_, rvec, tvec = cv2.solvePnP(object_points, image_points, K, None,
                             flags=cv2.SOLVEPNP_AP3P)

# transform rotation and translation vectors to an OpenGL view matrix
tvec[1] *= -1.0  # flip y
tvec[2] *= -1.0  # flip z

pnp_view = np.zeros((4, 4), dtype='float64')
pnp_rot = cv2.Rodrigues(rvec)[0]
pnp_rot[1] *= -1.0  # flip y row
pnp_rot[2] *= -1.0  # flip z row

# pyrr-style layout: transposed rotation, translation in the last row
pnp_view[:3, :3] = pnp_rot.T
pnp_view[3, :3] = tvec.reshape(3)
pnp_view[3, 3] = 1.0

# result:
array([[   0.99993,   -0.00443,   -0.01064,    0.     ],
       [   0.0036 ,    0.99705,   -0.07669,    0.     ],
       [   0.01095,    0.07664,    0.997  ,    0.     ],
       [   0.40153,   -0.14424, -123.9464 ,    1.     ]])  # more than double the expected z-axis translation!
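The element-wise sign flips above can equivalently be written as a single matrix product, view_gl = (F · [R|t])ᵀ with F = diag(1, -1, -1). A standalone numpy-only sketch with a made-up pose (not the solvePnP output) showing both forms agree:

```python
import numpy as np

# made-up OpenCV pose, for illustration only
R = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])   # 90 degrees about x
t = np.array([1.0, 2.0, 3.0])

F = np.diag([1.0, -1.0, -1.0])    # OpenCV -> OpenGL axis flip (negates y and z rows)

# matrix form
view = np.eye(4)
view[:3, :3] = (F @ R).T          # transposed: pyrr-style row-vector layout
view[3, :3] = F @ t

# element-wise form, as in the code above
R2, t2 = R.copy(), t.copy()
R2[1] *= -1.0
R2[2] *= -1.0
t2[1] *= -1.0
t2[2] *= -1.0
view2 = np.eye(4)
view2[:3, :3] = R2.T
view2[3, :3] = t2

assert np.allclose(view, view2)
```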

The reprojected points using the same process and the resulting absolute error:

# reprojected points with estimated view matrix
array([[363.63455, 187.2691 ,   0.74843,   1.     ],
       [275.14729, 474.85271,   0.74816,   1.     ],
       [584.85271, 474.85271,   0.74818,   1.     ],
       [511.98591, 238.53208,   0.7481 ,   1.     ]])

# absolute error
array([[ 66.49831, 132.99663,   0.00175,   0.     ],
       [155.16273, 155.16273,   0.00207,   0.     ],
       [155.16273, 155.16273,   0.00209,   0.     ],
       [104.02335, 104.54134,   0.00284,   0.     ]])

Since I’m providing “perfect” image points, one would expect the reprojection error to be very low (< 1 pixel), so I suppose I’m making a mistake somewhere in the transformation process. Any insight into what I could be doing wrong?

Thanks in advance.