solvePnP reprojection to image issue

Hello,

I’m currently trying to estimate the head pose from an image using OpenCV in Python.

The image above shows the detected image points (blue) and a projection of the three coordinate axes (2.5 cm long) into the image using the projectPoints function. Even though that projection looks correct, the values returned by solvePnP do not make sense to me.

If I do not use the projectPoints function, but instead do the transformation myself according to the documentation, I get negative pixel coordinates.

This is example code with the image points and 3D points used, which shows the issue:

import cv2
import numpy as np

ref_pts = np.array([[  0.43488255,  42.70063   , -20.705105  ],
       [  0.27277985,  26.4461    , -14.271904  ],
       [  0.13183136,  11.719142  ,  -4.9430103 ],
       [  0.        ,   0.        ,   0.        ],
       [-11.985651  , -11.618346  , -21.41968   ],
       [ -6.7659035 , -13.994063  , -18.108206  ],
       [  0.1727357 , -15.1853285 , -16.894157  ],
       [  7.1862555 , -14.026736  , -18.05195   ],
       [ 12.442382  , -11.639794  , -21.37848   ],
       [-45.87925   ,  36.742226  , -39.275238  ],
       [-18.638481  ,  33.563454  , -33.30136   ],
       [ 19.544973  ,  33.596954  , -33.56625   ],
       [ 46.47897   ,  36.47234   , -39.469     ],
       [-23.46772   , -33.72577   , -31.594542  ],
       [ 24.001362  , -33.768307  , -31.932142  ],
       [  0.7072544 , -77.33406   , -36.489365  ]], dtype=np.float32) / 1000.0


im_pts = np.array([[1017.,  463.],
       [1025.,  501.],
       [1032.,  539.],
       [1040.,  569.],
       [ 987.,  592.],
       [1010.,  592.],
       [1032.,  599.],
       [1055.,  592.],
       [1070.,  584.],
       [ 882.,  463.],
       [ 950.,  463.],
       [1070.,  456.],
       [1138.,  448.],
       [ 950.,  667.],
       [1115.,  644.],
       [1032.,  787.]])


cam = np.array([[1920.,    0.,  960.],
       [   0., 1920.,  540.],
       [   0.,    0.,    1.]])

coeffs = np.zeros((4, 1), dtype=np.float64)

ret, rot, pos = cv2.solvePnP(ref_pts, im_pts, cam, coeffs, flags=cv2.SOLVEPNP_ITERATIVE)
rmat, jac = cv2.Rodrigues(rot)

# Manual projection
T = np.eye(4)
T[:3,:3] = rmat
T[:3, 3] = pos.flatten()
proj = cam.dot(T.dot([0,0,0,1])[:3])

# Projection using OpenCV (point passed as a float array, as projectPoints expects)
proj_cv, _ = cv2.projectPoints(np.array([[0., 0., 0.]]), rot, pos, cam, coeffs)

np.set_printoptions(suppress=True)
print(f"Manual Projection: {proj}")
print(f"projectPoints: {proj_cv[0,0]}")

Which prints

Manual Projection: [-725.07525257 -399.45861329   -0.7002508 ]
projectPoints: [1035.45080729  570.45078031]

I can obviously use the projectPoints function in this case, but I need to understand the transformation as I want to use it further. I suspect the projected points may be landing behind the image plane (which would explain the negative values), but I’m unsure what is causing that, or how projectPoints detects / corrects it.
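For reference, the full pinhole projection also includes a perspective divide by the third (depth) component, and a point only projects validly when that component is positive. A minimal numpy sketch of the manual transformation, using the same camera matrix as above but a made-up pose (identity rotation, origin 0.6 m in front of the camera), not an actual solvePnP result:

```python
import numpy as np

cam = np.array([[1920.,    0.,  960.],
                [   0., 1920.,  540.],
                [   0.,    0.,    1.]])

# Made-up extrinsics: identity rotation, slightly off-center, 0.6 m in front
rmat = np.eye(3)
tvec = np.array([0.02, 0.01, 0.6])

pt = np.array([0.0, 0.0, 0.0])  # model origin

# Transform into the camera frame, then apply the intrinsics
cam_pt = rmat @ pt + tvec
hom = cam @ cam_pt  # homogeneous image coordinates [u*w, v*w, w]

# w (depth along the optical axis) must be positive; otherwise the point
# lies behind the image plane and the projection is meaningless
assert hom[2] > 0

pixel = hom[:2] / hom[2]  # perspective divide, ≈ (1024, 572) here
```

A negative third component in your manual result would therefore indicate a pose that places the points behind the camera.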

As an aside, I’m also confused by the image on the documentation page linked above, as it shows a left-handed camera coordinate system with the y-axis going up, whereas OpenCV uses a right-handed coordinate system everywhere else.

Any help would be greatly appreciated.

Best Regards,
Mirko

Looks like it really is down to the solution returned by solvePnP. If I set the initial values to something sensible in front of the camera, I get the following results:

rot = np.array([0.0, 0, 0])
pos = np.array([0, 0, 0.6])
ret, rot, pos = cv2.solvePnP(ref_pts, im_pts, cam, coeffs, rot, pos, useExtrinsicGuess=True, flags=cv2.SOLVEPNP_ITERATIVE)

This gives

Manual Projection: [662.68565566 362.93005335   0.6393272 ]
projectPoints: [1036.53599046  567.67497396]

This matches the projectPoints result if I normalize the first two components of the manual projection by the third, which is evidently necessary, but which I hadn’t worked out while the other issue was still in play.
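Double-checking that normalization step with the numbers printed above: dividing the first two components of the manual projection by the third does reproduce the projectPoints output.

```python
import numpy as np

# Manual projection result from the run above (homogeneous image coordinates)
proj = np.array([662.68565566, 362.93005335, 0.6393272])

pixel = proj[:2] / proj[2]  # perspective divide, ≈ [1036.536, 567.675]
```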

Is this expected behavior for solvePnP? I think I can work with this solution, but it restricts me to SOLVEPNP_ITERATIVE, since the other methods cannot be initialized with a guess value.

are you sure your camera matrix is correct? fx = 1920, for that picture? that picture was shot with a very long focal length, so fx should be in a different order of magnitude.

anyway, I haven’t analyzed your situation beyond that, so I’m not saying that is necessarily the issue.

No, that camera matrix is definitely not correct for the camera that took the picture; I simply constructed it from the image dimensions for the demo code above. In my actual code I use a camera matrix from calibration and encounter the same issue. I chose the demo image above because I cannot share the actual image I’m working with.

I just tested a few different values for the camera matrix, including some absurd ones (fx = 50000), and the issue stays the same, so I don’t think the camera matrix is the culprit here.