I would like to estimate the pose of the camera between frames in a video. I care more about the relative rotation between frames than the translation. Unfortunately, I am getting inconsistent results and I do not know where to start debugging. Perhaps you can help.
The process to recover the pose essentially consists of these steps:
- Camera calibration. I get a reprojection error of about 0.03, which I think is OK.
- Read two consecutive frames. I convert both to grayscale and undistort them with the camera parameters found during calibration.
- Find SIFT features in each frame, match them with the FLANN-based matcher, and retain only the closest matches.
- Find the essential matrix (cv.findEssentialMat(...)) and then recover the pose (cv.recoverPose(...)).
What I expect
I expect the estimated camera pose for these two frames (which are visually identical) to be the identity matrix for the rotation R:
[[ 1. -0. -0.]
 [ 0.  1.  0.]
 [ 0. -0.  1.]]
and a translation vector [-0.58 0.58 -0.58]
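(As far as I understand, cv.recoverPose returns the translation only up to scale, as a unit direction vector, which is why I care more about the rotation; the vector above is consistent with that:)

```python
import numpy as np

# The recovered translation is a direction only; its norm should be ~1
t = np.array([-0.58, 0.58, -0.58])
print(np.linalg.norm(t))  # ~1.0, since 0.58 is approximately 1/sqrt(3)
```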
What I do not expect
But when I process two other frames (also visually identical to the previous ones, which I cannot upload due to the limitation on first-time users), I get a rotation matrix R
[[-0.96 -0.25 -0.1 ]
 [-0.25  0.71  0.66]
 [-0.1   0.66 -0.75]]
and a translation vector [ 0.14 -0.92 -0.36]
Despite the frames being almost identical, the camera pose estimate is sometimes quite off. There is nothing inherently strange about the specific pair of frames I have attached. Every time I rerun the algorithm, some frames are processed correctly and some others (almost identical) are not. The features are dense in all frames and seem to match quite well.
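To quantify how far off the estimate is, I convert the bad R above to a rotation angle with the standard trace formula (the matrix entries are the rounded values printed above):

```python
import numpy as np

R = np.array([[-0.96, -0.25, -0.10],
              [-0.25,  0.71,  0.66],
              [-0.10,  0.66, -0.75]])
# Rotation angle from the trace: cos(theta) = (trace(R) - 1) / 2
cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
angle_deg = np.degrees(np.arccos(cos_theta))
print(angle_deg)  # ~180 degrees away from the identity
```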
Do you know what could cause this issue?
python 3.11.5
opencv-contrib-python 4.8.1.78