I briefly reviewed section 9.6.2, "Extraction of cameras from the essential matrix", in Multiple View Geometry by Hartley and Zisserman. I was reminded that:
For a given essential matrix E…and first camera matrix P=[I | 0], there are four possible choices for the second camera matrix P’
Two of the solutions differ only in the sign of the translation vector; the other two differ from those by a 180 degree rotation about the line joining the two camera centers (the baseline).
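To make the four candidates concrete, here is a minimal NumPy sketch of the SVD-based decomposition described in that section. It also applies the test the book suggests for choosing between them: triangulate one point and keep the candidate that puts it in front of both cameras. The function names are my own (hypothetical), the image points are assumed to be in normalized coordinates, and this is only an illustration under those assumptions.

```python
import numpy as np

def four_poses_from_essential(E):
    """Return the four (R, t) candidates for P' = [R | t], given P = [I | 0]."""
    U, _, Vt = np.linalg.svd(E)
    # E is only defined up to sign, so flip U / Vt to make them proper rotations.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 1.0]])
    R1 = U @ W @ Vt      # first rotation candidate
    R2 = U @ W.T @ Vt    # "twisted pair" partner of R1
    t = U[:, 2]          # unit translation direction, sign unknown
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]

def in_front_of_both(R, t, x1, x2):
    """Triangulate one normalized correspondence x1 <-> x2 (2-vectors) and
    check that the point has positive depth in both cameras."""
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = np.hstack([R, t.reshape(3, 1)])
    # Linear (DLT) triangulation: each view contributes two rows.
    A = np.vstack([x1[0] * P1[2] - P1[0],
                   x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0],
                   x2[1] * P2[2] - P2[1]])
    X = np.linalg.svd(A)[2][-1]
    X = X[:3] / X[3]
    depth1 = X[2]
    depth2 = (R @ X + t)[2]
    return bool(depth1 > 0 and depth2 > 0)
```

With noise-free data exactly one of the four candidates should pass the test. With real, noisy matches you would want to check many points and take a vote.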
I don’t know how OpenCV handles this ambiguity in the recoverPose() call, or whether this could be the source of your error, but it definitely raises some questions. I have to wonder if the whole approach is flawed, at least with two nearly identical images. No rotation or translation between the two cameras - is this a degenerate / unstable case?
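One way to look at the degenerate-case question is to write the essential matrix in its standard form. The notation here is my own assumption for this note: R and t are the relative rotation and translation between the two views, and x_1, x_2 are matching image points in normalized coordinates.

```latex
\begin{align*}
E &= [t]_\times R, & x_2^\top E \, x_1 &= 0 \\
t \to 0 \;&\Rightarrow\; E \to 0, & x_2 &\sim R \, x_1
\end{align*}
```

With no translation the epipolar constraint is satisfied by any pair of points, so the data does not determine E, and with a very small baseline the estimate is likely dominated by noise. Without translation the point matches are related by a homography instead.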
I’m truly getting out of my depth here, and certainly there are others on the forum with a better understanding than I have. Maybe they will jump in to help.
A few thoughts:
- Have you undistorted the image points prior to calling findEssentialMat()? If not, try that.
- Have you tried to use an image pair that is not identical, but instead has movement between them? Say a simple translation, or rotation. Maybe take a sequence of images from one pose, and then move the camera and take a second sequence. Randomly pick an image from the first sequence and pair it with a random image from the second sequence. Are your results more stable?
- You ask if I know of other ways to recover the pose. I do, but I’m not sure they are what you are looking for. I am assuming you are only looking for camera motion in a relative sense - how does the camera pose in frame n compare to the camera pose in frame m. Most of the work I do involves calibrating the pose in an absolute sense - where is the camera in some reference frame that I care about. This takes 3D ground truth points and corresponding image points from the camera you are working with.
To summarize:
Make sure you are using undistorted image points. Try to understand the four-solution ambiguity and whether it is contributing to your problem. Try to run your algorithm with images from two different views to avoid the possibility of a degenerate / unstable case. Maybe try a scene with more depth variation. Consider buying the book referenced above - if you read and understand everything in chapter 9, I’ll be asking you questions.