Inconsistent results with `recoverPose`

Some additional comments:

  1. Logitech C920 uses a voice coil focusing system, I believe. I have not had great luck with the stability of voice coil focus optics. If possible, I would suggest getting a camera that has a mechanical (and lockable) focus. If you must use that camera, at a minimum ensure that the focus is in manual/absolute mode. I use the V4L2 backend directly with my cameras and the relevant control is V4L2_CID_FOCUS_ABSOLUTE - hopefully OpenCV supports this setting. Disable autofocus, set an absolute value that gives you acceptable focus, and use that value always for all operations. The problem is that if you use autofocus (or varying absolute focus values) the focal length of the lens changes, and possibly the image center and distortion as well. Better yet, pick the manual focus value you like and then carefully epoxy the lens to the housing so that it is physically restrained from moving. To be clear, I'm not saying that this is why you are getting inconsistent results with your test, but I do think it will cause problems at some point if you don't address it. (And maybe you are already aware of this and have dealt with it…) A sketch of locking the focus from inside OpenCV appears after this list.

  2. Your calibration images look OK to me, but I would want more points closer to the corners, especially if you intend to use the corners of the image for any of your calculations. I would suggest looking at the Charuco calibration process (Charuco = chessboard + aruco). The aruco markers allow individual chessboard corners to be identified when only part of the pattern is visible. This is helpful because you can get points closer to the edges / corners of your image, since you don't have to see the full target. The code is a little different, but there are tutorials available and it really isn't that hard to manage. There are online Charuco target generators as well. (Again, I'm not saying that this is your current issue, but I do believe you will get a better camera calibration if you do this - the reported reprojection error might even be higher, but the actual accuracy will be better, particularly if you need accuracy at the edges / corners of your image.) A sketch of the Charuco flow also follows this list.

  3. I can't really comment on the way you are filtering the points, but as I understand it you end up with a set of image correspondences, and because the two images are essentially the same it is trivial to validate them: the distance between image point 1 and image point 2 should be approximately zero for every correspondence, so if you find ones where it isn't, dig deeper. (Of course this test is only possible because your images are the same, and isn't something you could apply once there is motion in your video sequence - the point is to use this test to identify the problem so you can fix it, not to rely on it as a way to filter points further.) When I encounter a situation like this I'm likely to compute the Euclidean distance between the image points, compute some basic statistics on the set (mean, standard deviation), and then flag correspondences that are, say, 2 or 3 standard deviations above the mean. I'd then draw those points on the image pair and inspect the image to see if I can make sense of what happened. A sketch of that check appears after this list too. But I'm getting ahead of myself.
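If you stay inside OpenCV rather than talking to V4L2 directly, something like the following should pin the focus. This is a minimal sketch: the device index and focus value are assumptions you'll need to tune, and not every capture backend honors these properties, so verify the settings actually stick on your hardware.

```python
import cv2

CAM_INDEX = 0      # hypothetical device index
FOCUS_VALUE = 30   # hypothetical absolute focus; find a value that looks sharp and keep it

cap = cv2.VideoCapture(CAM_INDEX, cv2.CAP_V4L2)

# Disable autofocus, then pin the focus to a fixed absolute value.
# Under the V4L2 backend these map to V4L2_CID_FOCUS_AUTO / V4L2_CID_FOCUS_ABSOLUTE.
cap.set(cv2.CAP_PROP_AUTOFOCUS, 0)
cap.set(cv2.CAP_PROP_FOCUS, FOCUS_VALUE)

ok, frame = cap.read()
cap.release()
```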
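The Charuco flow looks roughly like this. I'm assuming the pre-4.7 aruco module API from opencv-contrib (the aruco API changed in OpenCV 4.7), and the board geometry and image directory are placeholders you'd replace with your own:

```python
import glob
import cv2

# Board geometry below is a placeholder -- it must match the target you print.
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_5X5_100)
board = cv2.aruco.CharucoBoard_create(7, 5, 0.04, 0.03, dictionary)

all_corners, all_ids, image_size = [], [], None

for path in glob.glob("calib/*.png"):  # hypothetical image directory
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    image_size = gray.shape[::-1]
    marker_corners, marker_ids, _ = cv2.aruco.detectMarkers(gray, dictionary)
    if marker_ids is None:
        continue
    # Interpolate chessboard corners from the detected aruco markers; this
    # works even when only part of the board is visible, which is the point.
    n, ch_corners, ch_ids = cv2.aruco.interpolateCornersCharuco(
        marker_corners, marker_ids, gray, board)
    if n > 3:
        all_corners.append(ch_corners)
        all_ids.append(ch_ids)

rms, K, dist, rvecs, tvecs = cv2.aruco.calibrateCameraCharuco(
    all_corners, all_ids, board, image_size, None, None)
print("RMS reprojection error:", rms)
```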
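And the distance/statistics check from item 3 might look something like this (a sketch; pts1 and pts2 are assumed to be the Nx2 arrays of matched points you already have):

```python
import numpy as np

def flag_suspect_matches(pts1, pts2, k=2.0):
    """With identical input images the displacement between matched points
    should be ~0, so correspondences with large displacement are suspects."""
    d = np.linalg.norm(np.asarray(pts1, float) - np.asarray(pts2, float), axis=1)
    suspect = d > d.mean() + k * d.std()   # > k std devs above the mean
    return suspect, d

# Draw the flagged correspondences on the image pair for inspection, e.g.:
#   for (x1, y1), (x2, y2) in zip(pts1[suspect], pts2[suspect]):
#       cv2.line(vis, (int(x1), int(y1)), (int(x2), int(y2)), (0, 0, 255), 1)
```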

I really don’t have much experience in this domain, but my gut says that you are getting “good” matches (the descriptors match well) from scene points that don’t match well at all, and this is polluting your data and giving bad results. The surface that your robot is standing on - maybe it’s carpet or something similar - looks to me like it would not provide very good / uniquely identifiable features, but rather would provide a lot of opportunity for matching one noisy / unstructured area with some other noisy / unstructured area, especially when you add image noise into the mix. I don’t think I’d want to rely on any of those points that come from the “floor” area, especially once you start moving the camera around and seeing them from different views.
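If the floor is indeed the culprit, one cheap experiment is to mask that region out of feature detection entirely and see whether your results stabilize. A sketch, assuming an ORB detector and a cutoff at 60% of the image height; both are placeholders for whatever detector and horizon line fit your setup:

```python
import cv2
import numpy as np

img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # hypothetical frame

# Keep the top of the image, zero out the bottom where the carpet dominates.
# The 60% cutoff row is an assumption -- pick it from your own images.
mask = np.full(img.shape, 255, dtype=np.uint8)
mask[int(0.6 * img.shape[0]):, :] = 0

orb = cv2.ORB_create()
kp, des = orb.detectAndCompute(img, mask)  # detector ignores masked-out pixels
```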

Closing thought
How did you choose 0.6 as your “close enough” ratio? Have you tried something more restrictive, say 0.1? Or, instead of a fixed threshold, consider sorting the matches and keeping only the best 30, as in the sketch below.
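A sketch of both options, assuming ORB-style binary descriptors (use cv2.NORM_L2 instead of NORM_HAMMING for float descriptors like SIFT):

```python
import cv2

def best_matches(des1, des2, ratio=0.6, top_n=30):
    """Ratio-test the matches, then keep at most top_n of the strongest."""
    bf = cv2.BFMatcher(cv2.NORM_HAMMING)
    knn = bf.knnMatch(des1, des2, k=2)
    # Lowe-style ratio test: keep a match only if it is clearly better than
    # the second-best candidate. Lowering `ratio` keeps fewer, more
    # distinctive matches.
    good = [pair[0] for pair in knn
            if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance]
    # Then keep only the strongest survivors, sorted by descriptor distance.
    return sorted(good, key=lambda m: m.distance)[:top_n]
```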

(the video you posted is not publicly accessible)