solvePnP sensitive to input


I plotted your data (top: 3D data without Z; bottom: 2D data). I picked x/y axis scales that were close to equal, so the relative spacing/distances are approximately correct.

The first thing I noticed is that the Z value is 5 for all of your points except the first one (which is 0). Is this correct?

The second thing I noticed is that your camera matrix looks to be made up (not calibrated), based on the 320.5 / 240.5 values. (BTW, I think the correct principal point values would be 319.5, 239.5, but I’m not certain.) Since you are (presumably) using a guesstimated camera matrix, I’m suspicious of your focal length as well. This might not be enough to wreck things entirely, but maybe. Also, you have no lens distortion - again, you might be able to get away with this if your lens truly doesn’t have much distortion, but if it is a high-distortion lens you’ll want to calibrate it.
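For what it’s worth, here’s a minimal sketch (Python/NumPy; the 640x480 image size and the 800.0 focal length are assumptions of mine, not values from your post) of what a guessed pinhole intrinsic matrix usually looks like - a stand-in, not a substitute for running cv2.calibrateCamera:

```python
import numpy as np

# Assumed image size - use yours.
w, h = 640, 480

# Placeholder focal length in pixels; a real value needs calibration.
fx_guess = fy_guess = 800.0

# Principal point at the image center: (w - 1) / 2 = 319.5, (h - 1) / 2 = 239.5
camera_matrix = np.array([[fx_guess, 0.0,      (w - 1) / 2.0],
                          [0.0,      fy_guess, (h - 1) / 2.0],
                          [0.0,      0.0,      1.0]])

# Zero distortion - only (approximately) safe for low-distortion lenses.
dist_coeffs = np.zeros(5)
```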

I also annotated the images with two points, A and B. Supposedly point A in the first image (3D) projects to the point labeled A in the second image, and the same for B. This doesn’t make sense to me, and the overall structure of the two data sets doesn’t really look like it is related by a perspective projection.

I suspect your data is out of order, so there is no good mapping from the 3D points to the 2D points using any camera matrix. For example, I see a clustering of 4 point pairs in the 3D data - I would expect to see a similar clustering in the 2D data as well, but I don’t.

Assuming this is right (that your point correspondences aren’t actually correspondences), fix that and try again - I bet the results will be a lot better.

There’s also a bigger lesson: when you get results that don’t make sense, add code that annotates your images in a way that lets you visually verify what is going on. For example, using the input image you started from, for each detected image point (a sketch of this loop follows the list):

  1. Draw a circle on the image where you detected a feature. Draw it large enough so that it doesn’t obscure the feature, but not huge, maybe a 5-10 pixel radius?
  2. Project the corresponding 3D point using the recovered camera pose and camera intrinsics, and draw a circle (in a different color) on the same image at the projected location. Draw a line connecting the two circles - the line makes the relationship between the pair visible (one is the detected point, the other is the projection of the corresponding 3D point using the calibration results).
  3. Optionally keep track of the distance between the detected and projected image points so you can calculate a reprojection error. (I thought this was returned automatically by the solvePnP call, but maybe not.) This number is super helpful because you can quickly look at it and know whether you have good results or batshit crazy results. (You appear to have the latter, which usually points to a usage or data problem.)
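
Here’s a rough sketch of those steps in Python/OpenCV. All the variable names (image, object_points, image_points, camera_matrix, dist_coeffs, rvec, tvec) are placeholders for your own data and the pose returned by cv2.solvePnP:

```python
import cv2
import numpy as np

# Placeholders - substitute your own data:
#   image          - input image (BGR)
#   object_points  - Nx3 float array of 3D points
#   image_points   - Nx2 float array of detected 2D points
#   camera_matrix, dist_coeffs - intrinsics used for solvePnP
#   rvec, tvec     - pose recovered by cv2.solvePnP

# Project the 3D points back into the image using the recovered pose.
projected, _ = cv2.projectPoints(object_points, rvec, tvec,
                                 camera_matrix, dist_coeffs)
projected = projected.reshape(-1, 2)

errors = []
for (u, v), (pu, pv) in zip(image_points, projected):
    detected = (int(round(u)), int(round(v)))
    reproj = (int(round(pu)), int(round(pv)))
    cv2.circle(image, detected, 7, (0, 255, 0), 1)     # detected feature: green
    cv2.circle(image, reproj, 7, (0, 0, 255), 1)       # projected 3D point: red
    cv2.line(image, detected, reproj, (255, 0, 0), 1)  # connect the pair: blue
    errors.append(np.hypot(u - pu, v - pv))

# RMS reprojection error: a few pixels = plausible; huge = something is broken.
rms = float(np.sqrt(np.mean(np.square(errors))))
print(f"RMS reprojection error: {rms:.2f} px")
cv2.imwrite("reprojection_debug.png", image)
```

If the colored pairs end up scattered all over the image with long connecting lines, that’s the visual signature of bad correspondences.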

This step has been so important for me in tracking down data errors just like what you are seeing. It’s also very helpful in exposing other types of error - for example, once you have fixed your data issue, you can get a visual sense of how much error you are introducing by not calibrating the distortion.

Good luck.

Oh, in case it isn’t clear, the reason you are getting such different results from input data that is only slightly different is that your data is so broken there isn’t any good solution, only a bunch of terrible ones. The solver ends up choosing the best of the worst, and slight changes in the input can move you a long way through the parameter space to a very slightly different version of the “best of the worst” solution. If that makes sense.