Calculating Camera Pose and Orientation Using 3D-2D Point Correspondences

If you want to understand the theory, you could read through chapters 2-6 in Hartley/Zisserman “Multiple View Geometry” - if you fully understand all of that material you will be in great shape. I can’t say I understand it fully, but I get along pretty well with half of it.

I would suggest just starting with what you have, and feeding it into solvePnP. Once you have a result, project your world points back to the image and draw some circles (both where your original image points were and where the projected points land). Look for the point with the largest discrepancy, eliminate it from your correspondence list, and try again. Observe how things change. I’m assuming you have a lot of points and are able to discard some. If you only have 4-8, you might not be able to discard many of them, so you should focus on getting high quality ground truth values and really good image locations for your correspondences.
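
A minimal sketch of that loop in Python/OpenCV (the variable names and the image `img` are placeholders for whatever you have, not anything from a specific codebase):

```python
import numpy as np
import cv2

# Assumed inputs (names are mine):
#   object_points: (N, 3) float32 array of 3D world coordinates
#   image_points:  (N, 2) float32 array of measured pixel locations
#   K:             3x3 camera intrinsic matrix from calibration
#   dist_coeffs:   distortion coefficients from calibration (or None)

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist_coeffs)

# Re-project the world points using the estimated pose.
projected, _ = cv2.projectPoints(object_points, rvec, tvec, K, dist_coeffs)
projected = projected.reshape(-1, 2)

# Per-point reprojection error in pixels; the worst one is your first
# candidate for removal.
errors = np.linalg.norm(projected - image_points, axis=1)
worst = int(np.argmax(errors))
print(f"worst point: index {worst}, error {errors[worst]:.2f} px")

# Draw both sets of points for visual inspection.
for (u, v), (pu, pv) in zip(image_points, projected):
    cv2.circle(img, (int(round(u)), int(round(v))), 5, (0, 255, 0), 1)    # measured
    cv2.circle(img, (int(round(pu)), int(round(pv))), 3, (0, 0, 255), 1)  # projected
```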

If you do have a large number of points to start with, you might well benefit from filtering outliers and refitting. Or you might want to look into solvePnPRansac for a method to automate the process of eliminating outliers.
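
A sketch of that, with the RANSAC parameters shown as assumptions you’d tune for your setup:

```python
import cv2

# Same assumed inputs as above. solvePnPRansac automatically labels
# correspondences whose reprojection error exceeds the threshold as outliers.
ok, rvec, tvec, inliers = cv2.solvePnPRansac(
    object_points, image_points, K, dist_coeffs,
    reprojectionError=3.0,  # pixel threshold - tune for your setup
    iterationsCount=200,
)
# (in a real script you'd check `ok` and that `inliers` is not None)
print(f"kept {len(inliers)} of {len(object_points)} correspondences")

# Optionally refine the pose using only the inliers.
ok, rvec, tvec = cv2.solvePnP(
    object_points[inliers.ravel()], image_points[inliers.ravel()],
    K, dist_coeffs, rvec, tvec, useExtrinsicGuess=True,
)
```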

Pitfalls / things to avoid?
Make sure you have good intrinsics, and be certain the physical camera is stable. Physically lock the lens focus and zoom level if possible; avoid autofocus and auto zoom, since changing either invalidates your intrinsics.

Make sure you have enough points - I’d probably want to have at least 15 points to start with and 10 or more after filtering outliers, but it really depends on the situation. If your point correspondences are very accurate / high quality, you can get great results with 6 or 8.

Be aware that a lower reprojection error doesn’t necessarily mean better results - you can discard “outliers” iteratively until you drive your reprojection error to zero, but the actual pose accuracy will likely be worse than if you had used more points.

Be aware that solvePnP gives you the object’s pose in the camera frame, so if you want the camera pose in the object/world frame you will have to invert that transform yourself.
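
The inversion is just a couple of lines; a sketch, reusing `rvec`/`tvec` from solvePnP above:

```python
import numpy as np
import cv2

# solvePnP returns the transform that maps world coordinates into the camera
# frame: x_cam = R @ x_world + t. Inverting it gives the camera pose in the
# world frame.
R, _ = cv2.Rodrigues(rvec)                 # 3x3 rotation from rotation vector
R_cam_in_world = R.T                       # camera orientation in world frame
cam_position = -R.T @ tvec.reshape(3, 1)   # camera center in world coordinates
```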

Specific aspects regarding the quality of 3D->2D correspondences?
Your requirements for the resulting accuracy will dictate how good your correspondences have to be. There are many factors that contribute to how accurate your correspondences are, so it’s hard to give helpful input without knowing more about your physical setup. As a general approach, I’d suggest that you need to be confident in your 3D world points (which isn’t always easy), and more points help get better results when you have measurement error in the ground truth points.

Image points can contain error from a number of sources - familiarize yourself with the image formation process so you can be on the lookout for issues like chromatic aberration, the effects of Bayer filters and demosaicing algorithms, lens distortion, etc. These can all contribute to error in the image points, but you can control some of these effects if you have the ability to select the camera and optics you use.
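
One concrete thing you can often do on the image side: if your points come from corner-like features, refine them to sub-pixel accuracy. A sketch, assuming `gray` is your grayscale image and `corners` is an (N, 1, 2) float32 array of initial detections:

```python
import cv2

# Iteratively refine corner locations to sub-pixel accuracy around each
# initial detection; cornerSubPix updates `corners` in place and returns it.
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.01)
refined = cv2.cornerSubPix(gray, corners, winSize=(5, 5),
                           zeroZone=(-1, -1), criteria=criteria)
```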

It’s good to be aware of the chief ray angle of your sensor, and get a lens that is compatible.

Really there are so many things to consider, and it will ultimately come down to what you are able to control, and what level of accuracy you need. I can’t tell if you are working on a specific project, or are just trying to learn about the space more generally. If you have any details you can share, that might help with the feedback you get.