Calculating Camera Pose and Orientation Using 3D-2D Point Correspondences

Hello Community,

I am currently working on a project involving computer vision and have encountered a challenge for which I seek your insights and assistance.

Background: In my scenario, I have a calibrated camera for which I possess the intrinsic parameters, encapsulated within the camera matrix. Additionally, through my setup, I have collected a set of 3D points from the real world and their corresponding 2D projections captured in an image through this camera. My objective is to accurately determine the camera’s pose and orientation relative to the observed scene or object.

Problem Statement: Despite having the camera matrix and the 3D-2D point correspondences, I am grappling with the methodology to effectively calculate the camera’s pose, specifically its position and orientation (rotation) in the world coordinate system.

Specific Needs:

  1. An understanding of the theoretical approach behind calculating the pose. Is this what is referred to as solving the PnP (Perspective-n-Point) problem?
  2. Any robust algorithm suggestions or best practices that are typically employed in the industry for such calculations. I have come across methods like EPnP, DLT, and iterative optimization techniques, but I am unsure of their applicability or efficiency in this context.
  3. Practical examples or pseudo-code to illustrate the process. While I understand the theoretical aspect may be complex, having a practical, code-oriented guide would significantly ease the implementation phase.
  4. Common pitfalls or errors that I should be vigilant about during the implementation. Are there any specific aspects regarding the quality of the 3D-2D correspondences, the number of points required, or certain conditions that might skew the results?

If you want to understand the theory, you could read through chapters 2-6 in Hartley/Zisserman “Multiple View Geometry” - if you fully understand all of that material you will be in great shape. I can’t say I understand it fully, but I get along pretty well with half of it.

I would suggest just starting with what you have and feeding it into solvePnP. Working with the results, project your world points to the image and draw some circles (both where your original image points were and where the projected points land). Look for the point with the largest discrepancy, eliminate it from your correspondence list, and try again. Observe how things change. I’m assuming you have a lot of points and are able to discard some. If you only have 4-8, you might not be able to discard many of them, so you should focus on getting high-quality ground truth values and really good image locations for your correspondences.
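Something like this is a minimal sketch of that loop in Python/OpenCV - the names object_points, image_points, K, dist_coeffs and image are placeholders for your own data (Nx3 world points, Nx2 pixel points, camera matrix, distortion coefficients, and the image the points were measured in):

```python
import numpy as np
import cv2

# Assumed inputs (replace with your own data):
#   object_points: (N, 3) float array of 3D world coordinates
#   image_points:  (N, 2) float array of corresponding pixel coordinates
#   K:             3x3 camera matrix, dist_coeffs: distortion coefficients (or None)
#   image:         the image the points were measured in (for visualization)
ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist_coeffs)

# Re-project the world points with the recovered pose and compare to the measured points.
projected, _ = cv2.projectPoints(object_points, rvec, tvec, K, dist_coeffs)
projected = projected.reshape(-1, 2)
residuals = np.linalg.norm(projected - image_points, axis=1)  # per-point error in pixels

# Draw original (green) and re-projected (red) locations for visual inspection.
for (u, v), (pu, pv) in zip(image_points, projected):
    cv2.circle(image, (int(round(u)), int(round(v))), 5, (0, 255, 0), 1)
    cv2.circle(image, (int(round(pu)), int(round(pv))), 5, (0, 0, 255), 1)

worst = int(np.argmax(residuals))  # candidate to drop before re-fitting
print(f"RMS reprojection error: {np.sqrt(np.mean(residuals**2)):.2f} px, worst point index: {worst}")
```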

If you do have a large number of points to start with, you might well benefit from filtering outliers and refitting. Or you might want to look into solvePnPRansac for a method to automate the process of eliminating outliers.
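For example (again just a rough sketch with the same assumed inputs as above; solvePnPRefineLM needs a reasonably recent OpenCV):

```python
import cv2

# RANSAC variant: fits the pose while automatically rejecting outlier correspondences.
ok, rvec, tvec, inliers = cv2.solvePnPRansac(
    object_points, image_points, K, dist_coeffs,
    reprojectionError=3.0,   # pixel threshold for counting a point as an inlier
    iterationsCount=200,
)

# Optionally refine the pose on the inliers only (Levenberg-Marquardt).
if ok and inliers is not None:
    rvec, tvec = cv2.solvePnPRefineLM(
        object_points[inliers.ravel()], image_points[inliers.ravel()],
        K, dist_coeffs, rvec, tvec)
```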

Pitfalls / things to avoid?
Make sure you have good intrinsics, and be certain the physical camera is stable (physically lock the lens focus and zoom level if possible); auto focus or auto zoom should be avoided if at all possible.

Make sure you have enough points - I’d probably want to have at least 15 points to start with and 10 or more after filtering outliers, but it really depends on the situation. If your point correspondences are very accurate / high quality, you can get great results with 6 or 8.

Be aware that a lower reprojection error doesn’t necessarily mean better results - you can discard “outliers” iteratively until you drive your reprojection error to zero, but the actual accuracy will likely be worse than if you had used more points.

Be aware that solvePnP will give you the object’s pose in the camera frame, so if you want the camera pose in the object frame you will have to compute that.
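The conversion is just an inversion of the rigid transform. Continuing the sketch above:

```python
import cv2

# rvec/tvec are the outputs of solvePnP: they map object/world coordinates into the camera frame.
R, _ = cv2.Rodrigues(rvec)                 # 3x3 rotation: world -> camera
R_cam = R.T                                # camera orientation expressed in the world frame
cam_position = -R.T @ tvec.reshape(3, 1)   # camera center in world coordinates
```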

Specific aspects regarding the quality of 3D->2D correspondences? Your requirements for the resulting accuracy will dictate how good your correspondences have to be. There are many factors that contribute to how accurate your correspondences are, so it’s hard to give helpful input without knowing more about your physical setup. As a general approach I’d suggest that you need to be confident in your 3D world points (which isn’t always easy), and more points help get better results when there is measurement error in the ground truth points. Image points can contain error from a number of sources - familiarize yourself with the image formation process to be on the lookout for issues like chromatic aberration, the effects of Bayer filters and demosaicing algorithms, lens distortion, etc. These can all contribute to error in the image points, but you can control some of these effects if you have the ability to select the camera and optics you use.
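On the lens distortion point specifically: assuming you have calibrated distortion coefficients, you can either pass them straight to solvePnP (as in the sketches above) or undistort the measured pixel coordinates first and then solve with zero distortion - for example:

```python
import numpy as np
import cv2

# Undistort the measured pixel coordinates (P=K keeps them in pixel units),
# then solve PnP with no distortion model.
undistorted = cv2.undistortPoints(
    image_points.reshape(-1, 1, 2).astype(np.float64), K, dist_coeffs, P=K)
ok, rvec, tvec = cv2.solvePnP(object_points, undistorted.reshape(-1, 2), K, None)
```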

It’s good to be aware of the chief ray angle of your sensor, and get a lens that is compatible.

Really there are so many things to consider, and it will ultimately come down to what you are able to control, and what level of accuracy you need. I can’t tell if you are working on a specific project, or are just trying to learn about the space more generally. If you have any details you can share, that might help with the feedback you get.