CAD Model 3D Pose Estimation

I’m looking to see if there are any other resources for real-time 3D pose estimation besides: OpenCV: Real Time pose estimation of a textured object. I’m having a couple of issues and would appreciate any info on the topic.

The first part: the sample linked above uses a program to create a YAML file with the 3D points and descriptors. It mentions that the sample only handles planar objects and that you should “use a sophisticated software” to generate other 3D textured models, but how does one achieve that? Does that mean using different software that is already available, or modifying the given code so it can handle non-planar objects?

The second part of the question: Azure Object Anchors (Azure Object Anchors – Mixed Reality Understanding | Microsoft Azure), which recently went into preview, facilitates a similar workflow. The main difference is that the Azure implementation doesn’t require “training” images like the sample linked above. Is it possible to achieve real-time 3D pose estimation with OpenCV without using “input” images? How could I extract ORB features and descriptors without “training” images from which to calculate the 3D coordinates?

Any help with this is appreciated.

if you have a cad model inside a cad program, why do you need to estimate the 3d pose from a 2d image?
please explain what you have, and what you’re trying to achieve with it.

solvePnPXXX() expects 2 sets of corresponding 2d / 3d points.
this is fairly easy for e.g. faces, where you can use landmarks (e.g. dlib’s) which retain their order and 3d correspondence, or even with aruco markers

the opencv sample above demonstrates how to generate correspondences from detected orb keypoints. far more advanced / complicated

that’s mainly a limitation of the registration algorithm used there. it has to sample (virtual !) 3d points from a sparse box model (only corners are given), using some ray tracing.
if you had a heightfield instead, you could simply sample z for given x,y …
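For the heightfield case, sampling z at a given (x, y) is just an array lookup with interpolation. A small numpy sketch (the `sample_z` helper and the sloped test surface are hypothetical, purely for illustration):

```python
import numpy as np

# Hypothetical heightfield: z = f(x, y) sampled on a regular 64x64 grid.
H, W = 64, 64
ys, xs = np.mgrid[0:H, 0:W]
height = 0.01 * (xs + ys)  # a simple sloped surface for illustration

def sample_z(height, x, y):
    """Bilinearly interpolate z at a fractional (x, y) grid position."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1 = min(x0 + 1, height.shape[1] - 1)
    y1 = min(y0 + 1, height.shape[0] - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * height[y0, x0] + fx * height[y0, x1]
    bot = (1 - fx) * height[y1, x0] + fx * height[y1, x1]
    return (1 - fy) * top + fy * bot

z = sample_z(height, 10.5, 20.25)  # → 0.3075
print(z)
```

No ray tracing needed: every (x, y) maps to exactly one z, which is why a heightfield is so much easier to sample than a sparse box model.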

no idea what azure does there. ICP? a 3d pose cnn? they never tell what’s under the hood, but for sure it is not trying to estimate 3d from a single 2d image.

again, we have no idea what problem you’re trying to solve, please explain better!

@berak Thanks for the response, and sorry for the lack of detail. I’m looking to achieve essentially what Azure Object Anchors does. It doesn’t necessarily have to work exactly the way theirs does, but their process is:

  1. Upload CAD model to Azure
  2. It “trains” using the CAD model only and exports out a .ou file which is the “model”
  3. The model is used on-device (HoloLens) for object detection with pose estimation using the live camera feed

The biggest thing that differentiates this from the sample I linked is that no “training” input images are required; as you said, they are not trying to estimate 3D from a 2D image. I’m looking to see if anyone has any experience or ideas on how to achieve something similar to Azure Object Anchors.