I have a setup where two monocular cameras (not a stereo camera) view the same region, and I have to estimate the 3D pose of a certain object (let’s say, a human).
As far as I understand, I have to perform the following steps:
- Detect 2D bounding boxes of the person with YOLO
- Detect keypoints and compute descriptors in each camera’s image
- Associate keypoint correspondences within the bounding boxes
- Calculate the essential matrix, which should give the transformation between the two cameras
- Calibration of both cameras is possible.
- Perform triangulation or bundle adjustment (not sure here)
- Estimate the 3D pose of the person? (not sure yet)
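To make the triangulation step concrete, here is a minimal sketch in plain Python of midpoint triangulation for a single matched keypoint. It is not the only option (linear DLT triangulation followed by bundle adjustment is a common alternative). It assumes both cameras are calibrated pinhole cameras without lens distortion, and that each camera's rotation `R` (world to camera) and centre are already known in a common frame, for example from decomposing the essential matrix, which only gives the translation up to scale. The function names and the intrinsics `fx, fy, cx, cy` are hypothetical and chosen for illustration.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def pixel_to_ray(u: float, v: float, fx: float, fy: float,
                 cx: float, cy: float,
                 R: Sequence[Sequence[float]]) -> Vec3:
    """Back-project pixel (u, v) to a ray direction in world coordinates.

    Assumes an undistorted pinhole camera and the convention
    x_cam = R @ (X_world - C), so R^T maps camera directions to world.
    """
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    # Multiply by R transposed: component c is sum over rows r of R[r][c] * d_cam[r].
    return tuple(sum(R[r][c] * d_cam[r] for r in range(3)) for c in range(3))


def triangulate_midpoint(o1: Vec3, d1: Vec3, o2: Vec3, d2: Vec3) -> Vec3:
    """Return the midpoint of the shortest segment between two rays.

    Ray i is o_i + s_i * d_i. Real matches are noisy, so the rays rarely
    intersect exactly; the midpoint is a simple estimate of the 3D point.
    """
    w = tuple(p - q for p, q in zip(o1, o2))
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; cannot triangulate")
    s = (b * e - c * d) / denom  # parameter along ray 1
    t = (a * e - b * d) / denom  # parameter along ray 2
    p1 = tuple(o + s * k for o, k in zip(o1, d1))
    p2 = tuple(o + t * k for o, k in zip(o2, d2))
    return tuple((x + y) / 2.0 for x, y in zip(p1, p2))


if __name__ == "__main__":
    # Hypothetical setup: identical intrinsics, camera 2 shifted 1 unit along x.
    I = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    fx = fy = 800.0
    cx, cy = 320.0, 240.0
    ray1 = pixel_to_ray(420.0, 280.0, fx, fy, cx, cy, I)
    ray2 = pixel_to_ray(220.0, 280.0, fx, fy, cx, cy, I)
    print(triangulate_midpoint((0.0, 0.0, 0.0), ray1, (1.0, 0.0, 0.0), ray2))
```

In this made-up configuration the two pixels are the projections of the point (0.5, 0.2, 4.0), so the printed result should be that point up to floating-point rounding. Running this once per matched keypoint (or per body joint, if a 2D pose estimator is used instead of generic descriptors) gives a set of 3D points from which the person's pose can be described.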
I would appreciate it if somebody could verify whether I am following the correct steps.