Hi,

I am trying to develop a monocular SLAM system in its simplest form, running in a single thread.

To reach that goal I divided the process into the following sub-goals:

a) Monocular VO with 2D-3D correspondences (a.k.a. PnP)

b) Bundle adjustment / graph optimization to refine the computed sparse 3D map and the poses computed so far.

I am stuck on part (a). Here is what I tried:

i) For the initial two frames I estimate the relative motion between them using either the Essential matrix (5-point algorithm) or a Homography matrix, whichever suits the visible scene better according to the E and H scores.

ii) Next I triangulate 3D points using the transformation computed in the previous step. I filter out points that have a negative Z or a Z larger than 500 units.

iii) I add the newly found 3D points to the map and save the 2D-3D correspondences for the current frame.

iv) In the next frame I find ORB matches with the last frame.

v) Among the current matches, those whose points in the last frame already have a 3D correspondence are passed to OpenCV's solvePnP to get the transformation from the origin.

vi) Matches that do not have a 3D correspondence yet are processed as in steps (ii) and (iii).

vii) Repeat from (iv).

Above is the workflow I came up with to keep track of both the camera ego-motion and the environment.
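To make steps (ii), (v) and (vi) more concrete, here is a rough sketch of the bookkeeping I have in mind. It is not my exact code: the names `Match`, `SplitMatches`, `splitMatches` and `depthOk` are made up for illustration, and I use plain standard containers instead of OpenCV types. I also assume Z is measured in the frame of the camera used for triangulation.

```cpp
#include <unordered_map>
#include <utility>
#include <vector>

struct Point3 { double x, y, z; };

// One ORB match: keypoint index in the last frame and in the current frame.
struct Match { int lastIdx; int currIdx; };

// Step (ii): keep a triangulated point only if it has positive depth and
// is not farther away than maxDepth (500 units in my case).
bool depthOk(const Point3& p, double maxDepth = 500.0) {
    return p.z > 0.0 && p.z <= maxDepth;
}

struct SplitMatches {
    // (map point id, current keypoint index): the 2D-3D pairs fed to PnP.
    std::vector<std::pair<int, int>> forPnP;
    // Matches with no 3D point yet: these go through steps (ii) and (iii).
    std::vector<Match> toTriangulate;
};

// Steps (v) and (vi): lastKpToMapPoint maps a keypoint index of the last
// frame to the id of its 3D map point, if it already has one.
SplitMatches splitMatches(const std::vector<Match>& matches,
                          const std::unordered_map<int, int>& lastKpToMapPoint) {
    SplitMatches out;
    for (const Match& m : matches) {
        auto it = lastKpToMapPoint.find(m.lastIdx);
        if (it != lastKpToMapPoint.end())
            out.forPnP.emplace_back(it->second, m.currIdx);
        else
            out.toTriangulate.push_back(m);
    }
    return out;
}
```

After each frame, the keypoints of newly triangulated points would be added to the lookup table of the current frame, so that it can serve as `lastKpToMapPoint` in the next round.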

So far I have not done any BA or graph optimization.

Case 1:

The problem I am currently facing is this: from step (iv) onward, if I accumulate the rotation and translation for the camera trajectory as follows,

tc = tc + (-Rc * t); // tc: cumulative translation, Rc: cumulative rotation, t: translation from solvePnP

Rc = Rc * R; // R: rotation, i.e. Rodrigues of rvec from solvePnP

then the trajectory I get is completely wrong, as you can see in the attached video.

Case 2:

If I instead use R (Rodrigues of rvec) and t directly to plot the camera trajectory, the result initially looks good(?), but after a few frames it falls apart. I have also noticed that the translation between consecutive frames gradually decreases as frames pass: as you can see in the following screenshot, the camera markers get closer and closer together. The trajectory is also reversed, meaning the camera appears to move backward. I know that this can be solved by inverting the 4*4 transformation matrix. The trajectory starts from (0,0,0), then after some frames it splits into different trajectories.
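Regarding that inversion, this is my understanding of the convention, which may be exactly where I go wrong: solvePnP returns the rvec/tvec that take points from the map (object) frame into the camera frame, X_c = R * X_w + t. Inverting that gives X_w = R^T * X_c - R^T * t, so the camera position in the map frame would be -R^T * t. A minimal sketch with plain arrays instead of cv::Mat (`Pose` and `invertPose` are hypothetical names):

```cpp
#include <array>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

struct Pose { Mat3 R; Vec3 t; };

// Invert a rigid transform X_c = R * X_w + t.
// The result has rotation R^T and translation -R^T * t; the translation
// is the camera center expressed in the map frame.
Pose invertPose(const Pose& worldToCam) {
    Pose camToWorld{};
    // Rotation part: transpose of R.
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            camToWorld.R[i][j] = worldToCam.R[j][i];
    // Translation part: -R^T * t.
    for (int i = 0; i < 3; ++i) {
        camToWorld.t[i] = 0.0;
        for (int j = 0; j < 3; ++j)
            camToWorld.t[i] -= camToWorld.R[i][j] * worldToCam.t[j];
    }
    return camToWorld;
}
```

If that is right, then each solvePnP result is already an absolute pose with respect to the map origin, and the point to plot for each frame would be `invertPose(pose).t`.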

** One thing to note: in both cases I use t and R (Rodrigues of rvec) directly as the current frame's translation and rotation to build the "current 3*4 Projection Matrix" in the current round, which then becomes the "last 3*4 Projection Matrix" in the next round, in order to triangulate 3D points. I could be wrong here; I am not sure.
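To show what I mean by the 3*4 projection matrix, here is a small sketch, not my exact code. It assumes the points are in pixel coordinates, so the intrinsic matrix K is included (with normalized image coordinates K would be the identity), and that R and t map map-frame points into the camera frame. The names `makeProjection` and `project` are made up, and plain arrays stand in for cv::Mat.

```cpp
#include <array>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;
using Mat34 = std::array<std::array<double, 4>, 3>;

// Build P = K * [R | t].
Mat34 makeProjection(const Mat3& K, const Mat3& R, const Vec3& t) {
    Mat34 P{};  // zero-initialized
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j) {
            for (int k = 0; k < 3; ++k)
                P[i][j] += K[i][k] * R[k][j];  // left 3x3 block: K * R
            P[i][3] += K[i][j] * t[j];         // last column: K * t
        }
    }
    return P;
}

// Project a map point X to pixel coordinates (u, v) using P.
std::array<double, 2> project(const Mat34& P, const Vec3& X) {
    double h[3];
    for (int i = 0; i < 3; ++i)
        h[i] = P[i][0] * X[0] + P[i][1] * X[1] + P[i][2] * X[2] + P[i][3];
    return {h[0] / h[2], h[1] / h[2]};
}
```

Reprojecting the triangulated points with `project` and comparing against the matched keypoints would be one way to check whether the R and t I plug in follow the convention the triangulation expects.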

Can you please help me sort out this issue? I think I am close, but somehow I cannot find the correct solution.

–

Thanks & Regards

Smithangshu Ghosh