Hello. I am trying to write robot localization code using ArUco markers. I placed some markers on the ceiling and wrote code that inverts the tvecs and adds them to the real-world coordinates of the markers. In general the code works, but with a huge error and a lot of outliers, and I could not figure out why. Calibration is done. What might be causing this, and how can I prevent it, especially the outliers? I did not post the whole code because it is long and no one is actually going to read it, but I can if you want. Here is a video of my current localization progress: aruco hatalı okuma ("aruco faulty reading") - YouTube

the smaller the marker, the harder its orientation is to estimate. there is ambiguity at near-orthogonal (small/far away) views. you can see some of their orientations flip-flopping around.

make them as large as possible to make this more robust.

if you don’t really need to, you should avoid using their orientations (rvecs) entirely.

you can define an “aruco board”, which describes an arbitrary constellation of markers in 3D. then, the pose estimation can incorporate all the markers it sees. that will hopefully result in a more robust pose for the entire constellation.
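as a rough sketch of what such a constellation looks like as data: the snippet below only builds the 3D corner coordinates of each marker in the world frame with numpy. the helper name `marker_corners_world`, the marker size, the positions and the "facing down" rotation are all made-up assumptions for illustration. the corner order (top-left, top-right, bottom-right, bottom-left) follows the usual ArUco convention. points like these, paired with the detected image corners, are what a board-based or PnP-based pose estimate (e.g. `cv2.solvePnP`) works from. the board classes in `cv2.aruco` differ between OpenCV versions, so check the docs for yours.

```python
import numpy as np

def marker_corners_world(center, side, R_marker_to_world=None):
    """Return the 4 corners (4x3) of a square marker in world coordinates.

    center: marker center in the world frame (assumed known from your map).
    side: marker side length, same unit as center.
    R_marker_to_world: 3x3 rotation of the marker's own axes into the world
        frame (hypothetical parameter; identity if omitted).
    """
    h = side / 2.0
    # corners in the marker's own frame: top-left, top-right,
    # bottom-right, bottom-left, with the marker lying in its z = 0 plane
    local = np.array([[-h,  h, 0.0],
                      [ h,  h, 0.0],
                      [ h, -h, 0.0],
                      [-h, -h, 0.0]])
    if R_marker_to_world is None:
        R_marker_to_world = np.eye(3)
    return np.asarray(center, dtype=float) + local @ np.asarray(R_marker_to_world).T

# assumption: a ceiling marker faces down, i.e. it is rotated 180 degrees
# about the world x axis relative to the default orientation
FACING_DOWN = np.diag([1.0, -1.0, -1.0])

# made-up layout: marker id -> center on a ceiling at z = 2.5 m
layout = {7: (1.0, 2.0, 2.5), 8: (2.0, 2.0, 2.5)}
board_ids = np.array(sorted(layout))
board_points = np.stack([
    marker_corners_world(layout[i], side=0.2, R_marker_to_world=FACING_DOWN)
    for i in board_ids
])  # shape (num_markers, 4, 3)
```

the point of the board is that every visible corner contributes to one pose, instead of each small marker producing its own shaky pose.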

Hey, thank you for the reply. I forgot that I had asked this on this forum. What you said makes sense. I had already noticed the flip-flops on the markers but had no idea how to correct them. The question is how I can do this without using rvecs.

I have a set of tvecs, which are the positions of the markers with respect to the camera coordinate frame. I also have another set of vectors, which are the positions of the same markers in real-world coordinates. So I think I need to find a transformation between the two reference frames. I searched the internet and found this: transformation - How to solve an overdetermined system of point mappings via rotation and translation - Mathematics Stack Exchange

This will give me a rotation matrix and a translation vector between these coordinate frames, so basically this translation vector should be what I was looking for. But the results are not what I expected. Here is a figure of the localization for the same video.

Since I could not do this by myself, do you know of any built-in functions or sample code for this? Thank you again, and sorry for the late reply.

so you have:

- model positions of all markers, in world frame
- measured positions of all markers, in camera frame

and you want a transformation relating camera frame and world frame.

that’s ideally a euclidean/rigid (r+t) transform. its estimation is a fundamental geometry task and usually simple.

Iterative Closest Point is a well known algorithm, if the point-point assignment is *unknown*. since it is *known* in your case, there is probably a non-iterative (i.e. closed form) solution.
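one closed-form option for known correspondences is the SVD-based Kabsch method. the sketch below is a minimal numpy version, not a drop-in for your code: the function name `rigid_transform` and the demo numbers are made up, and it assumes at least three non-collinear markers are visible in the frame. it solves for R, t such that world ≈ R · cam + t. with that direction, t is the camera origin expressed in the world frame, because the camera sits at (0, 0, 0) in its own frame.

```python
import numpy as np

def rigid_transform(cam_pts, world_pts):
    """Least-squares rotation R and translation t with world ~= R @ cam + t.

    cam_pts, world_pts: (N, 3) arrays, row i of each is the same marker.
    """
    cam_pts = np.asarray(cam_pts, dtype=float)
    world_pts = np.asarray(world_pts, dtype=float)
    cam_c = cam_pts.mean(axis=0)
    world_c = world_pts.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets
    H = (cam_pts - cam_c).T @ (world_pts - world_c)
    U, _, Vt = np.linalg.svd(H)
    # force det(R) = +1 so the result is a rotation, not a reflection
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = world_c - R @ cam_c
    return R, t

# tiny demo with made-up numbers: 30 degree yaw and a known offset
a = np.radians(30.0)
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0,        0.0,       1.0]])
t_true = np.array([1.0, 2.0, 0.5])
cam_demo = np.array([[0.0, 0.0, 2.0],
                     [1.0, 0.0, 2.0],
                     [0.0, 1.0, 2.0],
                     [1.0, 1.0, 2.0]])   # coplanar, like ceiling markers
world_demo = cam_demo @ R_true.T + t_true
R_est, t_est = rigid_transform(cam_demo, world_demo)
camera_position_world = t_est
```

note this only uses marker positions (tvecs), so the flip-flopping rvecs never enter the estimate. outlier frames would still need handling on top of this, e.g. by rejecting markers with a large residual.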

measurements are tricky to debug because they’re noisy and dirty.

I’d recommend prototyping this synthetically, i.e. define your world model, define a camera pose (identity first, then introduce slight rotation, slight translation, …), transform the markers into camera space, so now you know what everything is supposed to be, **and then** see if your implementation can give you the camera pose from marker poses in both frames.
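a minimal sketch of such a synthetic test, with made-up marker positions and camera pose. as the estimator under test it uses a guess at what "invert tvecs and add world coordinates" amounts to (`naive_camera_position`, a hypothetical name), so swap in your own implementation. all helper names here are assumptions for illustration.

```python
import numpy as np

def rot_z(deg):
    """Rotation about the z axis by deg degrees."""
    a = np.radians(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def world_to_camera(world_pts, R_cam_to_world, cam_pos):
    """Express world points in the camera frame: p_cam = R^T (p_world - cam_pos)."""
    # row-vector form of R^T (p - c) is (p - c) @ R
    return (world_pts - cam_pos) @ R_cam_to_world

def naive_camera_position(world_pts, cam_pts):
    """Estimator under test: marker world position minus its tvec, averaged."""
    return (world_pts - cam_pts).mean(axis=0)

# world model: four markers on a ceiling at z = 2.5 (made-up layout)
markers_world = np.array([[0.0, 0.0, 2.5],
                          [1.0, 0.0, 2.5],
                          [0.0, 1.0, 2.5],
                          [1.0, 1.0, 2.5]])
true_pos = np.array([0.4, 0.3, 0.0])

errors = {}
for yaw in (0.0, 10.0):  # identity first, then a slight rotation
    cam_pts = world_to_camera(markers_world, rot_z(yaw), true_pos)
    est = naive_camera_position(markers_world, cam_pts)
    errors[yaw] = float(np.linalg.norm(est - true_pos))
    print(f"yaw={yaw:5.1f} deg  position error={errors[yaw]:.4f}")
```

with the identity rotation the naive estimate is exact, and as soon as the camera is rotated it picks up an error that grows with the angle and with the distance to the markers. since the data is noise-free, any error here comes from the method, not from the measurements.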