I can't really tell what you are showing in the pictures. Are the yellow points in the "50% of the time" image supposed to project to one of the visible markers, or are they a marker that is occluded?

The red points are my detected points on the image.

The yellow points are the reprojection of all the model points into the image. They show the solution estimated by my brute-force solvePnP search.

When you say your results are not stable at all, do you mean that you will get significantly different pose estimations from one frame to the next (true erroneous results) or that there is a lot of jitter, but the results frame-to-frame are still reasonable estimates?

It is the former. I already have 10 points and evaluate 4 of them at a time, which is supposed to give a unique solution. I would rather not evaluate 5 points, because having 5 visible points would be rare unless I put a very large number of markers all around my object. It would also considerably increase the computation time.

My suggestion would be to call SolvePnP for your 4 point data sets, and then call solvePnPRansac on the full data set once you have determined your correspondences.

Well, I have not noticed any particular difference between solvePnP and solvePnPRansac in the final result, so I chose solvePnPRansac since it is 2-3 times faster in my case (30 ms versus 85 ms). I will use solvePnP to be sure.

Also regarding performance: You don't have to search the full space each time. For example if you had 20 points it wouldn't be possible to brute force it, but you could apply your method and stop once your best reprojection error estimate had stabilized. Say do it for 20 iterations to start with, and then stop once your best reprojection error hasn't improved in 10 iterations. (There are certainly more sophisticated / robust / valid ways to do this, but something like this is easy and will get you started)

I did not get this part. There is only one solution, so I can only brute force all the configurations. What do you mean by iteration?

I do not search the full space. As I said, I try every configuration of four 3D points against four chosen 2D points found in my image. I don't try all 10 points at once when I have ten 2D points; that would not be feasible.

Ex:

I have five 2D points in my image, called a, b, c, d, e. I arbitrarily choose a, b, c, d.

I have ten 3D points corresponding to my marker locations, called 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.

Now I try every configuration of 3D points:

(a, b, c, d) - (0, 1, 2, 3)

(a, b, c, d) - (0, 1, 2, 4)

(a, b, c, d) - (0, 1, 2, 5)

(a, b, c, d) - (0, 1, 2, 6)

…

In the end I evaluate the reprojection error of each configuration and find that, for example, (a, b, c, d) - (4, 7, 2, 0) is the best match.
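A minimal sketch of this enumeration, assuming the order of the 3D indices matters (as the (4, 7, 2, 0) example suggests), so each candidate is an ordered selection of model points. The names `enumerate_assignments`, `best_assignment` and `reprojection_error` are hypothetical; in practice `reprojection_error` would wrap a solvePnP call on the candidate correspondences followed by a reprojection of the model points.

```python
from itertools import permutations


def enumerate_assignments(num_model_points, num_image_points=4):
    """Yield every ordered choice of model-point indices for the chosen image points."""
    return permutations(range(num_model_points), num_image_points)


def best_assignment(num_model_points, reprojection_error, num_image_points=4):
    """Return (indices, error) of the candidate with the lowest reprojection error.

    reprojection_error is a caller-supplied function (an assumption of this
    sketch) that takes a tuple of model-point indices and returns a float.
    """
    best, best_err = None, float("inf")
    for candidate in enumerate_assignments(num_model_points, num_image_points):
        err = reprojection_error(candidate)
        if err < best_err:
            best, best_err = candidate, err
    return best, best_err
```

With 10 model points and 4 image points this visits 10 × 9 × 8 × 7 = 5040 candidates.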

However, several configurations can tie as the best match. I try to discriminate between them using the last unused point e: I check whether one of the matches reprojects a model point close to it. That match is our solution.
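A rough sketch of that tie-break, under stated assumptions: a plain pinhole camera without lens distortion, each candidate pose given as a rotation matrix `R` (nested lists) and translation `t`, and all function names (`project`, `holdout_error`, `pick_by_holdout`) invented for illustration. It is not the exact code I run, only the idea.

```python
import math


def project(point, R, t, f, cx, cy):
    """Pinhole projection of one 3D point; returns None if it lies behind the camera."""
    x, y, z = (sum(R[r][k] * point[k] for k in range(3)) + t[r] for r in range(3))
    if z <= 0:
        return None
    return (f * x / z + cx, f * y / z + cy)


def holdout_error(pose, used, model_points, pixel, f, cx, cy):
    """Distance from the held-out pixel to the nearest reprojected unused model point."""
    R, t = pose
    dists = []
    for i, p in enumerate(model_points):
        if i in used:
            continue  # this model point is already assigned to a, b, c or d
        uv = project(p, R, t, f, cx, cy)
        if uv is not None:
            dists.append(math.hypot(uv[0] - pixel[0], uv[1] - pixel[1]))
    return min(dists, default=float("inf"))


def pick_by_holdout(candidates, model_points, pixel, f, cx, cy):
    """candidates: list of (pose, used_indices); return the one that best explains the pixel."""
    return min(
        candidates,
        key=lambda c: holdout_error(c[0], c[1], model_points, pixel, f, cx, cy),
    )
```

Here `pixel` is the image position of e, and `used_indices` is the set of four model indices the candidate already consumed.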

Alternatively, I could pick the best of the best matches by running solvePnPRansac on all my data points instead of 4, as you suggested. However, like my current approach with the extra point e, this only works when more than four 2D points are visible.