Perhaps my biggest problem is knowing what to call what I’m looking for.
I’m helping my 7th grader with a science project. He wants to focus on optimizing autonomous robot navigation, working in Python. I offered to write visual recognition functions he can import and use. We’re printing images on 8.5x11" paper that we pin onto foamboard “walls” that he can set up like a maze.
I have code based on examples, tutorials, and OpenCV documentation, that:
- does OCR
- detects a grid pattern and returns distance from, and angle relative to, a wall
My problem is that the code gets “confused” by ambient/background “garbage”.
So I thought I’d print some markers on each wall that are highly distinctive so I can start by cropping the image. I’m printing a red square in the upper left and a green square in the lower right. This works great if the lighting is consistent, the distance from camera (standard Pi cam) is consistent, and no background objects “impersonate” my red or green square.
I expect this sort of thing is already addressed and there is some standard “idiom” and I don’t want to reinvent that wheel, which could take a lot of trial and error. So I’d greatly appreciate finding out what the answer is, or, at least, what to call it so I can search for it!
I think I may have made a start toward answering my own question. After writing up my question, I tried adding “marker” to my searches, and this led me to ArUco markers. I also learned what I was doing with the grid is possible with ArUco markers and is called “pose estimation”.
If you want to do pose estimation, have a look at the OpenCV documentation (select the version that matches your installation at the top of the page).
The first step is to calibrate your camera (estimate the focal length, principal point, and distortion coefficients). Then, using a known object (here, an ArUco marker whose dimensions you know), you can estimate the full 6D pose (3D translation and 3D orientation) of that object relative to your calibrated camera.
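To give a feel for why the focal length and the marker's physical size matter, here is a minimal sketch of the pinhole-camera relation behind distance estimation. It is a simplification that assumes the marker faces the camera squarely and ignores lens distortion; the function name and all numbers are hypothetical. For the full 6D pose, OpenCV's `cv2.solvePnP` applied to the four detected marker corners is the usual tool.

```python
def distance_from_marker(focal_px: float, marker_side_m: float,
                         side_in_image_px: float) -> float:
    """Estimate camera-to-marker distance with the pinhole model.

    Z = f * S / s, where f is the focal length in pixels (from calibration),
    S is the printed marker's side length in meters, and s is the marker's
    apparent side length in the image, in pixels.
    """
    if side_in_image_px <= 0:
        raise ValueError("apparent marker size must be positive")
    return focal_px * marker_side_m / side_in_image_px


# Hypothetical values: a 10 cm marker that appears 120 px wide,
# seen by a camera whose calibrated focal length is 600 px.
estimated_m = distance_from_marker(focal_px=600.0,
                                   marker_side_m=0.10,
                                   side_in_image_px=120.0)
print(f"estimated distance: {estimated_m:.2f} m")
```

The same marker appearing half as wide in the image would give twice the distance, which is why an accurate focal length and an accurately measured printout both matter.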
You can try this site to print your marker (a 5x5 dictionary should be good). I am used to another kind of marker (AprilTag), but this site looks good to me. Or maybe you can use this PDF directly.
Please note that the marker must be fully visible, with no occlusion, for it to be detected and for its pose to be estimated.
Thanks a lot for the cookbook-like post! You just connected a lot of the dots and saved me poring over stuff about camera calibration, 6D translation/orientation, and more, just to learn what the terms even mean.
I saw the opencv docs for generating PNG images for the ArUco markers – cv::aruco::drawMarker(). I feel pretty comfortable merging that into our current print-outs we’re developing.
We won’t need a lot of IDs, so I assume the smallest, 4x4 markers are the ones to use.
4x4 would be a good choice. There is a 3x3 option which would provide plenty of IDs for your use case, but you are more likely to get false positive detections with the smaller size (in my experience). The 4x4 does take up more space, but should be more robust.
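One way to see why fewer bits could mean more false positives is a back-of-the-envelope bound. The sketch below assumes a random n x n bit pattern is equally likely to be any of the 2^(n*n) possibilities and counts an exact match against any of the dictionary's IDs in any of four rotations. It ignores the detector's error correction and all of its other checks, so treat it as a rough illustration rather than a measured false-positive rate; the function name and the dictionary size of 50 are hypothetical.

```python
def rough_false_match_bound(bits_per_side: int, num_ids: int) -> float:
    """Upper bound on the chance that a random bit grid matches some ID.

    There are 2 ** (n * n) possible grids, and each ID can appear in up to
    four rotations, so at most 4 * num_ids grids count as a match.
    """
    total_patterns = 2 ** (bits_per_side * bits_per_side)
    return min(1.0, 4 * num_ids / total_patterns)


for n in (3, 4, 5):
    bound = rough_false_match_bound(n, num_ids=50)
    print(f"{n}x{n}: {2 ** (n * n)} possible patterns, bound = {bound:.6f}")
```

Under these assumptions the bound shrinks by a large factor with each step up in marker size, which fits the experience that larger markers are more robust.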