Computer vision is a very broad field. Without knowing your exact problem I can only guide you blindly, but here are some basics:
Python first, C++ last
Try your algorithm in Python first. It won’t run as fast as it would in C++, but leave performance for later: try, test, and rewrite in Python until you are sure it does what you need. Then port it to C++.
Do tutorials and play with demos
Computer vision solutions are often complex systems, meaning they chain together a sequence of functions from different technologies. The general conclusions about complex systems:
- you are unlikely to find them already done, ready to copy and paste
- they won’t work by simply copying and pasting; you need to understand each step
- you need to be comfortable with every step, understanding the functions and their arguments
So I advise trying each function on its own in a small demo, changing its arguments to see what they do.
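To make that concrete, here is a minimal sketch of testing one step in isolation. It uses a plain-Python stand-in for a binary threshold (pixels above the threshold become the maximum value, the rest become 0) on a tiny hand-made image, so it runs without any library. The names binary_threshold, count_foreground, sweep and the SAMPLE image are made up for illustration; with OpenCV you would wrap the same kind of loop around the real call.

```python
# Hypothetical stand-in for a single pipeline step: a binary threshold.
def binary_threshold(image, thresh, maxval=255):
    """Return a copy where pixels above `thresh` become `maxval`, others 0."""
    return [[maxval if px > thresh else 0 for px in row] for row in image]


def count_foreground(binary):
    """Count non-zero pixels so the effect of an argument is easy to compare."""
    return sum(1 for row in binary for px in row if px)


# Made-up 4x4 grayscale image (values 0..255), an assumption for the demo.
SAMPLE = [
    [10, 40, 90, 200],
    [20, 60, 120, 220],
    [30, 80, 160, 240],
    [35, 100, 180, 250],
]


def sweep(image, thresholds):
    """Run the same step with different arguments and collect the results."""
    return {t: count_foreground(binary_threshold(image, t)) for t in thresholds}


if __name__ == "__main__":
    # Change one argument at a time and watch how the output reacts.
    for t, n in sweep(SAMPLE, [50, 100, 150, 200]).items():
        print(f"thresh={t:3d} -> {n} foreground pixels")
```

Once you are comfortable with how one argument changes the output, move on to the next function in the chain and repeat.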
Find the technology set that suits your problem
AR is a good starting point. Marker-based or markerless? If you are allowed to use markers, you can start with ArUco markers.
https://docs.opencv.org/master/d5/dae/tutorial_aruco_detection.html