Computer vision is a very wide field. Since I can’t see your exact problem, I can only point out some basics:
Python first, C++ last
Try your algorithm in Python first. It won’t run as fast as it would in C++, but leave performance for later: try, test, and rewrite in Python until you are sure the algorithm does what you need. Then port it to C++.
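As an illustration of that workflow, here is a minimal pure-Python prototype of a hypothetical example algorithm (a 3x3 mean filter, not something from the original advice). It is deliberately slow and simple so that its behaviour is easy to check before any port to C++. The function name `box_blur_3x3` and the edge-clamping choice are assumptions made for this sketch.

```python
def box_blur_3x3(image):
    """Return a 3x3 mean-filtered copy of a 2D grayscale image (list of lists).

    Hypothetical example algorithm. Border pixels are handled by clamping
    neighbour coordinates to the image edges. Written for readability, not
    speed: a prototype to validate behaviour before porting.
    """
    height = len(image)
    width = len(image[0])
    result = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    # Clamp so edge pixels reuse the nearest valid neighbour.
                    ny = min(max(y + dy, 0), height - 1)
                    nx = min(max(x + dx, 0), width - 1)
                    total += image[ny][nx]
            result[y][x] = total / 9
    return result


# Usage: a single bright pixel gets spread over its 3x3 neighbourhood.
spike = [[0] * 5 for _ in range(5)]
spike[2][2] = 9
for row in box_blur_3x3(spike):
    print(row)
```

Checks written against a prototype like this (for example, "a uniform image stays uniform") can later serve as reference outputs when testing the C++ port.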
Do tutorials and play with demos
A computer vision solution is often a complex system: it chains together a sequence of functions from different technologies. The main general conclusions about complex systems:
- you are unlikely to find them already done, ready to copy and paste
- they won’t work by simply copying and pasting; you need to understand each step
- you need to be comfortable with every step, understanding the functions and their arguments
So I advise trying each function on its own in a demo, changing its arguments to see how its output changes.
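A minimal sketch of that habit: isolate one step and sweep a single argument while watching the result. The step here is a hypothetical pure-Python binary threshold that mimics the semantics of OpenCV’s `THRESH_BINARY` (pixels strictly above `thresh` become `max_value`, the rest 0); the sample image and threshold values are made up for illustration.

```python
def binary_threshold(image, thresh, max_value=255):
    """Hypothetical stand-in for one pipeline step: pixels > thresh -> max_value, else 0."""
    return [[max_value if p > thresh else 0 for p in row] for row in image]


def sweep(image, thresholds):
    """Run the step once per threshold and count how many pixels survive."""
    results = {}
    for t in thresholds:
        out = binary_threshold(image, t)
        results[t] = sum(1 for row in out for p in row if p)
    return results


# Made-up 3x3 grayscale sample.
sample = [
    [10, 40, 80],
    [120, 160, 200],
    [220, 240, 250],
]

for t, count in sweep(sample, [50, 100, 150, 200]).items():
    print(f"thresh={t:3d}: {count} foreground pixels")
```

With a real library you would replace `binary_threshold` with the actual function under study and feed it your own images; the point is to see how one argument changes one step before wiring steps together.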
Find the technology set that suits your problem
AR (augmented reality) is a good starting point. Marker-based or markerless? If you are allowed to use markers, you can go with ArUco markers:
https://docs.opencv.org/master/d5/dae/tutorial_aruco_detection.html