Camera positional tracking

The problem is called simultaneous localization and mapping (SLAM); see the Wikipedia article of that name.

You should use a stereo camera for that: with a calibrated baseline between the two lenses, it already produces point clouds reliably, and in real-world units, from every frame pair.
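To make the stereo point concrete, here is a rough sketch of how one matched pixel turns into a 3D point. It assumes a rectified stereo pair and a pinhole model; the function names and the numbers (focal length 700 px, baseline 0.12 m, principal point 320/240) are made up for illustration and do not come from any particular camera or library.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth along the optical axis for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive (zero means the point is at infinity)")
    return focal_px * baseline_m / disparity_px


def triangulate(u, v, disparity_px, focal_px, baseline_m, cx, cy):
    """Back-project pixel (u, v) of the left image into a 3D point in metres."""
    z = depth_from_disparity(disparity_px, focal_px, baseline_m)
    x = (u - cx) * z / focal_px
    y = (v - cy) * z / focal_px
    return (x, y, z)


# Illustrative values only: one pixel matched with a disparity of 35 px.
point = triangulate(u=420, v=240, disparity_px=35,
                    focal_px=700.0, baseline_m=0.12, cx=320.0, cy=240.0)
print(point)
```

Doing this for every matched pixel gives the point cloud. In practice the stereo camera's SDK or a stereo matching library does the matching and reprojection for you.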

If you absolutely have to use a single (monocular) camera, you face a second problem: structure from motion (see the Wikipedia article of that name).
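One reason the monocular case is harder is that, without extra information, images alone cannot fix the absolute scale of the scene. The toy sketch below shows that: it projects a point into two camera positions, then scales both the scene and the camera positions by the same factor and gets the same pixels. It assumes an ideal pinhole camera with no rotation; all names and numbers are hypothetical.

```python
def project(point, cam_pos, focal_px):
    """Pinhole projection for a camera at cam_pos looking down +Z (no rotation)."""
    x, y, z = (p - c for p, c in zip(point, cam_pos))
    return (focal_px * x / z, focal_px * y / z)


def scaled(vec, s):
    """Multiply every coordinate of vec by s."""
    return tuple(s * v for v in vec)


point = (0.5, 0.2, 4.0)    # a scene point, metres
cam_a = (0.0, 0.0, 0.0)    # first camera position
cam_b = (0.3, 0.0, 0.0)    # second position after moving sideways
s = 2.5                    # arbitrary scale factor

pixels = [project(point, c, 700.0) for c in (cam_a, cam_b)]
pixels_scaled = [project(scaled(point, s), scaled(c, s), 700.0)
                 for c in (cam_a, cam_b)]

# Both lists describe the same image measurements, so a single camera
# cannot tell the small scene from the scaled-up one.
print(pixels)
print(pixels_scaled)
```

A stereo rig avoids this because the baseline between its two cameras is known in metres.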