3D pose classification

Hi! I have a 3D task. I receive video in real time from an ordinary camera (not a 3D camera, so it has no way of sensing depth). From this regular streaming video I need to get the 3D coordinates of the subject and classify it. What is the best way to do this? Everywhere I look, people do tracking on ready-made 3D datasets and so on. Say I generate OBJ files (3D models corresponding to my 2D dataset): how would I use them from there? So what should I do if I have to use a custom dataset, i.e. I need to train a grid from scratch, like YOLO but for 3D? Are there any solutions in this area?
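To make the "no depth" limitation concrete: under an ideal pinhole camera model, every 3D point on the same ray through the camera centre lands on the same pixel, so a single frame on its own cannot tell how far away the subject is. This is a minimal sketch assuming a distortion-free camera; the intrinsics `fx`, `fy`, `cx`, `cy` are made-up example values, and `project` is a hypothetical helper, not part of any library.

```python
def project(point, fx=800.0, fy=800.0, cx=320.0, cy=240.0):
    """Project a 3D point (X, Y, Z), given in camera coordinates, to a pixel (u, v)
    with an ideal pinhole model (no lens distortion).
    The default intrinsics are hypothetical example values."""
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    u = fx * X / Z + cx  # perspective division by depth Z
    v = fy * Y / Z + cy
    return (u, v)


near = (0.5, 0.25, 2.0)
far = tuple(2.5 * c for c in near)  # same viewing ray, 2.5x farther away

print(project(near))  # (520.0, 340.0)
print(project(far))   # (520.0, 340.0), identical pixel, so depth is lost
```

Because both points map to the same pixel, recovering 3D coordinates from one camera needs extra information beyond the image itself, for example a known object size or 3D model, or a second viewpoint.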

Did you mean ‘detection’?

Where? Also, tracking is a different beast from 3D pose estimation.

Unclear. What is a ‘subject’? And classification of what?

How would that work?

What? A grid? Where did you get that idea?

Please try to formulate a simple problem, and use fewer (or more precise) ‘buzzwords’.