Object Tracking from a video and Getting 3-D Coordinates of objects

Hello everyone,
I am trying to get 3D coordinates of objects from a video feed. I am new to OpenCV: I know how to get the 2D coordinates (x and y), but I am struggling to get the third one. Can somebody guide me on how to get the 3D coordinates of objects from a video feed? Any suggestions and guidance are welcome.


Can you explain what you're trying to achieve, and the context?

Usually you need a stereo camera (block matching) or multiple images (structure from motion, SfM) to estimate depth.
However, there are recent efforts to use CNNs on a single image, which might work in the specific situations they were trained on (street traffic, indoor rooms).

Thanks for the prompt reply. The problem is that I captured a video of objects in a 3D simulator, and a stereo camera is a hardware technique. I want to process the video and get the 3D coordinates of each object, either from its size or some other way, but I am not sure what I should use and need guidance in this regard.

IF you have 3D models of your objects AND the corresponding 2D points, you could try solvePnP() to find the 3D pose.

Otherwise, sorry to say, there is no easy or straightforward solution.
IMHO, you have to do some research now.
Start reading here (scroll down to the explanation section)

I am not following the idea properly. Would you mind connecting to discuss the problem so I can get a proper understanding? I cannot see any option here to contact you directly. Can you share your email, or can we connect on another platform?

That's what this site is for, so let's stay here!

Does this involve the OpenCV AI Kit, which is a piece of hardware, or not? I'm asking because it's categorized as such.

No, the OpenCV AI Kit is not involved. I just have a video that contains multiple objects, and I want to get the x, y and z coordinates of the objects in the video.


Sadly, there is no function to do that yet.

But there may be a solution if you can get a stereo version of your video: almost the same video twice, with the virtual camera in the second one moved slightly to the right, say by 10 cm.
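As a rough sketch of why the second, shifted camera helps: for two identical, parallel (rectified) pinhole cameras, depth follows from the horizontal shift (disparity) of the same object between the left and right frames, Z = f * B / d. The function name and all numbers below are assumptions for illustration (focal length 800 px, principal point at 320, 240, baseline 0.10 m); use your simulator's actual camera settings.

```python
def disparity_to_xyz(u_left, v, u_right, fx, fy, cx, cy, baseline):
    """Triangulate one point seen by two rectified, parallel cameras.

    u_left, u_right: horizontal pixel position in the left / right frame
    v: vertical pixel position (the same in both frames when rectified)
    fx, fy, cx, cy: assumed pinhole intrinsics, in pixels
    baseline: distance between the two cameras (sets the output units)
    Returns (X, Y, Z) in the left camera's coordinate frame.
    """
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    z = fx * baseline / disparity   # depth from similar triangles
    x = (u_left - cx) * z / fx      # back-project the pixel to metric X
    y = (v - cy) * z / fy           # back-project the pixel to metric Y
    return x, y, z


# Hypothetical tracked object: centre at u=400 in the left frame, u=380 in the right.
x, y, z = disparity_to_xyz(
    u_left=400.0, v=240.0, u_right=380.0,
    fx=800.0, fy=800.0, cx=320.0, cy=240.0,
    baseline=0.10,
)
print(x, y, z)
```

So if you track each object's 2D centre in both videos, the difference in its horizontal position gives you the missing z, and x and y follow from the pixel position. For dense depth over the whole image, OpenCV's stereo block matchers (for example cv2.StereoBM_create) compute the disparity per pixel instead.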