Extract XYZ coordinates and create point cloud


I work with a stereo pair. I want to try to avoid obstacles for my robot. I have a distortion map that I got after calibrating a stereo pair. If I understand correctly, then to avoid obstacles it is necessary to get the xyz coordinates of the obstacles (I detect them separately) and also the point cloud. Actually, how can I get this data? I work in C ++.

If there is another solution to my problem, I will be glad to know. Thank you!

“disparity map”, which says for everything seen in one eye, how many pixels it’s shifted in the other eye.

use this to get a point cloud of (x,y,z) points in 3D space:


the points are arranged in a pixel grid but they’re true coordinates in space. for varying disparity of a pixel, not just z varies, but also x and y.

Thanks your answer!

It seems that I figured out how to get z, I attach the code. But is it correct? Also, if correct, how do you get x and y?

cv::Mat 3d_(disp.rows, disp.cols, CV_32FC3);
reprojectImageTo3D(disp, 3d_, Q, true);
int xmin = Left_nice.cols / 2 - 20, xmax = Left_nice.cols / 2 + 20, ymin = Left_nice.rows / 2 - 20, ymax = Left_nice.rows / 2 + 20;
cv::rectangle(Left_nice, cv::Point(xmin, ymin), cv::Point(xmax, ymax), Scalar(0, 0, 255));

3d_ = 3d_(Range(ymin, ymax), Range(xmin, xmax));
Mat z_roi(3d_.size(), CV_32FC1);
int from_to[] = {2, 0};
mixChannels(&3d_, 1, &z_roi, 1, from_to, 1);

cout << “Depth: " << mean(z_roi) << " mm” << endl;

the same, i.e. extract channels 0 and/or 1 instead of 2. you seem to understand the data.

I think mean(3d_) should give you a 3-element Scalar with the mean x, y, z values

I would expect x and y to be close to 0 because you picked a region around the optical center. the means should deviate a bit if the points aren’t on a plane parallel to the image plane.

I’ll try.

Is the distance calculation correctly implemented in my code? Also, what advice can you give about the obstacle avoidance point cloud?

looks sensible, however…

  • watch the quality of the data, especially the disparity map. imshow it (or the z-values of the point cloud), judge it against your expectations (does it look blurry, patchy, grainy, threadbare, …). consider mapping those values so near things (small z) are light and far things (large z) are dark.
  • measure your calibration pattern’s true square size (width, height). measure across multiple steps for more accuracy (say nominal 25mm squares, if you measure 149 mm across 6 steps, it’s really 24.83mm)
  • check distances with a tape measure or yard stick
  • hold the pattern still, manually trigger pictures. many cameras have rolling shutter, movement will cause distortion.
  • keep these pictures so you can re-run calibration for different values (such as swapping the measured width and height of your squares, in case that’s mixed up)
  • watch the various error measures (reprojection error, etc), try various things, watch the error measures improve/worsen

I’d have to see the situation. you decide what’s an obstacle, when to avoid it, and how to avoid it.

Thank you! As for the distance, I’ll get back to you after I check.

About the obstacles. I plan to detect obstacles (for example, a chair, plants) and if any object is recognized, then make a detour around this obstacle. At the moment, I cannot figure out how best to do this.

you can use the 3D data as a simple range sensor, like one of those cheap ultrasonic distance sensors. you’d know the distance but not the type of object.

for a simple robot that avoids any obstacles, that should be good. add some SLAM, point cloud fusion, structure from motion, … to generate a map.

if you want to determine the type of object from the point cloud, that’s more involved. I’d not do that, but do inference on the RGB data instead, because that has proven solutions (DNNs such as YOLO).