Hi everyone,
i’m trying to implement a SLAM package using opencv, but I have some problems with solvePnPRansac (and solvePnP too). I’m using opencv 3.4.14 and solvePnPRansac never finds any inliner, while solvePnP always returns wrong results.
I have an Intel Realsense depth camera (D435 Depth Camera D435 – Intel® RealSense™ Depth and Tracking Cameras) and i’m trying to reconstruct the pose of the camera. To do this, the workflow is:
- capture frame at time t-1
- capture frame at time t
- extract keypoints (set to 500, ORB), compute descriptors (ORB) and match them (with a robust matcher as in this example opencv/RobustMatcher.cpp at 3.4 · opencv/opencv · GitHub)
- reconstruct the 3d keypoints coordinates at time t-1 thanks to the depth image
- use the 3d keypoints at time t-1 and the 2d matching keypoints at time t to solve the PnP problem.
The code of my function is
bool poseEstimation3d2d(const Frame::Ptr ¤t_frame, const Frame::Ptr &last_frame, std::vector<cv::DMatch> &good_matches,
double depth_scale, int min_depth, int max_depth, const cv::Mat_<double> &K, const cv::Mat_<double> &D,
cv::Mat &t, cv::Mat &R)
{
std::vector<cv::Point3d> pts_3d;
std::vector<cv::Point2d> pts_2d;
//Foreach match it checks if the keypoint at time t-1 has valid depth, else discard it, then reconstruct the 3d pose of the keypoints at time t-1.
for (cv::DMatch &m : good_matches)
{
cv::Point2f kp = last_frame->keypoints[m.trainIdx].pt;
unsigned short d = last_frame->depth.ptr<unsigned short>(int(kp.y))[int(kp.x)];
if (d <= min_depth || d >= max_depth)
{
continue;
}
double depth = d * depth_scale;
cv::Point3d p3;
image_utils::point2dTo3d(kp, depth, K, p3);
pts_3d.emplace_back(p3);
pts_2d.emplace_back(current_frame->keypoints[m.queryIdx].pt);
}
//if it found more than 6 points
if (pts_3d.size() >= MIN_PNP_POINTS)
{
cv::Mat r, inliners;
cv::solvePnPRansac(pts_3d, pts_2d, K, D, r, t, false, 100, 8.0F, 0.99, inliners);
cv::Rodrigues(r, R);
return true;
}
return false;
}
where depth_scale is to transform depth measure from mm to meters (0.001), and min_depth and max_depth are the limits of the depth for the camera (110mm-10000mm, 11cm-10meters), K is the camera intrinsic matrix and D is the vector of distortion coefficients, MIN_PNP_POINTS = 6 and the function point2dTo3d is
void point2dTo3d(const cv::Point2f &point2, double depth, const cv::Mat_<double> &K, cv::Point3d &point3)
{
point3.x = (point2.x - K(0, 2)) * depth / K(0, 0); // (x - cx) * depth / fx
point3.y = (point2.y - K(1, 2)) * depth / K(1, 1); // (y - cy) * depth / fy
point3.z = depth;
}
The method solvePnPRansac always return false, with 0 inliners. I tried to set more keypoints to extract (2000), more iterations in Ransac (500), less confidence (0.95 and 0.8). In addition, t and R returned are totally wrong.
These are the values printed for the first three steps with the camera fixed
t=[-2.521665 -6.966232 71734.524631]
R = [0.463265 0.040670 -0.885286]
[-0.883486 0.099604 -0.457747]
[0.069562 0.994196 0.082074]
t=[ -8.086195 -22.291806 433778.025662]
R=[ -0.910810 0.067253 0.407311]
[ -0.387687 0.199697 -0.899900]
[ -0.141859 -0.977547 -0.155813]
t=[ -23.359800 -64.562378 1107890.188865]
R=[ 0.480159 0.164383 -0.861641]
[ -0.819301 0.434948 -0.373585]
[ 0.313358 0.885324 0.343523]
I tried solvePnP instead of solvePnPRansac but the values are similar and they change every iteration, with the camera fixed always in the same point. I tried also using float instead of double in K, D and the cv::Points, but the situation is the same.
I know that t and R are not the movement made by the camera in real world coordinates, as explained here Dissecting the Camera Matrix, Part 2: The Extrinsic Matrix ← but with a fixed camera i cannot understand why they change in every iteration and why solvePnPRansac never finds inliners.
Sorry for the long question, but i want to explain clearly the problem. Can anyone help me please?