Using solvePnP with two ArUcos

Hello,
I am working with solvePnP to compute the relative pose of the camera with respect to two ArUco markers lying on the same plane, with known size and positions.

The imagePoints coordinates are detected by the detectMarkers() function, then passed to solvePnP() in the same order (starting from the bottom-left corner and going CCW).
The objectPoints coordinates are known, and are given in the same order, as depicted in the figure.

[figure: the two markers and the corner numbering used for objectPoints]

With these inputs I expect a rotation close to null/identity; however, the computed rotation is far from null. First, I thought the dataset was hitting a pose ambiguity and decided to use solvePnPGeneric() to get all possible rvecs/tvecs, but the results are still inconsistent.
Then I tried different solver methods and objectPoints orderings (in particular the explicitly defined SOLVEPNP_IPPE_SQUARE), without any success.
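
For reference, the single-marker SOLVEPNP_IPPE_SQUARE attempt looked roughly like the sketch below (not my exact code). The object points follow the corner order documented for that flag, and the image points have to be the four detected corners of that marker, reordered to match:

#include <opencv2/calib3d.hpp>
#include <vector>

// Sketch of the single-marker attempt. markerImagePoints must hold the four
// detected corners of one marker, matched one-to-one with the object points
// below (the order shown is the one documented for SOLVEPNP_IPPE_SQUARE).
void singleMarkerPose(const cv::Mat &cameraMatrix,
                      const std::vector<cv::Point2d> &markerImagePoints)
{
    const double s = 0.04; // marker side length [m]
    std::vector<cv::Point3d> squarePoints = {{-s / 2,  s / 2, 0.0},
                                             { s / 2,  s / 2, 0.0},
                                             { s / 2, -s / 2, 0.0},
                                             {-s / 2, -s / 2, 0.0}};

    cv::Mat rvec, tvec;
    cv::solvePnP(squarePoints, markerImagePoints, cameraMatrix, cv::Mat(),
                 rvec, tvec, false, cv::SOLVEPNP_IPPE_SQUARE);
}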

There is obviously a concept that eludes me here, and any help would be welcome.

Thanks,

Matt

that is a tiny tiny picture.

I’m afraid you’re working with pixel coordinates extracted from that tiny tiny picture. that is terrible data.

also: frontal views of markers will necessarily have large angular error. that’s a mathematical fact of life.

what pose are you trying to calculate? you didn’t quantify “far from null”.

is the pose between those two arucos fixed and a design parameter? if yes, you should throw all eight 2D corners and all eight 3D corners into solvepnp.
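
something like the sketch below, assuming detectMarkers() already gave you the per-marker corners and ids, and that you know the four 3D corners of each marker id in your rig frame (all names here are placeholders):

#include <opencv2/calib3d.hpp>
#include <map>
#include <vector>

// Sketch: stack the corners of every detected marker into one solvePnP call.
// objectCornersById maps a marker id to its four 3D corners in the rig frame,
// listed in the same order detectMarkers() reports the 2D corners.
bool rigPose(const std::vector<int> &ids,
             const std::vector<std::vector<cv::Point2f>> &detectedCorners,
             const std::map<int, std::vector<cv::Point3f>> &objectCornersById,
             const cv::Mat &cameraMatrix, const cv::Mat &distCoeffs,
             cv::Mat &rvec, cv::Mat &tvec)
{
    std::vector<cv::Point3f> objectPoints;
    std::vector<cv::Point2f> imagePoints;
    for (size_t i = 0; i < ids.size(); ++i) {
        const auto it = objectCornersById.find(ids[i]);
        if (it == objectCornersById.end())
            continue; // detected marker that is not part of the rig
        objectPoints.insert(objectPoints.end(), it->second.begin(), it->second.end());
        imagePoints.insert(imagePoints.end(), detectedCorners[i].begin(), detectedCorners[i].end());
    }
    if (objectPoints.size() < 4)
        return false; // not enough correspondences
    return cv::solvePnP(objectPoints, imagePoints, cameraMatrix, distCoeffs, rvec, tvec);
}

the only thing that matters is that objectPoints[i] and imagePoints[i] always describe the same physical corner.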

an MRE is required to debug your situation. usable details please. any responses you can think of are better as code and data, rather than prose. think about the true nature of what you’re trying to convey. is it code/data, does a picture convey the truest meaning, or is prose appropriate?

Right, here is a more detailed description of the problem:

#include <iostream>
#include <vector>

#include <opencv2/calib3d.hpp>

void displayMat(const cv::Mat &matrix); // defined below

void opencvExemple()
{
    std::cout << "solvePnP forum opencv \n";

    // input objectPoints
    std::vector<cv::Point3d> objectPoints;
    double marker_size = 0.04; // [m]
    double half_size = marker_size /2;
    double markers_relative_lateral_distance = 0.2; // [m]
    objectPoints.push_back(cv::Point3d(-markers_relative_lateral_distance              , -half_size,  1.0));
    objectPoints.push_back(cv::Point3d(-markers_relative_lateral_distance + marker_size, -half_size,  1.0));
    objectPoints.push_back(cv::Point3d(-markers_relative_lateral_distance + marker_size, half_size ,  1.0));
    objectPoints.push_back(cv::Point3d(-markers_relative_lateral_distance              , half_size ,  1.0));
    objectPoints.push_back(cv::Point3d( markers_relative_lateral_distance - marker_size, -half_size,  1.0));
    objectPoints.push_back(cv::Point3d( markers_relative_lateral_distance              , -half_size,  1.0));
    objectPoints.push_back(cv::Point3d( markers_relative_lateral_distance              , half_size ,  1.0));
    objectPoints.push_back(cv::Point3d( markers_relative_lateral_distance - marker_size, half_size ,  1.0));

    // input imagePoints
    std::vector<cv::Point2d> imagePoints;
    imagePoints.push_back(cv::Point2d(197, 202));
    imagePoints.push_back(cv::Point2d(212, 202));
    imagePoints.push_back(cv::Point2d(212, 218));
    imagePoints.push_back(cv::Point2d(197, 218));
    imagePoints.push_back(cv::Point2d(285, 204));
    imagePoints.push_back(cv::Point2d(301, 204));
    imagePoints.push_back(cv::Point2d(301, 220));
    imagePoints.push_back(cv::Point2d(285, 220));


    // clang-format off
    // intrinsic matrix
    cv::Mat cameraMatrix = (cv::Mat_<double>(3, 3) << 401.0513916015625,  0.0,            326.7859191894531,
                                                       0.0,                401.064453125, 206.62399291992188,
                                                       0.0,                0.0,           1.0);
    // clang-format on

    std::vector<cv::Mat> rvecs;
    std::vector<cv::Mat> tvecs;
    cv::solvePnPGeneric(objectPoints,
                        imagePoints,
                        cameraMatrix,
                        cv::Mat(),
                        rvecs,
                        tvecs,
                        false,
                        cv::SOLVEPNP_IPPE);

    displayMat(rvecs.front());
    displayMat(rvecs.back());
}

void displayMat(const cv::Mat &matrix)
{
    // Display matrix values
    std::cout << "Matrix (" << matrix.rows << "x" << matrix.cols << "):\n";
    for (int i = 0; i < matrix.rows; ++i) {
        for (int j = 0; j < matrix.cols; ++j) {
            std::cout << matrix.at<double>(i, j) << " ";
        }
        std::cout << std::endl;
    }
}

The resulting rvecs are:

rvec = [0.00712227, 0.724407, 0.00441881];
norm(rvec) = 0.72 rad = 41 deg
rvec = [-0.0290932, 1.11246, 0.0110151];
norm(rvec) = 1.11 rad = 63 deg

From what I understand, since objectPoints and imagePoints have the same layout and the same order, we can expect the rotation to be null or very close to zero (or possibly one axis close to pi/2, depending on how the axes are defined). So I expect an rvec with values like:

rvec_expected = [~0.0, ~0.0, ~0.0];
rvec_expected = [~1.57, ~0.0, ~0.0];
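
A small sketch of how I check this (the overall angle is just the norm of rvec, and cv::Rodrigues gives the full rotation matrix if individual axes matter):

#include <opencv2/calib3d.hpp>
#include <iostream>

// Sketch: print the rotation angle and the 3x3 rotation matrix for an rvec.
void inspectRotation(const cv::Mat &rvec)
{
    const double angleRad = cv::norm(rvec);
    std::cout << "angle = " << angleRad << " rad = "
              << angleRad * 180.0 / CV_PI << " deg\n";

    cv::Mat R;
    cv::Rodrigues(rvec, R);
    std::cout << "R =\n" << R << "\n"; // ~identity when the rotation is ~null
}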

Could you elaborate on this?

I can give a more complete MRE if needed. Thank you for your support.

Have you tried projecting the world points to your image using your camera matrix and the recovered pose? I often find that is helpful because the structure of the error can (sometimes) lead you to the source.

Also, have you printed the tvec, and do you have an expectation of what it should be?
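
Something like this sketch, reusing the names from your MRE and whichever solution solvePnPGeneric returned (rvecs.front() / tvecs.front(), for instance):

#include <opencv2/calib3d.hpp>
#include <cmath>
#include <iostream>
#include <vector>

// Sketch: reproject the object points with the recovered pose and print the
// per-corner residuals and the tvec. Structured residuals (e.g. growing with X)
// usually point at the world-point layout rather than at the solver.
void checkPose(const std::vector<cv::Point3d> &objectPoints,
               const std::vector<cv::Point2d> &imagePoints,
               const cv::Mat &cameraMatrix,
               const cv::Mat &rvec, const cv::Mat &tvec)
{
    std::vector<cv::Point2d> projected;
    cv::projectPoints(objectPoints, rvec, tvec, cameraMatrix, cv::Mat(), projected);

    for (size_t i = 0; i < imagePoints.size(); ++i) {
        const cv::Point2d e = projected[i] - imagePoints[i];
        std::cout << "corner " << i << ": detected " << imagePoints[i]
                  << ", projected " << projected[i]
                  << ", error " << std::hypot(e.x, e.y) << " px\n";
    }
    std::cout << "tvec = " << tvec.t() << "\n";
}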

My current best guess is that you have something wrong with your world points - something like the distance between the markers and the marker scale not agreeing, or an offset added where it should have been subtracted (or left out altogether), etc.

Have you tried using just the points from one marker? What happens?

It looks to me like you are setting up your markers to be 2*lateral_distance - marker_size apart in X. That implies a ratio of 9:1 between the X spacing and the marker size. In image space your ratio of spacing to marker size is about 6:1 (I got 5.86). So that might be part of the problem. In trying to “force” the scale error out, it makes sense to me that solvePnP would “use” a Y rotation (and the resulting foreshortening in X) to fit the data.

By my math, 0.137 might be closer to what you want for lateral_distance.
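
Rough arithmetic, using the corner x-coordinates from your MRE (just a sanity check, and assuming the spacing I measured is left edge to left edge):

#include <iostream>

// Back-of-the-envelope check using the corner x-coordinates from the MRE
// (left marker: 197..212 px, right marker: 285..301 px).
int main()
{
    const double markerPx  = 212.0 - 197.0;        // marker width in pixels
    const double spacingPx = 285.0 - 197.0;        // left edge to left edge
    const double ratio     = spacingPx / markerPx; // roughly 5.9

    // In the object points the same spacing is 2*d - marker_size,
    // so match the ratios and solve for d.
    const double marker_size = 0.04; // [m]
    const double d = (ratio + 1.0) * marker_size / 2.0; // ~0.137 m
    std::cout << "implied lateral_distance = " << d << " m\n";
}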

How did you come up with the numbers?

Once you get your scale factors figured out (or whatever the issue is), I think you might find that you need more accurate measurements for your aruco corners. Subpixel values or higher resolution images (or both), and possibly more arucos. If you want stable “close to null” estimates for your rotations, you should be prepared to put more thought/effort/changes into it.
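
For the subpixel part, the aruco detector can do the corner refinement for you. A sketch, assuming the OpenCV >= 4.7 ArucoDetector API (the dictionary and the tuning values are just examples; older versions expose the same option through aruco::DetectorParameters::create() and aruco::detectMarkers()):

#include <opencv2/objdetect.hpp>
#include <vector>

// Sketch: let the aruco detector refine corners to subpixel accuracy before
// they go into solvePnP.
void detectWithSubpixelCorners(const cv::Mat &gray,
                               std::vector<int> &ids,
                               std::vector<std::vector<cv::Point2f>> &corners)
{
    cv::aruco::Dictionary dict =
        cv::aruco::getPredefinedDictionary(cv::aruco::DICT_4X4_50);

    cv::aruco::DetectorParameters params;
    params.cornerRefinementMethod = cv::aruco::CORNER_REFINE_SUBPIX;
    params.cornerRefinementWinSize = 5;        // window used by cornerSubPix
    params.cornerRefinementMinAccuracy = 0.01; // termination accuracy

    cv::aruco::ArucoDetector detector(dict, params);
    detector.detectMarkers(gray, corners, ids);
}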

Thank you for your input on this problem; you are right, there was an error in the value of lateral_distance. I fixed it and gave it another try.
As you proposed, the objectPoints have been projected with a null rotation and an approximate distance for the translation on the Z axis:

    cv::Mat trueRvec = (cv::Mat_<double>(3, 1) << 0.0, 0.0, 0.0);
    cv::Mat trueTvec = (cv::Mat_<double>(3, 1) << 0.0, 0.0, 1.0);

Then the projected points are compared to the pixels detected by detectMarkers(), as shown in the figure below. We can notice that the detected points are translated, which is expected because the markers were not centered. The Y/X scale is consistent between both point sets, and likewise for the objectPoints.

At this point I expected solvePnP() to compute a null or close-to-zero rotation. However, it is not the case :frowning:; the resulting rvec is:

rvec = [0.029359, -0.359715, -0.012249];

I still get unreliable results with a single ArUco using SOLVEPNP_IPPE_SQUARE (two times larger than the one previously shown). I also tried to add Z depth to the markers by tilting them:

[screenshot: comparison of detected and projected points]

that is because the picture resolution is tiny. there are mathematical limits to what’s possible. you can’t expect this to perform better. you need more image resolution. I believe I already remarked on the tiny images.

You are right. I tried multiplying the resolution by 9, and the result is much more accurate and stable. Thank you for the support :slight_smile:

I hope that means actual sensor data, not just upsampling a bad picture.

Nope, it is real sensor data; the frame I initially used was subsampled.