I don't understand solvePnP's rotation and translation vectors' positive/negative signs

Hi everyone. I’m using an aruco marker and detecting its four corners with the aruco library.
Then I’m using those values to calculate the position of the camera by using solvePnP.

The objPoints (the real-world coordinates) that I’ve assigned to the marker corners are, starting at the upper-left corner and going clockwise:

0 0 0
6 0 0
6 6 0
0 6 0

This is the order in which aruco returns the corners, so each image point is paired with its correct real-world point when passed to solvePnP.

Looking at this, you can see that the upper corners are at height (Y) 0, whereas the lower corners are at height +6. You would therefore expect that, if I put the camera above the marker, it would output a negative Y coordinate.

But it doesn’t. It outputs a positive Y as I move the camera upwards. If I change the real Y coordinates to -6, then the output is a negative Y as I move up (and when I do this, it also outputs an X-axis rotation of -180º instead of 0º).
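A sketch of where the -180º comes from. It assumes a camera looking straight at the marker and OpenCV's camera axes (x right, y down, z away from the camera). Each column of the matrix below is one marker axis expressed in camera coordinates, and for this head-on view that is what the rotation part of solvePnP's result describes. With the bottom corners at y = +6, the marker's Y points down and Z = X × Y points away from the camera, so the marker axes match the camera's own axes (rotation 0). With y = -6, Y points up and Z points back toward the camera. That is a half turn about X, which can be reported as either 180º or -180º because both describe the same rotation. The helper names are illustrative.

```python
import numpy as np

# Directions seen by a camera looking straight at the marker, in OpenCV camera
# coordinates: +x right, +y down, +z away from the camera (into the marker).
RIGHT = np.array([1.0, 0.0, 0.0])
DOWN = np.array([0.0, 1.0, 0.0])


def marker_axes_in_camera(y_sign):
    """Columns are the marker's X, Y, Z axes in camera coordinates for corners
    (0,0,0), (6,0,0), (6,6*y_sign,0), (0,6*y_sign,0), top-left first, clockwise."""
    x_axis = RIGHT                                # top-left -> top-right corner
    y_axis = DOWN if y_sign > 0 else -DOWN        # +Y toward the bottom edge only if y_sign > 0
    z_axis = np.cross(x_axis, y_axis)             # right-handed frame: Z = X x Y
    return np.column_stack([x_axis, y_axis, z_axis])


def rot_x(angle):
    """Right-handed rotation about the X axis by `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


R_y_down = marker_axes_in_camera(+1)   # marker axes coincide with the camera axes
R_y_up = marker_axes_in_camera(-1)     # Y and Z both reversed
print(np.allclose(R_y_down, np.eye(3)))       # True
print(np.allclose(R_y_up, rot_x(np.pi)))      # True: a half turn about X
```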

I’ve noticed that when aruco calculates the corners, it represents the top left corner of the screen as (0,0). My only guess is that when solvePnP receives the image points, it assumes that (0,0) is the bottom left corner instead of the top left, and as a result it is effectively vertically flipping the image coordinates calculated by aruco.

I’ve looked for documentation to confirm this theory, but found nothing. Can anyone confirm if this is correct, or if I’m misinterpreting something about solvePnP?
Edit: This is wrong, see replies.
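For reference, here is a minimal pinhole-projection sketch of the model solvePnP fits, ignoring lens distortion. Pixel (0,0) is the top-left of the image, u grows to the right and v grows downward, which matches camera x right and y down. The intrinsics `fx`, `fy`, `cx`, `cy` are made-up values for illustration. A point below the optical axis (positive camera y) lands below the principal point, so no vertical flip is involved.

```python
import numpy as np


def project(point_cam, fx, fy, cx, cy):
    """Pinhole projection of a 3D point in camera coordinates
    (x right, y down, z forward) to pixel coordinates (u right, v down)."""
    x, y, z = point_cam
    u = fx * x / z + cx
    v = fy * y / z + cy
    return np.array([u, v])


# Hypothetical intrinsics for a 640x480 image.
fx = fy = 500.0
cx, cy = 320.0, 240.0

below = project(np.array([0.0, 5.0, 50.0]), fx, fy, cx, cy)
right = project(np.array([5.0, 0.0, 50.0]), fx, fy, cx, cy)
print(below)  # v > cy: the point appears below the image centre
print(right)  # u > cx: the point appears right of the image centre
```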

could it be that the transformation you receive is supposed to convert from marker frame to camera frame?

then a positive Y would mean the marker is moving into the bottom half of the camera’s frame/view.
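To make that concrete: solvePnP returns the pose of the object (marker) frame in the camera frame. A marker point p therefore maps to camera coordinates as p_cam = R @ p + tvec, and tvec itself is where the marker's origin sits in camera coordinates. The sketch below assumes a head-on camera (R = identity, as with the Y-down object points). It places the camera at a hypothetical position 10 units above the marker and 50 in front of it, expressed in the marker's Y-down frame. In that frame, up is negative Y and in front is negative Z, because Z = X × Y points into the marker.

```python
import numpy as np


def marker_to_camera(R, tvec, p_marker):
    """Map a point from marker coordinates to camera coordinates."""
    return R @ p_marker + tvec


R = np.eye(3)                                      # camera facing the marker squarely
camera_in_marker = np.array([0.0, -10.0, -50.0])   # hypothetical: above (-Y), in front (-Z)

# The camera centre must map to the camera origin: R @ C + tvec = 0  ->  tvec = -R @ C
tvec = -(R @ camera_in_marker)

origin_in_camera = marker_to_camera(R, tvec, np.zeros(3))
print(origin_in_camera)  # equals tvec: marker origin is below (+y) and ahead (+z) of the camera
```

In this setup, a camera moved upward produces a positive tvec Y, which is the behaviour described in the question.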

point the camera directly at the marker’s origin, but angle the camera so it’s not above the origin. what values do you get?

the camera coordinate system in OpenCV is right-handed, with x right, y down and z away (into the scene). I don’t know if aruco does anything differently.

it would be immensely helpful if you showed some pictures that show where the marker is and where your camera is, and what values you get for that pose… and the same for another pose.

I have noticed that the X coordinate is also inverted, not just the Y one, so the little logic I saw in my original theory is gone.

Here is a sample:

The camera is to the left and above the marker, and it’s giving me positive coordinates instead of negative ones.

I should note that I’m applying an operation to the translation vector so that its coordinates are relative to the marker’s orientation instead of the camera’s. The math behind it is just a couple of applications of the Pythagorean and Thales’ theorems, and the result is:

X’ = X * cos(Y_angle) - Z * sin(Y_angle)
Y’ = Y * cos(X_angle) - Z * sin(X_angle)
Z’ = Z * cos(X_angle) * cos(Y_angle) + X * sin(Y_angle) + Y * sin(X_angle)

(Another note: the X angles in solvePnP’s rotation output are positive when the camera is looking downwards, whereas the Y angles are negative when it is looking to the right. This means that I have to invert the X angle in my program for the operation to work, but I don’t know why.)
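A sketch of why the two signs differ. It uses OpenCV camera axes (x right, y down, z forward) and a marker facing the camera with Y-down object points. By the right-hand rule, a positive rotation about +X swings the optical axis (+z) toward -y, which is up. A positive rotation about +Y swings it toward +x, which is right. solvePnP's rotation maps marker coordinates into camera coordinates, so it is the inverse (transpose) of the camera's orientation in the marker frame. A camera tilted down therefore gives a positive X rotation, and a camera turned right gives a negative Y rotation. The `rot_x`/`rot_y` helpers and the 20º angle are illustrative.

```python
import numpy as np


def rot_x(a):
    """Right-handed rotation about the X axis by angle a (radians)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def rot_y(a):
    """Right-handed rotation about the Y axis by angle a (radians)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


optical_axis = np.array([0.0, 0.0, 1.0])  # camera +z in its own frame
angle = np.deg2rad(20.0)

# Camera orientation in the marker frame (columns = camera axes in marker coordinates).
cam_tilted_down = rot_x(-angle)    # optical axis gains a +y (downward) component
cam_turned_right = rot_y(+angle)   # optical axis gains a +x (rightward) component
print(cam_tilted_down @ optical_axis)    # y component > 0
print(cam_turned_right @ optical_axis)   # x component > 0

# solvePnP's R maps marker -> camera, i.e. the transpose of the orientation above.
R_down = cam_tilted_down.T     # equals rot_x(+angle): positive X rotation
R_right = cam_turned_right.T   # equals rot_y(-angle): negative Y rotation
```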

This, however, has no effect on the results when all rotation angles are 0º. The rotation isn’t 0º in the sample I uploaded, but the coordinates stay consistent regardless of rotation, and at 0º rotation they match the coordinates obtained without applying this operation.


I opened a similar thread today.

And today another user has opened another thread with questions regarding how to obtain the camera pose from the rotation and translation vectors.

So it seems many of us struggle to get this right…

In the meantime, I have found sources that say the angles from this function are actually in axis-angle notation. Have you taken this into account?

It may also help to use the cv2.Rodrigues() method to obtain a 3x3 rotation matrix from the 3-element rotation vector.
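If OpenCV isn't at hand, the rotation-vector-to-matrix conversion that cv2.Rodrigues() performs can be sketched with the Rodrigues formula R = I + sin(θ)K + (1 - cos(θ))K². Here θ is the vector's length and K is the cross-product matrix of the unit axis. This is a numpy reimplementation for illustration, not OpenCV's code. In practice `R, _ = cv2.Rodrigues(rvec)` returns the matrix. The sketch also shows that negating an rvec gives the transposed, i.e. inverse, rotation.

```python
import numpy as np


def rodrigues(rvec):
    """Axis-angle vector (direction = axis, length = angle in radians) -> 3x3 rotation."""
    rvec = np.asarray(rvec, dtype=np.float64).reshape(3)
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)                   # no rotation
    kx, ky, kz = rvec / theta              # unit rotation axis
    K = np.array([[0.0, -kz, ky],
                  [kz, 0.0, -kx],
                  [-ky, kx, 0.0]])         # cross-product matrix: K @ v == cross(k, v)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


rvec = np.array([0.0, 0.0, np.pi / 2])     # 90 degrees about Z
R = rodrigues(rvec)
print(np.round(R @ np.array([1.0, 0.0, 0.0]), 6))  # the X axis is rotated onto Y
```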

Hope it helps :)

Yes, I know that it’s axis-angle notation, it’s the signs that I don’t understand.

As another person correctly said earlier, OpenCV uses positive signs for right, down and away.

The translation vector coordinates from solvePnP are fully inverted, which makes a bit of sense? I still don’t fully understand it.
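The inversion is only a plain sign flip when the rotation is the identity. In general, since p_cam = R @ p_marker + tvec, the camera centre (the point where p_cam = 0) in marker coordinates is -Rᵀ @ tvec, and the camera's orientation in the marker frame is Rᵀ. The sketch below uses a hypothetical pose with a 30º rotation about Y. R is written out directly rather than taken from cv2.Rodrigues so the snippet runs with numpy alone. `invert_pose` is an illustrative helper name.

```python
import numpy as np


def invert_pose(R, tvec):
    """Given marker->camera (R, tvec) as returned by solvePnP, return the camera
    orientation and camera position expressed in marker coordinates."""
    R_inv = R.T                     # rotations are orthonormal: inverse == transpose
    camera_pos = -R_inv @ tvec      # solves R @ C + tvec = 0
    return R_inv, camera_pos


a = np.deg2rad(30.0)
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])    # hypothetical rotation about Y
tvec = np.array([5.0, 10.0, 50.0])              # hypothetical translation

_, camera_pos = invert_pose(R, tvec)
print(camera_pos)   # not simply -tvec once R is not the identity
print(-tvec)
```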

However, I don’t know why the X-axis angle in the rotation vector is NOT inverted. Rotating the camera upwards results in negative angles, and moving the camera upwards results in a positive Y coordinate. That is not consistent with the horizontal values: rotating the camera to the right results in negative Y-axis angles AND moving the camera to the right results in negative X coordinates.

you talk about negative angles… but you don’t have those. you have an rvec, which encodes a rotation axis (its direction) and an angle (its length). the length is always positive and the vector points along the axis of rotation. if any of its components is negative, that’s just how the axis lies in space.
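The reading of an rvec described above, in code: its length is the rotation angle in radians (never negative) and its direction is the axis. Negating an rvec keeps the angle and flips the axis, which gives the inverse rotation. A negative component therefore says which way the axis points, not that there is a "negative angle". The example rvec and the `axis_angle` helper name are made up for illustration.

```python
import numpy as np


def axis_angle(rvec):
    """Split an OpenCV-style rotation vector into (unit axis, angle in radians)."""
    rvec = np.asarray(rvec, dtype=np.float64).reshape(3)
    angle = float(np.linalg.norm(rvec))          # always >= 0
    if angle == 0.0:
        return np.array([1.0, 0.0, 0.0]), 0.0    # axis is arbitrary for a zero rotation
    return rvec / angle, angle


rvec = np.array([0.0, -0.6, 0.8])                # hypothetical solvePnP output
axis, angle = axis_angle(rvec)
print(axis, np.degrees(angle))                   # axis (0, -0.6, 0.8), angle of about 57.3 degrees
```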

you haven’t stated whether you convert to Euler angles. that’s the only way for you to have negative angles. DO NOT do that. it’s not a suggestion. it’s pointing out that what you’re saying doesn’t make sense to me.

I literally rotate by the first component of the rvec (in radians, which is what solvePnP outputs) around the X (right) axis. Then I rotate by the second component around the Y axis, which no longer points straight down because the previous rotation has moved it, so I rotate that amount around wherever the Y axis now points. Finally, I do the same for the third component with Z.

And I get a rotated frame that is, without a doubt, correctly oriented given where the camera was placed when I took the photo.

Then, using these new rotated axes, I apply the tvec displacements along X, Y and Z, and the magic happens: I get a very precise camera pose.

The best test for this is to place two chessboards in the camera’s view and take 2 photos without moving either the camera or the chessboards, covering one board in the first photo and the other board in the second.

Now, when you do this operation with each rvec and tvec, you will see that the resulting 3D pose of your camera matches for both! :) This holds even though the rvec and tvec are very different, because the two boards were placed in different positions while the camera stayed static.
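The two-board test can be simulated to see exactly what should match. This sketch assumes both boards' poses are known in a shared world frame. Here board A defines the world frame, and board B's placement relative to it is made up. Each "solvePnP result" is generated from a known, hypothetical camera pose. The raw board-to-camera transforms differ between the boards, but once each recovered camera pose is carried into the common frame, they coincide. The helper names are hypothetical.

```python
import numpy as np


def rot_z(a):
    """Right-handed rotation about the Z axis by angle a (radians)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def make_T(R, t):
    """4x4 homogeneous transform from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def inv_T(T):
    """Inverse of a rigid transform: (R, t) -> (R^T, -R^T t)."""
    R, t = T[:3, :3], T[:3, 3]
    return make_T(R.T, -R.T @ t)


# Hypothetical ground truth, all in the world frame (= board A's frame).
world_T_cam = make_T(rot_z(0.2), [10.0, -5.0, -60.0])
world_T_boardA = np.eye(4)
world_T_boardB = make_T(rot_z(0.5), [30.0, 0.0, 0.0])

# What solvePnP would report for each board, as a board -> camera transform.
cam_T_boardA = inv_T(world_T_cam) @ world_T_boardA
cam_T_boardB = inv_T(world_T_cam) @ world_T_boardB

# Recover the camera pose through each board, expressed in the world frame.
cam_from_A = world_T_boardA @ inv_T(cam_T_boardA)
cam_from_B = world_T_boardB @ inv_T(cam_T_boardB)
print(np.allclose(cam_from_A, cam_from_B))  # True: both boards agree
```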