Project points to a second camera

I have body tracking software that gives me the pose of a skeleton (from a kinect) in world coordinates. The coordinate system is X right, Y up and Z towards the camera.
The BT software provides me with the camera position (0, 0.62, 2.5), as well as its orientation (yaw, pitch, roll).

The character/skeleton is in world coordinates (the software provides the position of the Hips, which is the root of the skeleton, and then the rotations of each bone).
I have a 2nd camera, a webcam, on top of the kinect, which I calibrated against the kinect IR camera using OpenCV (stereoCalibrate), so I have the intrinsics of the webcam and the transformation between the kinect and the webcam: a translation of (0, 0.05, 0), plus a small X-axis rotation of 9º (or -9º, I don’t remember).

I want to be able to project the skeleton into the webcam image. What I have is:

  • Skeleton in world coordinates (right-handed, X right, Y up, Z towards camera)
  • Webcam intrinsics
  • Transformation from kinect to webcam (translation and rotation)
  • Pose of the kinect in world coordinates (so I should be able to compute webcam pose in world coordinates with this and previous info)

To project the 3D points to 2D I use OpenCV’s projectPoints, which gives me the image coordinates given the 3D points, the camera intrinsics and a transformation.
I compute this transformation using the camera pose that the body tracking software gives me. I can edit the values (translation + yaw/pitch/roll) to adjust them visually, that is, interactively move the camera until the projected skeleton fits the person and see which translation+rotation values it should have.
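Roughly, the call looks like this (the intrinsics, rvec/tvec and joint values below are just placeholders, not my real calibration):

```python
import cv2
import numpy as np

# Placeholder webcam intrinsics (values made up for the example)
K = np.array([[600.0,   0.0, 320.0],
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])
dist = np.zeros(5)                        # webcam distortion coefficients

# rvec/tvec: the transformation I pass to projectPoints (what this question is about)
rvec = np.zeros((3, 1))
tvec = np.array([[0.0], [0.62], [2.5]])

# A couple of made-up skeleton joints in world coordinates (metres)
joints_world = np.array([[0.0, 1.0, 0.0],
                         [0.0, 1.5, 0.0]])

pts_2d, _ = cv2.projectPoints(joints_world, rvec, tvec, K, dist)
pts_2d = pts_2d.reshape(-1, 2)            # Nx1x2 -> Nx2 pixel coordinates
```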

The problems I have are:

  • If I set the projectPoints transform using the values obtained from the BT software, the projected 2D points do not fit well, even if I add (just summing) the translation and YPR offset between the kinect and the webcam.
  • I have to add a 180-degree X rotation to the camera, or invert Y and Z, because it seems that in OpenCV Y goes down and Z goes out of the camera. Maybe this is messing things up somehow.
  • The Y position of the camera has to be much higher than what the camera says, even after adding the offset between the kinect and the webcam. In some places I have seen that you have to transform the pose of the camera doing tvec = -Rinv * tvec, where R = Rodrigues(rvec), that is, invert the camera rotation, apply it to the camera translation and negate it (see the small sketch after this list).
  • In fact, what you provide to the projectPoints function is the transformation that maps 3D coordinates of the object (world) into the 3D space of the camera, but what I have is the pose of the camera in world coordinates.
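If I understood that tvec = -Rinv * tvec rule correctly, it would correspond to something like this (the pose values here are placeholders):

```python
import cv2
import numpy as np

# Placeholder camera pose in world coordinates (position from the BT software;
# the orientation is written as a Rodrigues vector here just for the example)
rvec_cam_pose = np.array([[0.1], [0.0], [0.0]])
tvec_cam_pose = np.array([[0.0], [0.62], [2.5]])

R_cam, _ = cv2.Rodrigues(rvec_cam_pose)     # 3x3 camera orientation in the world
R_w2c = R_cam.T                             # inverse rotation (R is orthonormal)
t_w2c = -R_w2c @ tvec_cam_pose              # tvec = -Rinv * t_cam
rvec_w2c, _ = cv2.Rodrigues(R_w2c)          # back to a Rodrigues vector

# rvec_w2c / t_w2c would then be the rvec/tvec passed to projectPoints
```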

Basically, with the information I have, I don’t know what parameters to pass to the OpenCV projectPoints function to give me the correct points.

welcome.

you seem to have a good understanding of the situation. I guess the issue is in the details.

yes, opencv’s assumption is x right, y down (screen/camera), z out.

I’d recommend representing all transformations between coordinate frames as 4x4 matrices. build them from rvecs and tvecs if you have them. don’t use those on their own if you don’t have to. if you have to, pull them out of a 4x4 matrix (Rodrigues function does rvec/matrix conversion).

those 4x4 matrices compose by matrix multiplication. the inverse of such a transformation is… the matrix inverse.
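untested sketch of what I mean (helper names are mine):

```python
import cv2
import numpy as np

def to_mat4(rvec, tvec):
    """pack an rvec/tvec pair into a 4x4 transform"""
    T = np.eye(4)
    T[:3, :3], _ = cv2.Rodrigues(np.asarray(rvec, dtype=float).reshape(3, 1))
    T[:3, 3] = np.asarray(tvec, dtype=float).ravel()
    return T

def from_mat4(T):
    """pull rvec/tvec back out of a 4x4 transform (e.g. for projectPoints)"""
    rvec, _ = cv2.Rodrigues(T[:3, :3].copy())
    return rvec, T[:3, 3].copy()

# composing: A_C = A_B @ B_C   (the B frame "cancels" in the middle)
# inverting: B_A = np.linalg.inv(A_B)
```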

maybe keep a sketch of your coordinate frames and what matrices you have and which way they go. bad/no naming conventions are how people typically get in trouble with this stuff.

I’d recommend deriving all naming from the usual matrix notation, putting frame names on the right and left side. example: a robot-to-camera matrix could be denoted ᶜMᵣ or cam_robot in code, which represents the robot’s pose in the camera frame, or equivalently maps points from the robot frame into the camera frame.
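in your case the chaining might look like this (identity matrices are just placeholders for your actual transforms):

```python
import numpy as np

cam_kinect   = np.eye(4)   # kinect pose in the webcam frame (from stereoCalibrate), placeholder
world_kinect = np.eye(4)   # kinect pose in the world frame (from the BT software), placeholder

kinect_world = np.linalg.inv(world_kinect)      # flip the direction of the mapping
cam_world    = cam_kinect @ kinect_world        # "kinect" cancels in the middle

p_cam = cam_world @ np.array([0.0, 1.0, 0.0, 1.0])   # world point -> webcam frame
```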

similar discussion: Need help with Matrix calculations - dynamic grasp point for robot - #14 by crackwitz

Hi!

Thanks for your quick response. I still don’t understand what the rvec/tvec parameters represent for projectPoints. Are they the pose of the camera in world coordinates? Or the translation/rotation part of the world-to-camera transform (if that is any different from the camera pose)?

Also, when I plug in those parameters, I see the model upside down, so either I have to provide the position of the camera negated or I have to negate the skeleton joint positions (the points I want to project).

Hi,

I followed your advice and derived the world-to-camera matrix by chaining all the transforms (roughly as in the sketch below). I also included the “invert Y and Z” transform to go from world coords to the OpenCV camera convention.
However, the final position is only “approximately right”, that is, I see the skeleton more or less where it should be, but not exactly in place.
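The chain I built looks roughly like this (the matrices are placeholders here; this assumes kinect_world is still in the BT convention while cam_kinect from stereoCalibrate is already in the OpenCV convention):

```python
import numpy as np

# 180º rotation about X: converts the kinect frame as the BT software sees it
# (Y up, Z towards the camera) into the OpenCV camera convention (Y down, Z out)
flip_yz = np.diag([1.0, -1.0, -1.0, 1.0])

kinect_world = np.eye(4)   # kinect <- world, inverse of the BT camera pose (placeholder)
cam_kinect   = np.eye(4)   # webcam <- kinect, from stereoCalibrate (placeholder)

# webcam (OpenCV convention) <- world
cam_world = cam_kinect @ flip_yz @ kinect_world
```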

I have parametrised the tvec as tx, ty, tz and the rvec with Euler angles (XYZ), so I can interactively adjust them to make the skeleton fit. What I see is that the resulting manual calibration is only good for some range of skeleton locations, which makes me think that maybe the intrinsic calibration of the webcam is not right.
When I try to calibrate it again using the checkerboard, with some sets of images the resulting calibration gives a low RMS reprojection error, but the Z coordinate of the tvec, which I know for sure, comes out wrong, so I don’t know what to think.
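For reference, this is roughly how I build the rvec from the Euler angles (assuming an Rx·Ry·Rz composition, which is just the order I picked):

```python
import cv2
import numpy as np

def euler_xyz_to_rvec(rx, ry, rz):
    """Compose R = Rx @ Ry @ Rz (angles in radians) and convert to a Rodrigues vector."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    rvec, _ = cv2.Rodrigues(Rx @ Ry @ Rz)
    return rvec

# e.g. the ~9 degree tilt between the kinect and the webcam
rvec = euler_xyz_to_rvec(np.radians(9.0), 0.0, 0.0)
```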