solvePnP rotational vector

So I am following this project; the result video is at the end.

I want a way to extract the global 3D coordinates of the nose and the vector pointing away from it. I understand that we get the translation vector (which is the nose coordinates… I think) and the rotation vector (I think this is the camera angle?). I just want the vector pointing away from the face, expressed as two 3D points.

Can anyone help with this?

rvec and tvec represent…

  • a transformation from the object coordinate frame to the camera coordinate frame
  • the pose of the object, expressed in the camera frame

rvec is an axis-angle encoding.

rvec/tvec are the same for ALL the points. the tip of the nose isn’t special here
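
to make the axis-angle encoding a bit more concrete, here is a minimal sketch in plain NumPy. the helper name `rvec_to_axis_angle` is made up for illustration, it is not an OpenCV function. the idea: the length of rvec is the rotation angle in radians, and its direction is the rotation axis.

```python
import numpy as np

def rvec_to_axis_angle(rvec):
    """Split an axis-angle rotation vector into (unit axis, angle in radians)."""
    rvec = np.asarray(rvec, dtype=float).reshape(3)
    angle = np.linalg.norm(rvec)
    if angle < 1e-12:
        # no rotation: the axis is undefined, so return an arbitrary one
        return np.array([0.0, 0.0, 1.0]), 0.0
    return rvec / angle, angle

# example input: a quarter turn about the Z axis, given as a (3, 1) column
axis, angle = rvec_to_axis_angle([[0.0], [0.0], [np.pi / 2]])
print(axis, np.degrees(angle))
```

the same rvec rotates every model point, which is why no single point (nose tip or otherwise) is special.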

So how do we solve for the end point of the vector if its length is constant but location isn’t?

express that point local to the “object” (face).

then use projectPoints on that data, along with rvec and tvec, and out comes the screen-space point for that 3D point.
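
a rough sketch of that idea, done by hand in NumPy so the steps are visible. this is an ideal pinhole projection without lens distortion, not a replacement for cv.projectPoints. assumptions: the function name `project_pinhole` is hypothetical, the camera matrix K is invented (use your own calibration), the 500-unit line length is arbitrary, the face model has its origin at the nose tip with +Z pointing out of the face, and R/t are the rounded values that come up further down in this thread.

```python
import numpy as np

def project_pinhole(points_obj, R, t, K):
    """Project object-local 3D points to pixels with an ideal pinhole camera."""
    points_obj = np.asarray(points_obj, dtype=float).reshape(-1, 3)
    points_cam = points_obj @ R.T + t      # object frame -> camera frame
    uvw = points_cam @ K.T                 # apply the intrinsics
    return uvw[:, :2] / uvw[:, 2:3]        # perspective divide

# pose from this thread, rounded to 5 decimals
R = np.array([[ 0.95552, -0.03688, -0.29262],
              [ 0.0523 , -0.95524,  0.29118],
              [-0.29026, -0.29354, -0.91082]])
t = np.array([-167.53555, -203.05571, 2571.13246])

# made-up intrinsics: focal length 1000 px, principal point (320, 240)
K = np.array([[1000.0,    0.0, 320.0],
              [   0.0, 1000.0, 240.0],
              [   0.0,    0.0,   1.0]])

# two object-local points: the model origin and a point 500 units along +Z
nose_line = np.array([[0.0, 0.0, 0.0],
                      [0.0, 0.0, 500.0]])
pixels = project_pinhole(nose_line, R, t, K)
print(pixels)   # row 0: start of the line, row 1: end of the line
```

drawing a line between the two resulting pixels gives the usual "nose direction" overlay.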

if you want to explore what’s going on… use the following combined with cv.perspectiveTransform()

def rtvec_to_matrix(rvec, tvec):
	"""
	Convert rotation vector and translation vector to 4x4 matrix
	"""
	rvec = np.asarray(rvec)
	tvec = np.asarray(tvec)

	T = np.eye(4)
	R, jac = cv.Rodrigues(rvec)
	T[:3, :3] = R
	T[:3, 3] = tvec
	return T

def matrix_to_rtvec(matrix):
	"""
	Convert 4x4 matrix to rotation vector and translation vector
	"""
	rvec, jac = cv.Rodrigues(matrix[:3, :3])
	tvec = matrix[:3, 3]
	return rvec, tvec

Sorry, I am still very new to all of this. Could you explain how to use cv.perspectiveTransform() and these functions to help me understand this?

Also I dabbled around with the first function and it is throwing an error:
ValueError: could not broadcast input array from shape (3,1) into shape (3,) on line T[:3, 3] = tvec

Here are the values from the Mona Lisa image I included in the topmost post.

translation_vector:  [[-167.53555238]
 [-203.05570512]
 [2571.1324635 ]]
rotation_vector:  [[-2.808854  ]
 [-0.01133365]
 [ 0.42841277]]

fix:

import cv2 as cv
import numpy as np

def rtvec_to_matrix(rvec, tvec):
	"""
	Convert rotation vector and translation vector to 4x4 matrix
	"""
	rvec = np.asarray(rvec)
	tvec = np.asarray(tvec)

	T = np.eye(4)
	R, jac = cv.Rodrigues(rvec)
	T[:3, :3] = R
	T[:3, 3] = tvec.squeeze() # this is the fix
	return T
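
for reference, a tiny NumPy-only sketch of why the original assignment failed and why squeezing helps. nothing here depends on OpenCV, and the tvec values are the ones posted above.

```python
import numpy as np

# tvec with shape (3, 1), like the values posted above
tvec = np.array([[-167.53555238], [-203.05570512], [2571.1324635]])
T = np.eye(4)

try:
    T[:3, 3] = tvec            # target slice has shape (3,), source has shape (3, 1)
    failed = False
except ValueError as err:
    failed = True
    print("assignment failed:", err)

T[:3, 3] = tvec.squeeze()      # (3, 1) -> (3,), now the shapes agree
print(T[:3, 3])
```

tvec.reshape(3) or tvec.ravel() would do the same job as squeeze() here.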

I don’t see that picture. your first post only includes a link to a blog post.

your tvec looks like the calibration might be in mm.

the 4x4 transformation matrix for your rvec and tvec is:

array([[   0.95552,   -0.03688,   -0.29262, -167.53555],
       [   0.0523 ,   -0.95524,    0.29118, -203.05571],
       [  -0.29026,   -0.29354,   -0.91082, 2571.13246],
       [   0.     ,    0.     ,    0.     ,    1.     ]])
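
if you want to check that matrix without OpenCV, here is a sketch that uses the textbook Rodrigues rotation formula in plain NumPy. it is a stand-in for cv.Rodrigues for this one purpose, not a description of how OpenCV implements it, and `rodrigues_np` is a made-up name.

```python
import numpy as np

def rodrigues_np(rvec):
    """Axis-angle vector -> 3x3 rotation matrix via the Rodrigues formula."""
    rvec = np.asarray(rvec, dtype=float).reshape(3)
    theta = np.linalg.norm(rvec)           # rotation angle in radians
    if theta < 1e-12:
        return np.eye(3)                   # no rotation
    k = rvec / theta                       # unit rotation axis
    K = np.array([[  0.0, -k[2],  k[1]],
                  [ k[2],   0.0, -k[0]],
                  [-k[1],  k[0],   0.0]])  # cross-product matrix of k
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

rvec = np.array([[-2.808854], [-0.01133365], [0.42841277]])
tvec = np.array([[-167.53555238], [-203.05570512], [2571.1324635]])

T = np.eye(4)
T[:3, :3] = rodrigues_np(rvec)
T[:3, 3] = tvec.squeeze()
print(np.round(T, 5))
```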

I’m gonna talk about “markers” purely because I’m kinda stuck in that lingo. it comes from working with AR markers. I imagine if you work with faces, the axes are placed similarly (Z is nose/forward, X to subject’s left, Y up)

the rotation maps (read each column)…

  • X to +X, with a bit of -Z but not much
  • Y to -Y (so it points up, and a bit near)
  • Z to -Z (so it points near, and a bit down/left)

which rotates marker-local coordinates/directions into camera-local points/directions. so that means you’re facing the marker. it probably faces a little to your bottom left.

the translation then simply moves all that away/far by 2571, and a little to the top left of the image center.
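
to get the two 3D points asked for at the top, something like the following sketch should do. assumptions: the face model has its origin at the nose tip with +Z pointing out of the face, and the 500-unit length is an arbitrary pick in the same units as tvec. the matrix is the rounded one from above.

```python
import numpy as np

# 4x4 object-to-camera transform from this thread, rounded to 5 decimals
T = np.array([[ 0.95552, -0.03688, -0.29262, -167.53555],
              [ 0.0523 , -0.95524,  0.29118, -203.05571],
              [-0.29026, -0.29354, -0.91082, 2571.13246],
              [ 0.     ,  0.     ,  0.     ,    1.     ]])

length = 500.0  # arbitrary, same units as tvec

# object-local homogeneous points (w = 1): model origin, and a point along +Z
pts_obj = np.array([[0.0, 0.0, 0.0,    1.0],
                    [0.0, 0.0, length, 1.0]])

pts_cam = (T @ pts_obj.T).T[:, :3]   # the same two points in the camera frame
start, end = pts_cam
print("start:", start)
print("end:  ", end)
```

the start point is just tvec, and the end point moves with the pose while the distance between the two stays at the chosen length.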


Spot on, it’s pointing towards the bottom left.

I am still a little confused how you read the matrix to determine the position of the vector though.

the matrix transforms object-local points/vectors to camera-local.

you can get a sense for what it does by figuring where the (object-local) axes get mapped to. X is (1,0,0), Y is (0,1,0), Z is (0,0,1). append a 1 for points or a 0 for vectors.

the +Z vector gets mapped to…

\begin{pmatrix} 0.95552 & -0.03688 & -0.29262 & -167.536 \\ 0.0523 & -0.95524 & 0.29118 & -203.056 \\ -0.29026 & -0.29354 & -0.91082 & 2571.13 \\ 0 & 0 & 0 & 1 \end{pmatrix} \cdot \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} -0.29262 \\ 0.29118 \\ -0.91082 \\ 0 \end{pmatrix}

so the +Z vector gets mapped to… the third column.

now, where does a camera-local vector point if that is its value? a little to -X (left), a little to +Y (down), and mostly to -Z, so that’s near. and that is where the face’s +Z vector points, as viewed by the camera.

same for the others.
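
a small sketch of the same reading done in NumPy, using the rounded matrix from above. it maps all three object axes as directions (w = 0) and, for contrast, one point (w = 1), so you can see that directions ignore the translation and points don't.

```python
import numpy as np

T = np.array([[ 0.95552, -0.03688, -0.29262, -167.53555],
              [ 0.0523 , -0.95524,  0.29118, -203.05571],
              [-0.29026, -0.29354, -0.91082, 2571.13246],
              [ 0.     ,  0.     ,  0.     ,    1.     ]])

# columns are the object-local X, Y, Z axes as homogeneous directions (w = 0)
axes_obj = np.eye(4)[:, :3]
axes_cam = T @ axes_obj              # 4x3 result: the first three columns of T

for name, col in zip("XYZ", axes_cam.T):
    print(f"object +{name} -> camera {col[:3]}")

# the same +Z tip treated as a point (w = 1) also picks up the translation
z_point_cam = T @ np.array([0.0, 0.0, 1.0, 1.0])
print("point (0, 0, 1) -> camera", z_point_cam[:3])
```

in OpenCV's camera frame +X is right, +Y is down, and +Z is away from the camera, which is how the signs turn into left/right, up/down, near/far.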
