solvePnP gives Wrong Values

With cv2.solvePnP I try to do pose a estimation in pyvista, which is a python wrapper for vtk.
The results of solvePnP seem wrong to me, i.e. the resulting translation and rotation. For simplicity I try to “undo” a translation of the camera. I expect the inverse of the translation to be the result of solvePnP.

enter image description here
The translation is just

Translation = np.array([[  1.,   0.,   0., 1000.],
                        [  0.,   1.,   0.,   0.],
                        [  0.,   0.,   1.,   0.],
                        [  0.,   0.,   0.,   1.]])

i.e. a shift along one axis. But the resulting rvec,tvec are

 rvec = array([ 0., 0., -3.142]),
 tvec = array([ 707.107, 408.248, 8882.736])

The resulting translation and rotation seem nonsensical to me. Since the translation does no rotation, I expect that only tvec has non zero entries to undo the translation in the opposite direction. Concretely, I expect tvec= [-1000,0,0] and rvec=[0,0,0]. If I then apply this (wrong) result to the camera, all points disappear completely.

Here is What I do:

import cv2
import pyvista as pv
from pyvista import examples
import pyvistaqt
from vtk import vtkMatrix4x4, vtkMatrix3x3, vtkTransform
from vtk.util.numpy_support import vtk_to_numpy  
import numpy as np

np.set_printoptions(suppress=True,precision=3)

def getCamMatrix():
    
    narray = np.eye(4)
    vmatrix = plotter.camera.GetModelViewTransformMatrix()
    vmatrix.DeepCopy(narray.ravel(), vmatrix)
    return narray    

def toVTK(m,n =4):
    if n == 4:
        newMatrixVTK = vtkMatrix4x4()
    else:
        newMatrixVTK = vtkMatrix3x3()    
    for i in range(n):
        for j in range(n):
            newMatrixVTK.SetElement(i,j, m[i,j])            
    return newMatrixVTK

def applyMatrixToCam(newMatrix):
    global plotter
    newMatrixVTK = toVTK(newMatrix)    
    transform = vtkTransform()
    transform.SetMatrix(newMatrixVTK)
    transform.Update()
    plotter.camera.ApplyTransform(transform)
    pass

print("Setting up points in world coordinates")
Points =  np.array([[ 2918.972,  -887.573,   416.331,1],
        [ 2338.002,  -702.07 ,  1039.864,1],
        [ 1458.473,  -707.246,  1005.19,1 ],
        [ 1219.4  ,  -890.161,   377.004,1],
        [ 1318.727, -1017.829,  -156.537,1],
        [ 2529.132, -1026.888,  -169.222,1]])


pMesh = pv.PolyData(Points[:,:3]) # vtk object to hold the six points


plotter = pyvistaqt.BackgroundPlotter() # setting up the plotting function
plotter.enable_trackball_style()
plotter.add_mesh(pMesh)



print("Transforming from World to Image Coordinates")
# Rotating the points towards a camera at the origin, i.e. applying the default camera transform
projected = (getCamMatrix() @ Points.T)[:3,:].T 

print("store original image points")
image_points = projected.copy()[:,:2]
print("Applying the perspective transform, i.e. division by the Z-coordinate")
image_points /= projected[:,-1].reshape(-1,1) 


print("Setting up a simple translation of the camera position")
Translation = np.array([[  1.,   0.,   0., 1000.],
                        [  0.,   1.,   0.,   0.],
                        [  0.,   0.,   1.,   0.],
                        [  0.,   0.,   0.,   1.]])

applyMatrixToCam(Translation)


print("Apply the new Camera Matrix to the six points")
projected_shift = (getCamMatrix() @ Points.T)[:3,:].T

retval, rvec, tvec = cv2.solvePnP(projected_shift, np.array(image_points), np.eye(3), None, None, None, False, cv2.SOLVEPNP_EPNP)

R = cv2.Rodrigues(rvec)[0]
extrinsicReal = np.vstack([np.hstack([R.T, -R.T@tvec]), [0,0,0,1]])
applyMatrixToCam(extrinsicReal)

I’m not really sure what is going on, but a few things I would try / think about:

  1. your incorrect translation has an X value of 707.107, which rhymes with cos(45) (or sin). Maybe that’s a clue?
  2. Make sure your matrices are in the right form - maybe the one you are querying needs to be transposed? (maybe your SetElement call should be (i,j, m[j,i]) ?
  3. Check the data (print it out) going into solvePnP and make sure it looks right.

Point 2 was easy to check - no result. Maybe there is something to point 1 and 3, since the value also appears in camera matrix, that I start with, i.e. before applying the transformation:

array([[   -0.70710678,     0.70710678,     0.        ,  2074.41441523],
       [   -0.40824829,    -0.40824829,     0.81649658,   136.38146515],
       [    0.57735027,     0.57735027,     0.57735027, -5024.83541614],
       [    0.        ,     0.        ,     0.        ,     1.        ]])

Also the possibility of flipping the axis around may be at play here. Do the image coordinates have to be (X,Y) or (Y,X) ? Or something like (Y,-X), etc?

First of all, a point on terminology - the 4x4 matrix you have represents the extrinsics - the rotation and translation of the camera. It’s often called a modelview matrix, but not the camera matrix. When the term camera matrix is used it refers to the intrinsics (focal length, image center) of the camera and it describes the projective transform of 3D points into 2D projective space. Your use of the term camera matrix is confusing - for example your getCamMatrix() function actually queries the modelview matrix. I’m not saying this is your problem, specifically, but it makes it harder to reason about.

When I run into issues like this, I transform the points step by step and make sure the results at each stage match what I expect.

For example:

projected_shift = (getCamMatrix() @ Points.T)[:3,:].T

What is this supposed to do? The name of the variable (projected_shift) suggests that you are expecting the points to be projected into 2D projective space, but to me it looks like you are multiplying your 3D homogenous coordinates by the modelview matrix, and then just truncating the result (dropping the W coordinate). (I think that’s what the [:3,:] does, but I’m not sure). I suppose this is OK because the bottom row of the matrix is [0,0,0,1] and your world points all have a 1 for the W coord, but it might be better form to actually do the divide in case things change in the future. (For example you might compose your modelview and projection matrix into a single matrix, and your bottom row might no longer be [0,0,0,1])

In any case I don’t think that transform includes a projection (based on the code, and based on the matrix you included in your most recent post).

Good luck

1 Like