I’ve been working all day, and now into the night, on using solvePnP to get the location of my detected object, and I’m starting to feel like I have a big misunderstanding. I took a picture of my object with my calibrated camera, and I have a tflite model that detects all of my 2D points, which I plot on the image as little white circles. I also have a set of 3D points in meters, taken from my accurate 3D CAD model of the object. Their origin (0, 0, 0) is just the center of the CAD model, not the place the object would hypothetically be in the picture. The 2D and 3D points correspond 1:1. I have a calibrated camera matrix with all values in pixels (I know the pixel size but don’t convert it to meters).

Now I run these three inputs through solvePnP and get an rvec and a tvec. If I pass those to projectPoints and plot the result with red circles, the red and white circles match up almost exactly over my object in the picture.
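To put a number on "almost exactly", here is a minimal sketch of a mean reprojection error in pixels. The helper name `mean_reprojection_error` and the sample coordinates are made up for illustration; it only assumes the two point sets are in the same order.

```python
import numpy as np

def mean_reprojection_error(detected_px, reprojected_px):
    """Mean Euclidean distance in pixels between matching 2D points."""
    # reshape(-1, 2) accepts both (N, 2) arrays and the (N, 1, 2) layout
    # that cv2.projectPoints returns.
    detected = np.asarray(detected_px, dtype=float).reshape(-1, 2)
    reprojected = np.asarray(reprojected_px, dtype=float).reshape(-1, 2)
    return float(np.linalg.norm(detected - reprojected, axis=1).mean())

# Made-up pixel coordinates, only to show the call shape.
detected = [[100.0, 200.0], [300.0, 400.0]]
reprojected = [[[103.0, 204.0]], [[300.0, 400.0]]]  # (N, 1, 2) layout
print(mean_reprojection_error(detected, reprojected))
```

With my real data the call would be `mean_reprojection_error(image_points, repro_points)`.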

I’ve spent most of my day trying to figure out how to convert these into a position and rotation for my model in Unity, but to no avail. So I tried to simplify things and just looked at the tvec: its z component is always negative, which would put the object behind my camera, and I do not understand why.
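For reference, this is a sketch of the kind of axis conversion I have been attempting. It assumes OpenCV's camera frame is x right, y down, z forward, and that Unity's camera-local frame is x right, y up, z forward (left-handed), so only y changes sign. It also assumes the model's own y axis is mirrored the same way when imported into Unity, which I have not verified. The function name `opencv_pose_to_unity` and the sample pose are hypothetical.

```python
import numpy as np

# Assumption: OpenCV camera axes are x right, y down, z forward, and Unity
# camera-local axes are x right, y up, z forward, so only y changes sign.
FLIP_Y = np.diag([1.0, -1.0, 1.0])

def opencv_pose_to_unity(R_cv, t_cv):
    """Re-express an OpenCV model-to-camera pose (R, t) with the y axis flipped.

    Assumes the model's own y axis is mirrored in Unity as well.
    """
    R_cv = np.asarray(R_cv, dtype=float).reshape(3, 3)
    t_cv = np.asarray(t_cv, dtype=float).reshape(3)
    R_unity = FLIP_Y @ R_cv @ FLIP_Y  # still a proper rotation (det = +1)
    t_unity = FLIP_Y @ t_cv           # camera-local position, same units as t_cv
    return R_unity, t_unity

# Hypothetical pose: no rotation, object 0.1 m below the optical axis, 0.4 m ahead.
R_u, t_u = opencv_pose_to_unity(np.eye(3), [0.0, 0.1, 0.4])
print(t_u)
```

The 3x3 matrix `R_cv` would come from `cv2.Rodrigues(rotation_vector)[0]`. Turning `R_unity` into a Unity rotation is left out of the sketch.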

So I wondered:

Am I misunderstanding how to use solvePnP? I thought that if I take a picture and know the 3D points of the model, solvePnP will tell me how to rotate and translate the object so that it ends up in the right position in the picture. Do I have this wrong? Do my 3D object points actually have to be where the points would be in that picture? I thought the point of solvePnP was to help me figure that out, but maybe I’m lost here.
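My reading of the OpenCV documentation is that rvec and tvec map model coordinates into the camera frame, X_cam = R · X_model + t, with the camera looking along +z. Under that reading, every visible point should end up with a positive Z. Here is a minimal sketch of that check. The hand-written `rodrigues` is a stand-in for `cv2.Rodrigues` so the snippet runs without OpenCV, and the square and pose are hypothetical values, not my data.

```python
import numpy as np

def rodrigues(rvec):
    """Rotation vector (axis * angle, radians) -> 3x3 rotation matrix."""
    r = np.asarray(rvec, dtype=float).reshape(3)
    theta = np.linalg.norm(r)
    if theta < 1e-12:
        return np.eye(3)
    k = r / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def model_to_camera(model_points, rvec, tvec):
    """Apply X_cam = R @ X_model + t to every row of model_points."""
    R = rodrigues(rvec)
    t = np.asarray(tvec, dtype=float).reshape(1, 3)
    return np.asarray(model_points, dtype=float).reshape(-1, 3) @ R.T + t

# Hypothetical values: a 10 cm square, no rotation, 0.5 m in front of the camera.
square = [[-0.05, -0.05, 0], [0.05, -0.05, 0], [0.05, 0.05, 0], [-0.05, 0.05, 0]]
cam_pts = model_to_camera(square, rvec=[0, 0, 0], tvec=[0, 0, 0.5])
print(cam_pts[:, 2])  # depth of each point along the camera's optical axis
print("all in front of camera:", bool(np.all(cam_pts[:, 2] > 0)))
```

Running `model_to_camera(model_points, rotation_vector, translation_vector)` on my own result is how I would confirm whether the points really land behind the camera.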

Should I change the units on my camera matrix to be in meters?
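As far as I understand the pinhole model, a pixel coordinate is u = fx · X / Z + cx, so only the ratios X/Z and Y/Z matter. The length unit of the 3D points cancels out, and tvec should come back in whatever unit the model points use. Here is a small sketch to check that reasoning. The `project` helper and the test point are hypothetical; the intrinsics are the ones from my calibration.

```python
def project(point_cam, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point (X, Y, Z) to pixels (u, v)."""
    X, Y, Z = point_cam
    return (fx * X / Z + cx, fy * Y / Z + cy)

# Intrinsics from my calibration, in pixels.
fx, fy, cx, cy = 777.173004, 778.566312, 638.1196159, 336.5782565

# Hypothetical camera-frame point, once in meters and once in millimeters.
p_m = (0.05, -0.02, 0.40)
p_mm = tuple(1000.0 * c for c in p_m)

uv_m = project(p_m, fx, fy, cx, cy)
uv_mm = project(p_mm, fx, fy, cx, cy)
print(uv_m, uv_mm)  # the two pixel positions should agree up to rounding
```

If that holds, the camera matrix can stay in pixels and only the model points decide the unit of the translation.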

This is my code:

```
import cv2
import numpy as np

# model_points (N x 3, meters) and image_points (N x 2, pixels) are built elsewhere.
camera_matrix = np.array(
    [[777.173004, 0, 638.1196159],
     [0, 778.566312, 336.5782565],
     [0, 0, 1]], dtype="double"
)
# dist_coeffs = np.array([-0.050800878, -0.107825331, 0.000861304, -0.00044001])
dist_coeffs = np.array([])  # empty: no lens distortion is applied
success, rotation_vector, translation_vector = cv2.solvePnP(
    model_points, image_points, camera_matrix, dist_coeffs,
    flags=cv2.SOLVEPNP_ITERATIVE)
# success, rotation_vector, translation_vector, inliers = cv2.solvePnPRansac(
#     model_points, image_points, camera_matrix, dist_coeffs,
#     flags=cv2.SOLVEPNP_ITERATIVE)
print("Rotation Vector:\n" + str(rotation_vector))
print("Translation Vector:\n" + str(translation_vector))

# Reproject the model points with the estimated pose.
repro_points, jacobian = cv2.projectPoints(
    model_points, rotation_vector, translation_vector, camera_matrix, dist_coeffs)
original_img = cv2.imread("IMG_20210301_193529.jpg")
# original_img = cv2.resize(original_img, (192, 192))
for p in repro_points:  # reprojected points, drawn with color (255, 255, 255)
    cv2.circle(original_img, (int(p[0][0]), int(p[0][1])), 3, (255, 255, 255), -1)
    print(str(p[0][0]) + "-" + str(p[0][1]))
for p in image_points:  # detected points, drawn with BGR color (0, 0, 255)
    cv2.circle(original_img, (int(p[0]), int(p[1])), 3, (0, 0, 255), -1)
# cv2.circle(original_img, (100, 100), 100, (255, 255, 255), -1)
cv2.imshow("og", original_img)
cv2.waitKey()
```

and my Tvec comes out as:

    Translation Vector:
    [[ 0.0896325 ]
     [-0.14819345]
     [-0.36882839]]