Are the magnitudes of the angles shown in the drawing representative of what you are getting in your experiment? In the drawing it looks like solvePnP is giving you an angle of about 20 degrees, while you are expecting 0 degrees.
I’m asking whether the magnitudes are representative (as opposed to exaggerated to illustrate the problem) because, if the angles are small, the effect you are seeing could simply be the optical axis of the camera not being aligned with the nominal axis of the physical camera / enclosure.
But maybe I’m getting ahead of myself. First, can you confirm that you have calibrated the camera (intrinsics) before calling solvePnP? If so, inspect the camera matrix and see how far the calibrated image center (Cx, Cy) is from the nominal image center ((width-1)/2, (height-1)/2). Consider this delta, along with the calibrated focal length (in pixels), to get a sense of how far the optical axis is from the mechanical axis (for lack of a better term) - see the sketch below.
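Something like this gives a rough angular misalignment from the calibration results (the camera matrix and image size here are made-up placeholders; substitute your own from cv2.calibrateCamera):

```python
import numpy as np

# Hypothetical calibrated camera matrix and image size -- replace with yours.
K = np.array([[1250.0,    0.0, 668.5],
              [   0.0, 1248.0, 512.3],
              [   0.0,    0.0,   1.0]])
width, height = 1280, 1024

cx, cy = K[0, 2], K[1, 2]
fx, fy = K[0, 0], K[1, 1]

# Offset of the calibrated principal point from the nominal image center.
dx = cx - (width - 1) / 2.0
dy = cy - (height - 1) / 2.0

# Rough tilt of the optical axis relative to the "mechanical" axis,
# assuming the sensor is centered in the enclosure.
angle_x_deg = np.degrees(np.arctan2(dx, fx))
angle_y_deg = np.degrees(np.arctan2(dy, fy))

print(f"principal point offset: ({dx:.1f}, {dy:.1f}) px")
print(f"approx optical-axis tilt: {angle_x_deg:.2f} deg (x), {angle_y_deg:.2f} deg (y)")
```

If that tilt is on the order of a degree or two, it won't explain a 20 degree discrepancy, but it's worth knowing.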
Another way to come at this is to move your object left/right and see how the rotation from solvePnP changes. I think you are supposing the angle changes depending on where the object is; try it and see (while only translating). If it does change, then post the code and input images you are using, along with the camera intrinsics / distortion model.
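For reference, here is a purely synthetic sketch of that "translate only" test (made-up intrinsics, a 4-point planar target, no noise). With correct intrinsics, the recovered rotation should stay at essentially zero no matter where the target sits in the frame - if your real data doesn't behave this way, that points back at the intrinsics/distortion or the point correspondences:

```python
import cv2
import numpy as np

# Hypothetical intrinsics, no distortion for this sketch.
K = np.array([[1000.0,    0.0, 639.5],
              [   0.0, 1000.0, 479.5],
              [   0.0,    0.0,   1.0]])
dist = np.zeros(5)

# A simple 4-point planar target in its own coordinate frame (meters).
obj = np.array([[-0.05, -0.05, 0.0],
                [ 0.05, -0.05, 0.0],
                [ 0.05,  0.05, 0.0],
                [-0.05,  0.05, 0.0]], dtype=np.float64)

rvec_true = np.zeros(3)  # target facing the camera, no rotation

for x_offset in (0.0, 0.2, 0.4):  # slide the target sideways only
    tvec_true = np.array([x_offset, 0.0, 1.0])
    img_pts, _ = cv2.projectPoints(obj, rvec_true, tvec_true, K, dist)
    ok, rvec, tvec = cv2.solvePnP(obj, img_pts, K, dist)
    print(f"x offset {x_offset:.1f} m -> recovered rvec (deg): "
          f"{np.degrees(rvec.ravel()).round(3)}")
```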
Not to harp on the bit about calibrating the intrinsics, but it is important. In addition to the image center and focal length, calibration gives you distortion coefficients. If your lens has noticeable distortion, you will need to account for it: either undistort your image points first, or pass the distortion coefficients to solvePnP.
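Both routes look something like this (the intrinsics, distortion coefficients, object points, and image points below are placeholders for whatever your calibration and detection actually produce):

```python
import cv2
import numpy as np

# Placeholder intrinsics / distortion (k1 k2 p1 p2 k3) -- use your own.
K = np.array([[1000.0,    0.0, 639.5],
              [   0.0, 1000.0, 479.5],
              [   0.0,    0.0,   1.0]])
dist = np.array([-0.25, 0.08, 0.001, -0.0005, 0.0])

# Placeholder 3D model points and detected 2D points.
object_pts = np.array([[-0.05, -0.05, 0.0], [0.05, -0.05, 0.0],
                       [0.05,  0.05, 0.0], [-0.05,  0.05, 0.0]])
image_pts = np.array([[570.1, 420.3], [702.4, 419.8],
                      [703.0, 545.6], [569.5, 546.2]])

# Option 1: pass the distortion coefficients straight to solvePnP.
ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, dist)

# Option 2: undistort the detected points first (P=K keeps them in
# pixel coordinates), then solve with zero distortion.
undist = cv2.undistortPoints(image_pts.reshape(-1, 1, 2), K, dist, P=K)
ok2, rvec2, tvec2 = cv2.solvePnP(object_pts, undist, K, np.zeros(5))
```

The two should give essentially the same pose; what you can't do is ignore the distortion entirely.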
As for the MRE, it might not always be obvious how it will help, but if you provide code and images for what you are doing (however you are getting the results you believe are bogus), it can reveal things that would be much harder to uncover through a back-and-forth conversation.