Problem with calculating essential matrix in camera pose estimation

My task is the following: given a set of images of an indoor scene with known poses, I need to estimate the pose of a query image taken in the same space.

I have implemented pose estimation in Python with OpenCV: I match images using SIFT and a FLANN matcher, then compute the essential matrix. I also undistort the corresponding points, since the scene images and the query image can be captured by cameras with different intrinsics.

The problem is that the resulting pose of the query image differs greatly from the ground truth. The essential matrix also seems inaccurate: the matched images are very close, so the camera displacement should be extremely small. Here is my code:

import cv2
import numpy as np

dist_coeffs = None

# Undistort the matched points into normalized image coordinates
# (K_q and K_v are the intrinsic matrices of the two cameras)
pts1 = cv2.undistortPoints(np.expand_dims(pts1, axis=1), cameraMatrix=K_q, distCoeffs=dist_coeffs, R=None, P=np.identity(3))
pts2 = cv2.undistortPoints(np.expand_dims(pts2, axis=1), cameraMatrix=K_v, distCoeffs=dist_coeffs, R=None, P=np.identity(3))

thresh = 1.0
E, mask = cv2.findEssentialMat(pts1, pts2, cameraMatrix=np.identity(3), method=cv2.RANSAC, prob=F_DIST, threshold=thresh)

_, R_m, t_m, mask = cv2.recoverPose(E, pts1, pts2, cameraMatrix=np.identity(3))
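
One thing that may be worth checking in the snippet above: the points passed to findEssentialMat are in normalized coordinates, while the OpenCV documentation describes threshold as a distance in pixels. With an identity camera matrix, thresh = 1.0 may therefore be a very loose bound. Below is a minimal sketch of converting a pixel threshold to normalized units. The helper normalized_threshold and the intrinsics K_q_example and K_v_example are hypothetical and made up for illustration; they are not taken from the code above.

```python
import numpy as np

def normalized_threshold(pixel_thresh, *camera_matrices):
    """Convert a pixel-distance threshold to normalized image units.

    Hypothetical helper: divides by the mean focal length (fx, fy)
    averaged over all given 3x3 intrinsic matrices.
    """
    focals = [0.5 * (K[0, 0] + K[1, 1]) for K in camera_matrices]
    return pixel_thresh / float(np.mean(focals))

# Made-up intrinsics for illustration only (not the real K_q / K_v).
K_q_example = np.array([[800.0, 0.0, 320.0],
                        [0.0, 800.0, 240.0],
                        [0.0, 0.0, 1.0]])
K_v_example = np.array([[1200.0, 0.0, 640.0],
                        [0.0, 1200.0, 360.0],
                        [0.0, 0.0, 1.0]])

# One pixel expressed in normalized units for these example cameras.
thresh_norm = normalized_threshold(1.0, K_q_example, K_v_example)
print(thresh_norm)
```

The result could then be tried as threshold=thresh_norm in place of thresh. Averaging the focal lengths of the two cameras is an assumption of this sketch, not something the documentation prescribes.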

According to the official documentation, if points1 and points2 are feature points from cameras with different camera intrinsic matrices, "use undistortPoints() with P = cv::NoArray() for both cameras to transform image points to normalized image coordinates, which are valid for the identity camera intrinsic matrix. When passing these coordinates, pass the identity matrix for this parameter."

Here is an example of the essential matrix and the recovered pose:

Essential matrix
[[ 0.23120721 -0.35422666  0.45764383]
 [ 0.27209477 -0.18381956 -0.52687146]
 [-0.32365465  0.34509926  0.04862367]]
Recovered pose
[[ 0.84963004 -0.11140984 -0.51547711]
 [-0.10791722  0.92002069 -0.37671715]
 [ 0.5162196   0.37569906  0.76965417]]
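
A quick way to sanity-check these two matrices, sketched with NumPy only. A valid essential matrix has two equal singular values and one zero singular value, and the rotation angle can be read from the trace of the rotation matrix. The helper names essential_singular_values and rotation_angle_deg are hypothetical and introduced only for this sketch.

```python
import numpy as np

def essential_singular_values(E):
    """Singular values of E, scaled so the largest is 1."""
    s = np.linalg.svd(E, compute_uv=False)
    return s / s[0]

def rotation_angle_deg(R):
    """Rotation angle in degrees, from trace(R) = 1 + 2*cos(angle)."""
    cos_angle = (np.trace(R) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))

# Values copied from the output printed above.
E = np.array([[0.23120721, -0.35422666, 0.45764383],
              [0.27209477, -0.18381956, -0.52687146],
              [-0.32365465, 0.34509926, 0.04862367]])
R = np.array([[0.84963004, -0.11140984, -0.51547711],
              [-0.10791722, 0.92002069, -0.37671715],
              [0.5162196, 0.37569906, 0.76965417]])

print(essential_singular_values(E))  # ideally close to [1, 1, 0]
print(rotation_angle_deg(R))         # rotation between the two views
```

For the rotation printed above this gives roughly 40 degrees, which is far from the near-zero rotation expected for two nearly identical views.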

My question is whether my implementation is correct.