Hi all,
in my C++ project I am using cv::triangulatePoints to estimate the distance from an aerial sensor to objects on the ground in a simulation environment.
Problems occur as soon as the aircraft changes its orientation between two frames.
If the aircraft performs no rotation but only a translation, the results are correct. For simplicity, the camera is assumed to be mounted at the aircraft's center of gravity without a gimbal, facing in the direction of flight.
cv::triangulatePoints requires corresponding points in the first and second image. I obtain these correspondences by feature detection in the first image and feature tracking into the second image.
Furthermore, the projection matrix is needed for each frame; it is formed from the intrinsic and extrinsic camera parameters: P = K [R | t]
Since the results are correct for pure translation, I am fairly sure that the error is caused by an incorrect calculation of the rotation matrix for the second frame.
From my simulation environment, the aircraft position is provided in WGS84 coordinates, which I transform to the camera coordinate system (where z points forward, x points right, and y points down, as shown in the figure below).
The transformations of positions include the following steps:
WGS84 → ECEF → NED → body → camera
Moreover, the aircraft's orientation, represented as a quaternion, is available for every frame. The intrinsic matrix K is also known.
Source: Camera Calibration and 3D Reconstruction — OpenCV 2.4.13.7 documentation
First projection matrix
The camera pose of the first frame defines the origin of the camera coordinate system.
Thus, the rotation matrix R is the 3x3 identity matrix I and the translation vector is
t_0 = [0 0 0]^T
From this follows: P_0 = K [I | t_0]
Second projection matrix
In my opinion, the error is in the calculation of the second projection matrix.
The rotation matrix and the translation vector in the camera coordinate system refer to the camera pose of the first frame. For example, if the aircraft moves forward in a straight line by 100 m, the translation vector would be:
t_1 = [0 0 100]T
For the calculation of the R matrix I perform the following steps:

Extract the roll, pitch and yaw angles from the quaternion of each camera pose using the ROS2 transform library (tf2::Matrix3x3::getRPY).

Form the difference of the Euler angles between the first and second frame and convert it back to a quaternion with tf2::Quaternion::setRPY.

Extract the rotation matrix R_1 from this quaternion using tf2::Matrix3x3.
Subsequently, the second projection matrix is formed as: P_1 = K [R_1 | t_1]
Assumptions
I have two guesses as to why the calculation of the rotation matrix for the second image is incorrect:

Roll, pitch and yaw are defined in the body coordinate system (x points forward, y points right and z points down), but the axes of the camera coordinate system are permuted, meaning that roll turns around the camera z-axis, pitch around the camera x-axis, and yaw around the camera y-axis.

The translation vector of the second frame must be rotated, as suggested by MATLAB's camera matrix function (Camera projection matrix - MATLAB cameraMatrix - MathWorks), but then the equation of the second projection matrix mentioned before would become P_1 = K [R_1 | R_1 t_1].
Questions
To sum up, I have three questions for the community:

How can I convert the aircraft's roll, pitch and yaw into the camera coordinate system? I am convinced that translation and rotation must be given in the camera coordinate system for both the first and the second frame.

Is the second projection matrix P_1 = K [R_1 | t_1] or P_1 = K [R_1 | R_1 t_1], as proposed in MATLAB's triangulation function?

Is there a fundamental mistake in my approach?
Thank you very much,
Adrian
P.S. Sorry for the poor display of the formulas; as a new user I could not insert all of them as pictures.