StereoCalibrate and StereoRectify with different resolutions

I have an optical (1440x1080) camera and an infrared (640x512) camera that I want to use as a stereo pair. For calibration I have obtained the intrinsic parameters separately, with very low reprojection error (0.10 or less). Next, I want to run cv2.stereoCalibrate and cv2.stereoRectify to obtain the disparity-to-depth mapping (Q matrix) for the stereo pair.

I have obtained a synchronized video with the cameras both seeing several different poses of the chessboard. However, I want to make sure that my understanding is correct.

As I have two different resolutions, should I scale down the optical camera’s (1440x1080) images AND camera matrix (with cv2.getOptimalNewCameraMatrix) to a new size of (640x512) and then run chessboard detection and pass the corners to stereocalibrate and stereorectify using the new scaled camera matrix?

If not, what should the process be to account for the change in resolution? So far I am not getting great results after the stereo calibration procedure, but I’d like to verify my process first.


no. no scaling. do not resample anything.

each camera has its own camera matrix. that expresses each camera’s properties.

cameras need not be identical.

there is nothing for you to “do”.

do not “do” anything.

if you “do” anything, it will be detrimental to the entire process.

show your inputs and outputs.
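
A minimal sketch of what that looks like in code (a sketch only - names like corners_opt / corners_ir are placeholders, not from the posted code): the corner lists come from each camera's images at their native resolutions, and each camera keeps its own matrix and distortion coefficients.

    import cv2

    # corners_opt: corners detected in the 1440x1080 optical frames
    # corners_ir:  corners detected in the 640x512 IR frames
    # K_opt/dist_opt and K_ir/dist_ir come from the two separate intrinsic calibrations
    ret, K_opt, dist_opt, K_ir, dist_ir, R, T, E, F = cv2.stereoCalibrate(
        object_points,        # list of (N,3) board points, one entry per frame pair
        corners_opt,          # list of (N,1,2) corners in the optical images
        corners_ir,           # list of (N,1,2) corners in the IR images
        K_opt, dist_opt,
        K_ir, dist_ir,
        (1440, 1080),         # imageSize is only used to initialize intrinsics,
                              # which are kept fixed here anyway
        flags=cv2.CALIB_FIX_INTRINSIC)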

Ok, just wanted to check if that was the wrong way to go about it.

So, now without any rescaling (other than for cv2.imshow) as mentioned before:

The IR camera intrinsic parameters:

<?xml version="1.0"?>
<opencv_storage>
<camera_name>ir</camera_name>
<average_error>5.1821870542833971e-02</average_error>
<standard_deviation_error>7.7188103307330247e-03</standard_deviation_error>
<frames_used>265</frames_used>
<camera_matrix type_id="opencv-matrix">
  <rows>3</rows>
  <cols>3</cols>
  <dt>d</dt>
  <data>
    7.4802115340833984e+02 0. 3.1950000000000000e+02 0.
    7.5288404670723958e+02 2.5550000000000000e+02 0. 0. 1.</data></camera_matrix>
<distortion_coef type_id="opencv-matrix">
  <rows>1</rows>
  <cols>5</cols>
  <dt>d</dt>
  <data>
    -4.8196856529060433e-01 2.1818770267770027e-01
    -8.7580857567709346e-03 -2.7249149244124946e-03
    -1.2731547345719221e-01</data></distortion_coef>
<frame_size type_id="opencv-matrix">
  <rows>2</rows>
  <cols>1</cols>
  <dt>d</dt>
  <data>
    640. 512.</data></frame_size>

The optical camera intrinsic parameters:

<?xml version="1.0"?>
<opencv_storage>
<camera_name>opt</camera_name>
<average_error>9.2124809144888783e-02</average_error>
<standard_deviation_error>1.8241452508158749e-02</standard_deviation_error>
<frames_used>282</frames_used>
<camera_matrix type_id="opencv-matrix">
  <rows>3</rows>
  <cols>3</cols>
  <dt>d</dt>
  <data>
    1.4349869566047616e+03 0. 7.1950000000000000e+02 0.
    1.4275809764959108e+03 5.3950000000000000e+02 0. 0. 1.</data></camera_matrix>
<distortion_coef type_id="opencv-matrix">
  <rows>1</rows>
  <cols>5</cols>
  <dt>d</dt>
  <data>
    -3.6871797553935726e-01 1.8287793665147228e-01
    1.1779825314816670e-03 -8.8811739018050304e-04
    1.1101349450350156e-02</data></distortion_coef>
<frame_size type_id="opencv-matrix">
  <rows>2</rows>
  <cols>1</cols>
  <dt>d</dt>
  <data>
    1440. 1080.</data></frame_size>

Detect corners method:

    def find_corners_from_videos(self, video_paths: list, frame_increment: int = 0, visualize: bool = True):

        print("Starting checkerboard detection...\n")

        # Load the synchronized videos
        kernels = []
        calibration_videos: list[cv2.VideoCapture] = []
        for idx, video_path in enumerate(video_paths):
            self.cameras[idx].video_path = video_path
            calibration_videos.append(cv2.VideoCapture(str(self.cameras[idx].video_path)))

            if not calibration_videos[idx].isOpened():
                print("Error: Video not found/opened.")
                sys.exit()

            frame_W = calibration_videos[idx].get(cv2.CAP_PROP_FRAME_WIDTH)
            frame_H = calibration_videos[idx].get(cv2.CAP_PROP_FRAME_HEIGHT)

            if self.cameras[idx].frame_size[0] != int(frame_W) or self.cameras[idx].frame_size[1] != int(frame_H):
                print('Error: The intrinsic calibration Frame W/H do not match the provided video\n')
                print('Intrinsic Video Frame Size: ', self.cameras[idx].frame_size)
                print('Extrinsic Video Frame Size: ', (int(frame_W), int(frame_H)))
                sys.exit()

            min_dimension = min(frame_H, frame_W)
            alpha = int(round(0.0125 * min_dimension))
            if alpha < 7:
                alpha = 7
            kernels.append((alpha, alpha))

        # object_points --> (0, 0, 0), (1, 0, 0), ..., (checkerboard[0]-1, checkerboard[1]-1, 0)
        object_points = np.zeros((self.checkerboard[0] * self.checkerboard[1], 3), np.float32)
        object_points[:, :2] = np.mgrid[0:self.checkerboard[0], 0:self.checkerboard[1]].T.reshape(-1, 2)
        object_points = object_points * self.square_size  # square_size = X mm --> Y m

        frame_position = 0
        frames_processed = 1
        while calibration_videos[0].isOpened() and calibration_videos[1].isOpened():
            retP, frameP = calibration_videos[0].read()
            retS, frameS = calibration_videos[1].read()

            if not retP or not retS:
                # No frame from VideoCapture object --> Exit the loop
                break

            grayP = cv2.cvtColor(frameP, cv2.COLOR_BGR2GRAY)
            grayS = cv2.cvtColor(frameS, cv2.COLOR_BGR2GRAY)

            # Find image points
            retP, cornersP = cv2.findChessboardCorners(grayP, self.checkerboard,
                                                       flags=cv2.CALIB_CB_ADAPTIVE_THRESH + cv2.CALIB_CB_NORMALIZE_IMAGE + cv2.CALIB_CB_FILTER_QUADS)
            retS, cornersS = cv2.findChessboardCorners(grayS, self.checkerboard,
                                                       flags=cv2.CALIB_CB_ADAPTIVE_THRESH + cv2.CALIB_CB_NORMALIZE_IMAGE + cv2.CALIB_CB_FILTER_QUADS)

            if retP and retS:  # Detection is needed in both images to estimate the mapping
                # If image points, remember idx and refine points
                cornersP = cv2.cornerSubPix(grayP, cornersP, kernels[0], (-1, -1), criteria=self.termination_criteria)
                cornersS = cv2.cornerSubPix(grayS, cornersS, kernels[1], (-1, -1), criteria=self.termination_criteria)

                if visualize:
                    print("Enter to continue, s to skip a bad frame, ESCAPE to stop visualization.\n")
                    frameP = cv2.drawChessboardCorners(frameP, self.checkerboard, cornersP, retP)
                    frameS = cv2.drawChessboardCorners(frameS, self.checkerboard, cornersS, retS)

                    frameP_scaled = cv2.resize(frameP, (frameS.shape[1], frameS.shape[0]))
                    frame = np.hstack((frameP_scaled, frameS))

                    cv2.namedWindow("Detected Corners (Left: P, Right: S)",
                                    cv2.WINDOW_NORMAL)  # Create window with freedom of dimensions
                    cv2.imshow("Detected Corners (Left: P, Right: S)", frame)
                    key = cv2.waitKey(0) & 0xFF
                    if key == 27:  # Escape key
                        cv2.destroyAllWindows()
                        visualize = False
                    elif key == 115:  # s key
                        print('Skipped\n')

                        # Increment the video frame position if specified by user
                        if frame_increment not in range(0, 2):
                            frame_position += frame_increment
                            calibration_videos[0].set(cv2.CAP_PROP_POS_FRAMES, frame_position)
                            calibration_videos[1].set(cv2.CAP_PROP_POS_FRAMES, frame_position)

                        frames_processed += 1
                        if frames_processed % 25 == 0:
                            print(f"Frame Number: {frames_processed} -|- Good Frames: {len(self.cameras[0].camera_points)}")
                        continue

                # Do the Primary Camera first
                self.cameras[0].frame_indices.append(calibration_videos[0].get(cv2.CAP_PROP_POS_FRAMES))
                self.cameras[0].world_points.append(object_points)
                self.cameras[0].camera_points.append(cornersP)

                # Do the Secondary Camera next
                self.cameras[1].frame_indices.append(calibration_videos[1].get(cv2.CAP_PROP_POS_FRAMES))
                self.cameras[1].world_points.append(object_points)
                self.cameras[1].camera_points.append(cornersS)

            # Increment the video frame position if specified by user
            if frame_increment not in range(0, 2):
                frame_position += frame_increment
                calibration_videos[0].set(cv2.CAP_PROP_POS_FRAMES, frame_position)
                calibration_videos[1].set(cv2.CAP_PROP_POS_FRAMES, frame_position)

            frames_processed += 1
            if frames_processed % 25 == 0:
                print(f"Frame Number: {frames_processed} -|- Good Frames: {len(self.cameras[0].camera_points)}")

        cv2.destroyAllWindows()

        print("Getting calibration coverage...")
        canvasP = np.zeros((1080, 1440, 3), dtype=np.uint8)
        canvasS = np.zeros((512, 640, 3), dtype=np.uint8)

        for idx in range(len(self.cameras[0].world_points)):
            canvasP = cv2.drawChessboardCorners(canvasP, self.checkerboard, self.cameras[0].camera_points[idx], True)
            canvasS = cv2.drawChessboardCorners(canvasS, self.checkerboard, self.cameras[1].camera_points[idx], True)

        canvasP_scaled = cv2.resize(canvasP, (canvasS.shape[1], canvasS.shape[0]))
        canvas = np.hstack((canvasP_scaled, canvasS))
        cv2.imshow("Calibration Coverage | Primary (L) Secondary (R)", canvas)
        cv2.waitKey(0)
        cv2.destroyAllWindows()

Stereo Calibration Method:

    def stereo_calibrate(self):
        """
        Performs stereo calibration between two cameras
        """

        print("Starting stereo calibration...\n")

        identity_R = np.identity(3, dtype=np.float64)
        identity_T = np.array([[0.], [0.], [0.]], dtype=np.float64)

        ret, self.cameras[0].camera_matrix, self.cameras[0].distortion_coef, \
        self.cameras[1].camera_matrix, self.cameras[1].distortion_coef, \
        self.rotation_matrix, self.translation_matrix, self.essential_matrix, \
        self.fundamental_matrix, per_view_errors = cv2.stereoCalibrateExtended(
            objectPoints=self.cameras[0].world_points,
            imagePoints1=self.cameras[0].camera_points,
            imagePoints2=self.cameras[1].camera_points,
            cameraMatrix1=self.cameras[0].camera_matrix,
            distCoeffs1=self.cameras[0].distortion_coef,
            cameraMatrix2=self.cameras[1].camera_matrix,
            distCoeffs2=self.cameras[1].distortion_coef,
            imageSize=self.cameras[0].frame_size,
            R=identity_R,
            T=identity_T,
            flags=self.calibration_flags,
            criteria=self.termination_criteria)

        self.rep_error_AVG = np.array(np.nanmean(per_view_errors))
        self.rep_error_SD = np.array(np.nanstd(per_view_errors))
        print(f"Stereo reprojection error: {self.rep_error_AVG} +/- {self.rep_error_SD}\n")

        print('Done Stereo Calibration\n')

Stereo Rectify Method:

    def stereo_rectification(self):

        print("...getting Rectification and Projection matrices...")

        self.cameras[0].rectification_matrix, self.cameras[1].rectification_matrix, \
        self.cameras[0].projection_matrix, self.cameras[1].projection_matrix, \
        self.disparity_matrix, self.cameras[0].sr_roi, self.cameras[1].sr_roi = cv2.stereoRectify(
            cameraMatrix1=self.cameras[0].camera_matrix,
            distCoeffs1=self.cameras[0].distortion_coef,
            cameraMatrix2=self.cameras[1].camera_matrix,
            distCoeffs2=self.cameras[1].distortion_coef,
            imageSize=self.cameras[0].frame_size,
            R=self.rotation_matrix,
            T=self.translation_matrix)

        self.cameras[0].mapX, self.cameras[0].mapY = cv2.initUndistortRectifyMap(
            cameraMatrix=self.cameras[0].camera_matrix,
            distCoeffs=self.cameras[0].distortion_coef,
            R=self.cameras[0].rectification_matrix,
            newCameraMatrix=self.cameras[0].camera_matrix,
            size=self.cameras[0].frame_size,
            m1type=cv2.CV_32FC1)

        self.cameras[1].mapX, self.cameras[1].mapY = cv2.initUndistortRectifyMap(
            cameraMatrix=self.cameras[1].camera_matrix,
            distCoeffs=self.cameras[1].distortion_coef,
            R=self.cameras[1].rectification_matrix,
            newCameraMatrix=self.cameras[1].camera_matrix,
            size=self.cameras[1].frame_size,
            m1type=cv2.CV_32FC1)

        print('Done Stereo Rectification\n')

Visualize Rectified Frames:

    camera_array = CameraArray()

    camera_array.adk.load_intrinsic_xml(file_path='../../calibrations/intrinsic/ir_intrinsic_13:46:31.xml')
    camera_array.adk.load_extrinsic_xml(file_path='../../calibrations/extrinsic/secondary_ir_extrinsic_10:28:15.xml')

    camera_array.bfs.load_intrinsic_xml(file_path='../../calibrations/intrinsic/opt_intrinsic_13:44:51.xml')
    camera_array.bfs.load_extrinsic_xml(file_path='../../calibrations/extrinsic/primary_opt_extrinsic_10:28:15.xml')

    videoP = cv2.VideoCapture(
        '../../data_out/video_out/calib_test/optical_11:42:24.mp4')
    videoS = cv2.VideoCapture(
        '../../data_out/video_out/calib_test/infrared_11:42:24.mp4')

    while videoP.isOpened() and videoS.isOpened():
        retO, frameP = videoP.read()
        retS, frameS = videoS.read()

        if not retO or not retS:
            # No frame from VideoCapture object --> Exit the loop
            break

        frameP_map = cv2.remap(frameP, camera_array.bfs.mapX, camera_array.bfs.mapY, cv2.INTER_CUBIC)
        frameS_map = cv2.remap(frameS, camera_array.adk.mapX, camera_array.adk.mapY, cv2.INTER_CUBIC)

        frameP_map = cv2.rectangle(frameP_map,
                                   pt1=(int(camera_array.bfs.sr_roi[0][0]),
                                        int(camera_array.bfs.sr_roi[1][0])),
                                   pt2=(int(camera_array.bfs.sr_roi[2][0]),
                                        int(camera_array.bfs.sr_roi[3][0])),
                                   color=(0, 255, 0))
        frameS_map = cv2.rectangle(frameS_map,
                                   pt1=(int(camera_array.adk.sr_roi[0][0]),
                                        int(camera_array.adk.sr_roi[1][0])),
                                   pt2=(int(camera_array.adk.sr_roi[2][0]),
                                        int(camera_array.adk.sr_roi[3][0])),
                                   color=(0, 255, 0))

        frameP_map_scaled = cv2.resize(frameP_map, (frameS_map.shape[1], frameS_map.shape[0]))
        frame_map = np.hstack((frameP_map_scaled, frameS_map))

        for p in range(20):
            frame_map = cv2.line(frame_map, pt1=(0, p * 25), pt2=(1280, p * 25), color=(0, 0, 255))

        cv2.namedWindow("Mapped Frames")  # Create window with freedom of dimensions
        cv2.imshow("Mapped Frames", frame_map)
        key = cv2.waitKey(10)

The IR camera extrinsic parameters:

<?xml version="1.0"?>
<opencv_storage>
<camera_primary>opt</camera_primary>
<camera_secondary>ir</camera_secondary>
<average_error type_id="opencv-matrix">
  <rows>1</rows>
  <cols>1</cols>
  <dt>d</dt>
  <data>
    3.3209568606265201e+00</data></average_error>
<standard_deviation_error type_id="opencv-matrix">
  <rows>1</rows>
  <cols>1</cols>
  <dt>d</dt>
  <data>
    2.4494093491827220e+00</data></standard_deviation_error>
<frames_used>262</frames_used>
<rotation_matrix type_id="opencv-matrix">
  <rows>3</rows>
  <cols>3</cols>
  <dt>d</dt>
  <data>
    1. 0. 0. 0. 1. 0. 0. 0. 1.</data></rotation_matrix>
<translation_matrix type_id="opencv-matrix">
  <rows>3</rows>
  <cols>1</cols>
  <dt>d</dt>
  <data>
    0. 0. 0.</data></translation_matrix>
<essential_matrix type_id="opencv-matrix">
  <rows>3</rows>
  <cols>3</cols>
  <dt>d</dt>
  <data>
    1. 0. 0. 0. 1. 0. 0. 0. 1.</data></essential_matrix>
<fundamental_matrix type_id="opencv-matrix">
  <rows>3</rows>
  <cols>3</cols>
  <dt>d</dt>
  <data>
    1. 0. 0. 0. 1. 0. 0. 0. 1.</data></fundamental_matrix>
<rectification_matrix type_id="opencv-matrix">
  <rows>3</rows>
  <cols>3</cols>
  <dt>d</dt>
  <data>
    9.9636819573698854e-01 6.0310027005657317e-02 6.0109226965541662e-02
    -6.1270304258908226e-02 9.9801932715418740e-01
    1.4260871036640765e-02 -5.9130096734564250e-02
    -1.7891988969362123e-02 9.9808993001176072e-01</data></rectification_matrix>
<projection_matrix type_id="opencv-matrix">
  <rows>3</rows>
  <cols>4</cols>
  <dt>d</dt>
  <data>
    1.0902325116015752e+03 0. 3.2922803878784180e+02
    -1.4730195692977671e+02 0. 1.0902325116015752e+03
    3.5091096496582031e+02 0. 0. 0. 1. 0.</data></projection_matrix>
<disparity_matrix type_id="opencv-matrix">
  <rows>4</rows>
  <cols>4</cols>
  <dt>d</dt>
  <data>
    1. 0. 0. -3.2922803878784180e+02 0. 1. 0. -3.5091096496582031e+02 0.
    0. 0. 1.0902325116015752e+03 0. 0. 7.4013443835055224e+00 0.</data></disparity_matrix>
<sr_roi type_id="opencv-matrix">
  <rows>4</rows>
  <cols>1</cols>
  <dt>d</dt>
  <data>
    5. 0. 1435. 1080.</data></sr_roi>

The optical camera extrinsic parameters:

<?xml version="1.0"?>
<opencv_storage>
<camera_primary>opt</camera_primary>
<camera_secondary>ir</camera_secondary>
<average_error type_id="opencv-matrix">
  <rows>1</rows>
  <cols>1</cols>
  <dt>d</dt>
  <data>
    3.3209568606265201e+00</data></average_error>
<standard_deviation_error type_id="opencv-matrix">
  <rows>1</rows>
  <cols>1</cols>
  <dt>d</dt>
  <data>
    2.4494093491827220e+00</data></standard_deviation_error>
<frames_used>262</frames_used>
<rotation_matrix type_id="opencv-matrix">
  <rows>3</rows>
  <cols>3</cols>
  <dt>d</dt>
  <data>
    9.9921373300565774e-01 -1.7778249680939894e-02
    -3.5437968496799643e-02 1.6606872421687648e-02
    9.9931476119504714e-01 -3.3078994030286100e-02
    3.6001771640681704e-02 3.2464471287363884e-02 9.9882427410569274e-01</data></rotation_matrix>
<translation_matrix type_id="opencv-matrix">
  <rows>3</rows>
  <cols>1</cols>
  <dt>d</dt>
  <data>
    -1.3461989391514784e-01 -8.1485232791034690e-03
    -8.1213930673863996e-03</data></translation_matrix>
<essential_matrix type_id="opencv-matrix">
  <rows>3</rows>
  <cols>3</cols>
  <dt>d</dt>
  <data>
    -1.5849033574659747e-04 7.8512904736774795e-03
    -8.4075903620775428e-03 -3.2684528050434857e-03
    4.5147478344253391e-03 1.3474942349165506e-01 5.9065009605264946e-03
    -1.3467251362130539e-01 4.1643235559166167e-03</data></essential_matrix>
<fundamental_matrix type_id="opencv-matrix">
  <rows>3</rows>
  <cols>3</cols>
  <dt>d</dt>
  <data>
    -1.3164758940922297e-08 6.5553875575179965e-07
    -1.3463339432451182e-03 -2.6973551304241571e-07
    3.7452137606934829e-07 1.5949738191570150e-02 4.4011360948314457e-04
    -8.7161840358698835e-03 1.</data></fundamental_matrix>
<rectification_matrix type_id="opencv-matrix">
  <rows>3</rows>
  <cols>3</cols>
  <dt>d</dt>
  <data>
    9.9875038389741089e-01 4.4506431949795755e-02 2.2734295235888397e-02
    -4.4134733178051395e-02 9.9888769598857219e-01
    -1.6598076091636053e-02 -2.3447728932170495e-02
    1.5573962814252985e-02 9.9960374933779828e-01</data></rectification_matrix>
<projection_matrix type_id="opencv-matrix">
  <rows>3</rows>
  <cols>4</cols>
  <dt>d</dt>
  <data>
    1.0902325116015752e+03 0. 3.2922803878784180e+02 0. 0.
    1.0902325116015752e+03 3.5091096496582031e+02 0. 0. 0. 1. 0.</data></projection_matrix>
<disparity_matrix type_id="opencv-matrix">
  <rows>4</rows>
  <cols>4</cols>
  <dt>d</dt>
  <data>
    1. 0. 0. -3.2922803878784180e+02 0. 1. 0. -3.5091096496582031e+02 0.
    0. 0. 1.0902325116015752e+03 0. 0. 7.4013443835055224e+00 0.</data></disparity_matrix>
<sr_roi type_id="opencv-matrix">
  <rows>4</rows>
  <cols>1</cols>
  <dt>d</dt>
  <data>
    0. 0. 967. 764.</data></sr_roi>



so your cameras are about 134.6 mm apart? says the calibration.

in the epiline picture it looks like the alignment didn’t succeed. various keypoints from the scene don’t match up.

always hold the calibration pattern so its plane is not parallel to any of the image planes of the cameras.

also don’t position your cameras too perfectly. that will hide errors in the code/math like swapped pictures/cameras/matrices/… position them sloppily, so issues become apparent, and only the correct calibration makes them disappear.
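
If you want to produce that kind of epiline check yourself, here is a rough sketch (pts_opt, pts_ir, F and ir_image are placeholders, not from the posted code; F is the fundamental matrix from the stereo calibration and the points are matched chessboard corners from one frame pair):

    import cv2
    import numpy as np

    # Epilines in the IR image (camera 2) computed from points in the optical image (camera 1).
    lines_ir = cv2.computeCorrespondEpilines(pts_opt.reshape(-1, 1, 2), 1, F).reshape(-1, 3)

    vis = ir_image.copy()
    h, w = vis.shape[:2]
    for (a, b, c), p in zip(lines_ir, pts_ir.reshape(-1, 2)):
        if abs(b) < 1e-9:       # skip (near-)vertical lines to avoid division by zero
            continue
        x0, y0 = 0, int(round(-c / b))
        x1, y1 = w, int(round(-(c + a * w) / b))
        cv2.line(vis, (x0, y0), (x1, y1), (0, 0, 255), 1)                          # epiline
        cv2.circle(vis, (int(round(p[0])), int(round(p[1]))), 3, (0, 255, 0), -1)  # matched corner
    # If the calibration is good, every green corner should lie on (or very near) its red line.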

Is it possible to calculate extrinsic parameters for two cameras with different resolution?

The answer is yes - the standard procedure in Python gives quite reasonable results for the rotation matrix and translation vector.
Somebody (Stereo Calibration of two images with cameras of different resolution in MATLAB - Stack Overflow) mentioned that Matlab's stereo calibration app can also handle such a case. Well, the R2021a version requires all images to be the same size for stereo calibration. So, don't use Matlab for that.
My question is: if I crop the bigger image to the size of the smaller one (such that the calibration pattern is visible) and run stereo calibration, what will the result be?
By subsampling/cropping the bigger images and applying the standard procedure one can find the correct rotation matrix, since the cropped part of the image lies in the same plane as the full image. With the translation vector, however, it is quite different. Does anybody have an idea how the translations in these two cases are related?
I will be grateful for any hints,
Yuri

I’m not sure that I fully understand what you are asking, but a few comments:

If you want to crop or resample the image to get around the “same resolution” limitation, you should be able to do this, but you will need to calibrate the intrinsics with the same crop (or resampling) as you apply in the stereo calibration step. You can't just take the intrinsics that you generated from the full size image and apply them to a cropped or resampled image.

If you must crop or resample, prefer the crop if possible.
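
To make that concrete, this is roughly how the camera matrix relates between the full image and a crop or a uniform resample (standard pinhole relationships, not anything specific to the posted code); re-calibrating with the cropped or resampled images, as suggested above, is still the safer option:

    import numpy as np

    def camera_matrix_for_crop(K, x0, y0):
        """Intrinsics for an image cropped starting at pixel (x0, y0): only the
        principal point shifts; focal lengths and distortion coefficients are unchanged."""
        Kc = K.copy()
        Kc[0, 2] -= x0
        Kc[1, 2] -= y0
        return Kc

    def camera_matrix_for_scale(K, sx, sy):
        """Intrinsics for an image resampled by factors (sx, sy): focal lengths and
        principal point scale with the image; distortion coefficients stay the same."""
        return np.diag([sx, sy, 1.0]) @ K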

Some clarification: first, it appears that there is no “limitation” - OpenCV (the Python version in my case) works fine with two sets of images of different resolution. Second, the bigger images were undistorted and then cropped. Calibration using the cropped images gave much smaller distortion coefficients (expectedly) and almost the same focal lengths. The distortion center (cx/cy) is irrelevant as the images were corrected for distortion.
Then I compared the extrinsics for two stereo calibrations: big/small and cropped/small. The rotation matrices were identical (expectedly), but the translation vectors were different. In the first case:
[[-44.08679495][ 4.52906981][ 0.18483411]]
which correlated with manual measurements. In the second case:
[[-37.94634993][ -2.20534329][-21.76078176]],
and this result I cannot interpret. Big images were (1296, 2304, 3), small (720, 1280, 3). Cropping rectangle: [685, 370, 1405, 1965]
I guess that there is a relationship between cropping offsets and change in translation vector, but I cannot figure it out.
Any ideas?

Again, I'm not sure I follow what you are doing exactly (for example, why you want to crop the image instead of using the full image).

TL;DR: Make sure you are using calibrated values for cx,cy in every step. I suspect this is the root of your problem.

A few comments:

“Distortion center is irrelevant…” Note that cx,cy is also used in the camera matrix for the 3D->2D projection, so if cx,cy is wrong, then your projections (and everything that relies on them) will be wrong. So, for example, if you used the cx,cy values from the big image when doing the stereo calibration for the cropped image, things will be wrong.
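
To see why: in the pinhole model a point (X, Y, Z) in the camera frame projects to u = fx*X/Z + cx, v = fy*Y/Z + cy, so an error in cx,cy shifts every projected point by that same number of pixels. A tiny illustration (the numbers are made up):

    # Pinhole projection: u = fx*X/Z + cx, v = fy*Y/Z + cy
    fx, fy = 1435.0, 1428.0        # illustrative focal lengths (pixels)
    cx, cy = 719.5, 539.5          # principal point
    X, Y, Z = 0.10, -0.05, 2.0     # a 3D point in the camera frame

    u = fx * X / Z + cx            # 791.25
    v = fy * Y / Z + cy            # 503.8

    # A 5-pixel error in cx biases every projected u coordinate by 5 pixels,
    # and everything estimated on top of those projections inherits that bias.
    u_wrong = fx * X / Z + (cx + 5.0)   # 796.25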

I noticed that your fx,fy values are different. Most modern cameras have square pixels, so it is typically best to use CALIB_FIX_ASPECT_RATIO when calibrating the intrinsics. Otherwise you allow the optimizer to vary fx and fy independently, which might result in a better score but a less accurate model (since physically fx and fy are actually the same). This is a minor point and is not what is causing your issues.
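
For reference, a minimal sketch of using that flag (names like object_points_list / image_points_list are placeholders; with CALIB_FIX_ASPECT_RATIO the optimizer keeps the fx/fy ratio of the initial camera matrix you pass in, so an initial guess with fx == fy is the usual choice):

    import cv2
    import numpy as np

    w, h = 1440, 1080                        # image size (optical camera, for example)
    K_guess = np.array([[1000.0, 0.0, w / 2],
                        [0.0, 1000.0, h / 2],
                        [0.0, 0.0, 1.0]])    # only the fx/fy ratio (here 1:1) is used
                                             # when CALIB_USE_INTRINSIC_GUESS is not set

    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        object_points_list,                  # list of (N,3) board points per view
        image_points_list,                   # list of (N,1,2) detected corners per view
        (w, h),
        K_guess, None,
        flags=cv2.CALIB_FIX_ASPECT_RATIO)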

I also noticed that the intrinsics for both cameras has cx,cy as exactly the numerical center of the images. This suggests that you passed in a guess for the cx,cy and used the parameter CALIB_FIX_PRINCIPAL_POINT (or just overrode the calibrated values with the numerical center of the image). I didn’t see code for how you called calibrateCamera, so I’m just guessing, but I would suggest you calibrate the intrinsics fully (including cx,cy) because having accurate cx,cy values is very important. As an example, for cameras that I use regularly, an error of just 5 pixels for cx,cy is enough to make the camera unusable until it is re-calibrated.

My best advice at this point is:

  1. Abandon the idea of cropping images until you can get things working with the full size images - it just complicates things and obscures the source of error.
  2. Do everything you can reasonably do to make sure the optics stay fixed throughout the whole process. If either camera has autofocus, disable it / use a fixed absolute focus value always, and consider mechanically fixing the lens in position (epoxy) or changing to a manual / lockable lens if possible. If either camera has manual focus or zoom, lock them down mechanically.
  3. Fully calibrate both cameras, including cx,cy - these are critically important parameters.
  4. Use CALIB_FIX_ASPECT_RATIO when calibrating intrinsics.
  5. Make sure you are being consistent with distorted / undistorted image and parameters. Prefer passing in distorted images with the distortion coefficients. When calling functions and passing undistorted images, make sure you aren’t passing in distortion coefficients.
  6. For debugging, check each step by using the calibration results to project known 3D points to the images and visually check to make sure they are accurate (a rough sketch follows below).
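
For item 6, a rough sketch of that kind of check (a sketch only; rvec/tvec are the per-view pose from calibrateCamera, or the stereo R/T composed with it for the second camera, and frame / detected_corners are placeholders):

    import cv2
    import numpy as np

    # object_points: (N,3) board points for one view; K, dist: that camera's intrinsics.
    projected, _ = cv2.projectPoints(object_points, rvec, tvec, K, dist)
    projected = projected.reshape(-1, 2)
    detected = detected_corners.reshape(-1, 2)

    vis = frame.copy()
    for (u, v), (ud, vd) in zip(projected, detected):
        cv2.circle(vis, (int(round(u)), int(round(v))), 3, (0, 0, 255), -1)   # projected
        cv2.circle(vis, (int(round(ud)), int(round(vd))), 5, (0, 255, 0), 1)  # detected

    err = np.linalg.norm(projected - detected, axis=1)
    print("mean reprojection error (px):", err.mean())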

I hope this is helpful.

Thank you for the advice - using CALIB_FIX_ASPECT_RATIO is probably a good idea, although the difference in focal length is minuscule - 0.001 of the value. I do use (cx,cy) from the calibration in every step. As I crop the big undistorted images asymmetrically with respect to the original distortion center, and some distortion is still present albeit very small, the (cx,cy) for the cropped images shift away from the image center (w/2,h/2). Typically, the distortion center is pretty close to the numerical center.
I did not include camera matrices in my post - how could you notice that fx and fy are different? Anyway, as I said, the difference is negligible for my purpose.
The experiment with the cropped images was done out of sheer curiosity. And the result happened to be uninterpretable (by me).
The stereo calibration of the original images was checked using triangulation of corners from a planar grid. For all pairs of images the triangulated points lie on fairly flat surfaces, despite the use of DLT, which is not the most accurate algorithm.
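
For anyone who wants to run the same kind of check, a rough sketch (an assumption about the setup, not the exact code used here: camera 1 at the origin, camera 2 at [R|T] from the stereo calibration, pts1/pts2 are matched, undistorted corner coordinates):

    import cv2
    import numpy as np

    # 3x4 projection matrices built from the calibration results.
    P1 = K1 @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K2 @ np.hstack([R, T.reshape(3, 1)])

    # pts1, pts2: (2, N) arrays of matched corner coordinates (undistorted pixels).
    X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)   # homogeneous, shape (4, N)
    X = (X_h[:3] / X_h[3]).T                          # Nx3 points in camera-1 coordinates

    # Flatness check: fit a plane by SVD and look at the residuals.
    centered = X - X.mean(axis=0)
    normal = np.linalg.svd(centered, full_matrices=False)[2][2]
    dist_to_plane = centered @ normal
    print("RMS distance to best-fit plane:", np.sqrt((dist_to_plane ** 2).mean()))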

Oh, my mistake. I thought you were the original poster (who did post intrinsics for the cameras), so disregard my comments about fx,fy and cx,cy.