StereoCalibrate and StereoRectify with different resolutions

I have an optical (1440x1080) camera and an infrared (640x512) camera that I want to use as a stereo pair. For calibration, I have obtained the intrinsic parameters separately with very low reprojection error (0.10 or less). Next, I want to run cv2.stereoCalibrate and cv2.stereoRectify to obtain the disparity-to-depth mapping (Q matrix) for the stereo pair.

I have recorded synchronized video in which both cameras see several different poses of the chessboard. Before going further, I want to make sure that my understanding is correct.

As I have two different resolutions, should I scale down the optical camera's (1440x1080) images AND camera matrix (with cv2.getOptimalNewCameraMatrix) to the new size of (640x512), then run chessboard detection and pass the corners to stereoCalibrate and stereoRectify using the scaled camera matrix?

If not, what should the process be to account for the change in resolution? So far I am not getting great results after the stereo calibration procedure, but I’d like to verify my process first.

no. no scaling. do not resample anything.

each camera has its own camera matrix. that expresses each camera’s properties.

cameras need not be identical.

there is nothing for you to “do”.

do not “do” anything.

if you “do” anything, it will be detrimental to the entire process.

show your inputs and outputs.
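
To illustrate with a minimal sketch: each camera's separately calibrated intrinsics go into cv2.stereoCalibrate exactly as they are, at each camera's native resolution (variable names here are hypothetical):

    import cv2

    # Hypothetical inputs: obj_pts, pts_opt, pts_ir are matched per-frame point
    # lists; K_opt/dist_opt (1440x1080) and K_ir/dist_ir (640x512) are the
    # separately calibrated intrinsics, passed through untouched.
    rms, K_opt, dist_opt, K_ir, dist_ir, R, T, E, F = cv2.stereoCalibrate(
        obj_pts, pts_opt, pts_ir,
        K_opt, dist_opt,
        K_ir, dist_ir,
        (1440, 1080),  # used only to initialize intrinsics; moot when they are fixed
        flags=cv2.CALIB_FIX_INTRINSIC)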

Ok, just wanted to check if that was the wrong way to go about it.

So, now without any rescaling (other than for cv2.imshow) as mentioned before:

The IR camera intrinsic parameters:

<?xml version="1.0"?>
<opencv_storage>
<camera_name>ir</camera_name>
<average_error>5.1821870542833971e-02</average_error>
<standard_deviation_error>7.7188103307330247e-03</standard_deviation_error>
<frames_used>265</frames_used>
<camera_matrix type_id="opencv-matrix">
  <rows>3</rows>
  <cols>3</cols>
  <dt>d</dt>
  <data>
    7.4802115340833984e+02 0. 3.1950000000000000e+02 0.
    7.5288404670723958e+02 2.5550000000000000e+02 0. 0. 1.</data></camera_matrix>
<distortion_coef type_id="opencv-matrix">
  <rows>1</rows>
  <cols>5</cols>
  <dt>d</dt>
  <data>
    -4.8196856529060433e-01 2.1818770267770027e-01
    -8.7580857567709346e-03 -2.7249149244124946e-03
    -1.2731547345719221e-01</data></distortion_coef>
<frame_size type_id="opencv-matrix">
  <rows>2</rows>
  <cols>1</cols>
  <dt>d</dt>
  <data>
    640. 512.</data></frame_size>
</opencv_storage>

The optical camera intrinsic parameters:

<?xml version="1.0"?>
<opencv_storage>
<camera_name>opt</camera_name>
<average_error>9.2124809144888783e-02</average_error>
<standard_deviation_error>1.8241452508158749e-02</standard_deviation_error>
<frames_used>282</frames_used>
<camera_matrix type_id="opencv-matrix">
  <rows>3</rows>
  <cols>3</cols>
  <dt>d</dt>
  <data>
    1.4349869566047616e+03 0. 7.1950000000000000e+02 0.
    1.4275809764959108e+03 5.3950000000000000e+02 0. 0. 1.</data></camera_matrix>
<distortion_coef type_id="opencv-matrix">
  <rows>1</rows>
  <cols>5</cols>
  <dt>d</dt>
  <data>
    -3.6871797553935726e-01 1.8287793665147228e-01
    1.1779825314816670e-03 -8.8811739018050304e-04
    1.1101349450350156e-02</data></distortion_coef>
<frame_size type_id="opencv-matrix">
  <rows>2</rows>
  <cols>1</cols>
  <dt>d</dt>
  <data>
    1440. 1080.</data></frame_size>
</opencv_storage>
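
For reference, files in this format can be read back with cv2.FileStorage; a minimal sketch, using the node names above (the filename is a placeholder):

    import cv2

    # Read an intrinsic XML written by cv2.FileStorage.
    fs = cv2.FileStorage("opt_intrinsic.xml", cv2.FILE_STORAGE_READ)
    camera_matrix = fs.getNode("camera_matrix").mat()
    distortion_coef = fs.getNode("distortion_coef").mat()
    frame_size = fs.getNode("frame_size").mat().flatten().astype(int)  # (W, H)
    fs.release()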

Detect corners method:

    def find_corners_from_videos(self, video_paths: list, frame_increment: int = 0, visualize: bool = True):

        print("Starting checkerboard detection...\n")

        # Load the synchronized videos
        kernels = []
        calibration_videos: list[cv2.VideoCapture] = []
        for idx, video_path in enumerate(video_paths):
            self.cameras[idx].video_path = video_path
            calibration_videos.append(cv2.VideoCapture(str(self.cameras[idx].video_path)))

            if not calibration_videos[idx].isOpened():
                print("Error: Video not found/opened.")
                sys.exit()

            frame_W = calibration_videos[idx].get(cv2.CAP_PROP_FRAME_WIDTH)
            frame_H = calibration_videos[idx].get(cv2.CAP_PROP_FRAME_HEIGHT)

            if self.cameras[idx].frame_size[0] != int(frame_W) or self.cameras[idx].frame_size[1] != int(frame_H):
                print('Error: The intrinsic calibration Frame W/H do not match the provided video\n')
                print('Intrinsic Video Frame Size: ', self.cameras[idx].frame_size)
                print('Extrinsic Video Frame Size: ', (int(frame_W), int(frame_H)))
                sys.exit()

            # cornerSubPix search half-window, scaled to the image size (minimum 7 px)
            min_dimension = min(frame_H, frame_W)
            half_window = max(7, int(round(0.0125 * min_dimension)))
            kernels.append((half_window, half_window))

        # object_points --> (0, 0, 0), (1, 0, 0), ..., one 3D point per inner corner
        object_points = np.zeros((self.checkerboard[0] * self.checkerboard[1], 3), np.float32)
        object_points[:, :2] = np.mgrid[0:self.checkerboard[0], 0:self.checkerboard[1]].T.reshape(-1, 2)
        object_points = object_points * self.square_size  # scale by physical square size (mm converted to m)
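        # e.g. for a 3x2 inner-corner board with square_size = 0.025 m, the rows are
        # (0,0,0), (0.025,0,0), (0.05,0,0), (0,0.025,0), (0.025,0.025,0), (0.05,0.025,0)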

        frame_position = 0
        frames_processed = 1
        while calibration_videos[0].isOpened() and calibration_videos[1].isOpened():
            retP, frameP = calibration_videos[0].read()
            retS, frameS = calibration_videos[1].read()

            if not retP or not retS:
                # No frame from VideoCapture object --> Exit the loop
                break

            grayP = cv2.cvtColor(frameP, cv2.COLOR_BGR2GRAY)
            grayS = cv2.cvtColor(frameS, cv2.COLOR_BGR2GRAY)

            # Find image points
            retP, cornersP = cv2.findChessboardCorners(grayP, self.checkerboard,
                                                       flags=cv2.CALIB_CB_ADAPTIVE_THRESH + cv2.CALIB_CB_NORMALIZE_IMAGE + cv2.CALIB_CB_FILTER_QUADS)
            retS, cornersS = cv2.findChessboardCorners(grayS, self.checkerboard,
                                                       flags=cv2.CALIB_CB_ADAPTIVE_THRESH + cv2.CALIB_CB_NORMALIZE_IMAGE + cv2.CALIB_CB_FILTER_QUADS)

            if retP and retS:  # Detection is needed in both images to estimate the mapping
                # If image points, remember idx and refine points
                cornersP = cv2.cornerSubPix(grayP, cornersP, kernels[0], (-1, -1), criteria=self.termination_criteria)
                cornersS = cv2.cornerSubPix(grayS, cornersS, kernels[1], (-1, -1), criteria=self.termination_criteria)

                if visualize:
                    print("Enter to continue, s to skip a bad frame, ESCAPE to stop visualization.\n")
                    frameP = cv2.drawChessboardCorners(frameP, self.checkerboard, cornersP, retP)
                    frameS = cv2.drawChessboardCorners(frameS, self.checkerboard, cornersS, retS)

                    frameP_scaled = cv2.resize(frameP, (frameS.shape[1], frameS.shape[0]))
                    frame = np.hstack((frameP_scaled, frameS))

                    cv2.namedWindow("Detected Corners (Left: P, Right: S)",
                                    cv2.WINDOW_NORMAL)  # Create window with freedom of dimensions
                    cv2.imshow("Detected Corners (Left: P, Right: S)", frame)
                    key = cv2.waitKey(0) & 0xFF
                    if key == 27:  # Escape key
                        cv2.destroyAllWindows()
                        visualize = False
                    elif key == 115:  # s key
                        print('Skipped\n')

                        # Increment the video frame position if specified by user
                        if frame_increment not in range(0, 2):
                            frame_position += frame_increment
                            calibration_videos[0].set(cv2.CAP_PROP_POS_FRAMES, frame_position)
                            calibration_videos[1].set(cv2.CAP_PROP_POS_FRAMES, frame_position)

                        frames_processed += 1
                        if frames_processed % 25 == 0:
                            print(f"Frame Number: {frames_processed} -|- Good Frames: {len(self.camera_points)}")
                        continue

                # Do the Primary Camera first
                self.cameras[0].frame_indices.append(calibration_videos[0].get(cv2.CAP_PROP_POS_FRAMES))
                self.cameras[0].world_points.append(object_points)
                self.cameras[0].camera_points.append(cornersP)

                # Do the Secondary Camera next
                self.cameras[1].frame_indices.append(calibration_videos[1].get(cv2.CAP_PROP_POS_FRAMES))
                self.cameras[1].world_points.append(object_points)
                self.cameras[1].camera_points.append(cornersS)

            # Increment the video frame position if specified by user
            if frame_increment not in range(0, 2):
                frame_position += frame_increment
                calibration_videos[0].set(cv2.CAP_PROP_POS_FRAMES, frame_position)
                calibration_videos[1].set(cv2.CAP_PROP_POS_FRAMES, frame_position)

            frames_processed += 1
            if frames_processed % 25 == 0:
                print(f"Frame Number: {frames_processed} -|- Good Frames: {len(self.cameras[0].camera_points)}")

        cv2.destroyAllWindows()

        print("Getting calibration coverage...")
        canvasP = np.zeros((1080, 1440, 3), dtype=np.uint8)
        canvasS = np.zeros((512, 640, 3), dtype=np.uint8)

        for idx in range(len(self.cameras[0].world_points)):
            canvasP = cv2.drawChessboardCorners(canvasP, self.checkerboard, self.cameras[0].camera_points[idx], True)
            canvasS = cv2.drawChessboardCorners(canvasS, self.checkerboard, self.cameras[1].camera_points[idx], True)

        canvasP_scaled = cv2.resize(canvasP, (canvasS.shape[1], canvasS.shape[0]))
        canvas = np.hstack((canvasP_scaled, canvasS))
        cv2.imshow("Calibration Coverage | Primary (L) Secondary (R)", canvas)
        cv2.waitKey(0)
        cv2.destroyAllWindows()
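
For reference, the method above relies on a few attributes set elsewhere on the class. A minimal sketch of plausible values — the board geometry and refinement criteria here are assumptions for illustration, not taken from the post:

    import cv2

    # Illustrative values for the attributes find_corners_from_videos relies on:
    checkerboard = (9, 6)   # inner corners per row and column (assumed)
    square_size = 0.025     # square edge length in metres (assumed 25 mm)
    termination_criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER,
                            100, 1e-6)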

Stereo Calibration Method:

    def stereo_calibrate(self):
        """
        Performs stereo calibration between two cameras
        """

        print("Starting stereo calibration...\n")

        identity_R = np.identity(3, dtype=np.float64)
        identity_T = np.array([[0.], [0.], [0.]], dtype=np.float64)

        ret, self.cameras[0].camera_matrix, self.cameras[0].distortion_coef, \
        self.cameras[1].camera_matrix, self.cameras[1].distortion_coef, \
        self.rotation_matrix, self.translation_matrix, self.essential_matrix, \
        self.fundamental_matrix, per_view_errors = cv2.stereoCalibrateExtended(
            objectPoints=self.cameras[0].world_points,
            imagePoints1=self.cameras[0].camera_points,
            imagePoints2=self.cameras[1].camera_points,
            cameraMatrix1=self.cameras[0].camera_matrix,
            distCoeffs1=self.cameras[0].distortion_coef,
            cameraMatrix2=self.cameras[1].camera_matrix,
            distCoeffs2=self.cameras[1].distortion_coef,
            imageSize=self.cameras[0].frame_size,
            R=identity_R,
            T=identity_T,
            flags=self.calibration_flags,
            criteria=self.termination_criteria)

        self.rep_error_AVG = float(np.nanmean(per_view_errors))
        self.rep_error_SD = float(np.nanstd(per_view_errors))
        print(f"Stereo reprojection error: {self.rep_error_AVG} +/- {self.rep_error_SD}\n")

        print('Done Stereo Calibration\n')
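
Since the intrinsics were calibrated separately, the usual choice for self.calibration_flags is to hold them fixed, so that stereoCalibrateExtended solves only for the relative pose (R, T). A sketch — the flag and criteria values here are assumptions:

    import cv2

    # Keep the separately calibrated intrinsics fixed; only R and T are estimated.
    calibration_flags = cv2.CALIB_FIX_INTRINSIC
    termination_criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER,
                            100, 1e-6)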

Stereo Rectify Method:

    def stereo_rectification(self):

        print("...getting Rectification and Projection matrices...")

        self.cameras[0].rectification_matrix, self.cameras[1].rectification_matrix, \
        self.cameras[0].projection_matrix, self.cameras[1].projection_matrix, \
        self.disparity_matrix, self.cameras[0].sr_roi, self.cameras[1].sr_roi = cv2.stereoRectify(
            cameraMatrix1=self.cameras[0].camera_matrix,
            distCoeffs1=self.cameras[0].distortion_coef,
            cameraMatrix2=self.cameras[1].camera_matrix,
            distCoeffs2=self.cameras[1].distortion_coef,
            imageSize=self.cameras[0].frame_size,
            R=self.rotation_matrix,
            T=self.translation_matrix)

        # Pass the rectified projection matrix from stereoRectify as newCameraMatrix;
        # using the original camera matrix here would discard the rectification geometry.
        self.cameras[0].mapX, self.cameras[0].mapY = cv2.initUndistortRectifyMap(
            cameraMatrix=self.cameras[0].camera_matrix,
            distCoeffs=self.cameras[0].distortion_coef,
            R=self.cameras[0].rectification_matrix,
            newCameraMatrix=self.cameras[0].projection_matrix,
            size=self.cameras[0].frame_size,
            m1type=cv2.CV_32FC1)

        self.cameras[1].mapX, self.cameras[1].mapY = cv2.initUndistortRectifyMap(
            cameraMatrix=self.cameras[1].camera_matrix,
            distCoeffs=self.cameras[1].distortion_coef,
            R=self.cameras[1].rectification_matrix,
            newCameraMatrix=self.cameras[1].projection_matrix,
            size=self.cameras[1].frame_size,
            m1type=cv2.CV_32FC1)

        print('Done Stereo Rectification\n')
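
With the maps and the Q matrix in hand, a disparity map computed on the rectified pair converts to metric 3D points via cv2.reprojectImageTo3D. A minimal sketch, assuming rectP and rectS are rectified 8-bit grayscale frames at a common size (cross-spectral matching between optical and IR is its own challenge, so SGBM here is only illustrative):

    import cv2
    import numpy as np

    # Hypothetical inputs: rectP/rectS are rectified grayscale frames, Q is the
    # 4x4 disparity-to-depth matrix returned by stereoRectify.
    stereo = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
    disparity = stereo.compute(rectP, rectS).astype(np.float32) / 16.0  # SGBM returns fixed-point
    points_3d = cv2.reprojectImageTo3D(disparity, Q)  # per-pixel (X, Y, Z), same units as T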

Visualize Rectified Frames:

    camera_array = CameraArray()

    camera_array.adk.load_intrinsic_xml(file_path='../../calibrations/intrinsic/ir_intrinsic_13:46:31.xml')
    camera_array.adk.load_extrinsic_xml(file_path='../../calibrations/extrinsic/secondary_ir_extrinsic_10:28:15.xml')

    camera_array.bfs.load_intrinsic_xml(file_path='../../calibrations/intrinsic/opt_intrinsic_13:44:51.xml')
    camera_array.bfs.load_extrinsic_xml(file_path='../../calibrations/extrinsic/primary_opt_extrinsic_10:28:15.xml')

    videoP = cv2.VideoCapture(
        '../../data_out/video_out/calib_test/optical_11:42:24.mp4')
    videoS = cv2.VideoCapture(
        '../../data_out/video_out/calib_test/infrared_11:42:24.mp4')

    while videoP.isOpened() and videoS.isOpened():
        retO, frameP = videoP.read()
        retS, frameS = videoS.read()

        if not retO or not retS:
            # No frame from VideoCapture object --> Exit the loop
            break

        frameP_map = cv2.remap(frameP, camera_array.bfs.mapX, camera_array.bfs.mapY, cv2.INTER_CUBIC)
        frameS_map = cv2.remap(frameS, camera_array.adk.mapX, camera_array.adk.mapY, cv2.INTER_CUBIC)

        # sr_roi holds (x, y, w, h) from stereoRectify, so pt2 is (x + w, y + h)
        x, y, w, h = (int(v[0]) for v in camera_array.bfs.sr_roi)
        frameP_map = cv2.rectangle(frameP_map, pt1=(x, y), pt2=(x + w, y + h),
                                   color=(0, 255, 0))
        x, y, w, h = (int(v[0]) for v in camera_array.adk.sr_roi)
        frameS_map = cv2.rectangle(frameS_map, pt1=(x, y), pt2=(x + w, y + h),
                                   color=(0, 255, 0))

        frameP_map_scaled = cv2.resize(frameP_map, (frameS_map.shape[1], frameS_map.shape[0]))
        frame_map = np.hstack((frameP_map_scaled, frameS_map))

        for p in range(20):
            frame_map = cv2.line(frame_map, pt1=(0, p * 25), pt2=(1280, p * 25), color=(0, 0, 255))

        cv2.namedWindow("Mapped Frames")  # Create window with freedom of dimensions
        cv2.imshow("Mapped Frames", frame_map)
        key = cv2.waitKey(10)
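
A numeric check can complement the drawn lines: after a correct rectification, matched corners should land on (nearly) the same image row in both views. A sketch, assuming matched corner sets and the per-camera R/P outputs of stereoRectify (all names hypothetical):

    import cv2
    import numpy as np

    # Hypothetical inputs: cornersP/cornersS are matched detections, K_*/dist_*
    # the intrinsics, (R1, P1) and (R2, P2) the stereoRectify outputs per camera.
    rectP = cv2.undistortPoints(cornersP, K_opt, dist_opt, R=R1, P=P1)
    rectS = cv2.undistortPoints(cornersS, K_ir, dist_ir, R=R2, P=P2)
    dy = rectP[:, 0, 1] - rectS[:, 0, 1]   # row difference per corner pair
    print(f"mean |dy|: {np.abs(dy).mean():.2f} px")  # should be well under a pixel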

The IR camera extrinsic parameters:

<?xml version="1.0"?>
<opencv_storage>
<camera_primary>opt</camera_primary>
<camera_secondary>ir</camera_secondary>
<average_error type_id="opencv-matrix">
  <rows>1</rows>
  <cols>1</cols>
  <dt>d</dt>
  <data>
    3.3209568606265201e+00</data></average_error>
<standard_deviation_error type_id="opencv-matrix">
  <rows>1</rows>
  <cols>1</cols>
  <dt>d</dt>
  <data>
    2.4494093491827220e+00</data></standard_deviation_error>
<frames_used>262</frames_used>
<rotation_matrix type_id="opencv-matrix">
  <rows>3</rows>
  <cols>3</cols>
  <dt>d</dt>
  <data>
    1. 0. 0. 0. 1. 0. 0. 0. 1.</data></rotation_matrix>
<translation_matrix type_id="opencv-matrix">
  <rows>3</rows>
  <cols>1</cols>
  <dt>d</dt>
  <data>
    0. 0. 0.</data></translation_matrix>
<essential_matrix type_id="opencv-matrix">
  <rows>3</rows>
  <cols>3</cols>
  <dt>d</dt>
  <data>
    1. 0. 0. 0. 1. 0. 0. 0. 1.</data></essential_matrix>
<fundamental_matrix type_id="opencv-matrix">
  <rows>3</rows>
  <cols>3</cols>
  <dt>d</dt>
  <data>
    1. 0. 0. 0. 1. 0. 0. 0. 1.</data></fundamental_matrix>
<rectification_matrix type_id="opencv-matrix">
  <rows>3</rows>
  <cols>3</cols>
  <dt>d</dt>
  <data>
    9.9636819573698854e-01 6.0310027005657317e-02 6.0109226965541662e-02
    -6.1270304258908226e-02 9.9801932715418740e-01
    1.4260871036640765e-02 -5.9130096734564250e-02
    -1.7891988969362123e-02 9.9808993001176072e-01</data></rectification_matrix>
<projection_matrix type_id="opencv-matrix">
  <rows>3</rows>
  <cols>4</cols>
  <dt>d</dt>
  <data>
    1.0902325116015752e+03 0. 3.2922803878784180e+02
    -1.4730195692977671e+02 0. 1.0902325116015752e+03
    3.5091096496582031e+02 0. 0. 0. 1. 0.</data></projection_matrix>
<disparity_matrix type_id="opencv-matrix">
  <rows>4</rows>
  <cols>4</cols>
  <dt>d</dt>
  <data>
    1. 0. 0. -3.2922803878784180e+02 0. 1. 0. -3.5091096496582031e+02 0.
    0. 0. 1.0902325116015752e+03 0. 0. 7.4013443835055224e+00 0.</data></disparity_matrix>
<sr_roi type_id="opencv-matrix">
  <rows>4</rows>
  <cols>1</cols>
  <dt>d</dt>
  <data>
    5. 0. 1435. 1080.</data></sr_roi>
</opencv_storage>

The optical camera extrinsic parameters:

<?xml version="1.0"?>
<opencv_storage>
<camera_primary>opt</camera_primary>
<camera_secondary>ir</camera_secondary>
<average_error type_id="opencv-matrix">
  <rows>1</rows>
  <cols>1</cols>
  <dt>d</dt>
  <data>
    3.3209568606265201e+00</data></average_error>
<standard_deviation_error type_id="opencv-matrix">
  <rows>1</rows>
  <cols>1</cols>
  <dt>d</dt>
  <data>
    2.4494093491827220e+00</data></standard_deviation_error>
<frames_used>262</frames_used>
<rotation_matrix type_id="opencv-matrix">
  <rows>3</rows>
  <cols>3</cols>
  <dt>d</dt>
  <data>
    9.9921373300565774e-01 -1.7778249680939894e-02
    -3.5437968496799643e-02 1.6606872421687648e-02
    9.9931476119504714e-01 -3.3078994030286100e-02
    3.6001771640681704e-02 3.2464471287363884e-02 9.9882427410569274e-01</data></rotation_matrix>
<translation_matrix type_id="opencv-matrix">
  <rows>3</rows>
  <cols>1</cols>
  <dt>d</dt>
  <data>
    -1.3461989391514784e-01 -8.1485232791034690e-03
    -8.1213930673863996e-03</data></translation_matrix>
<essential_matrix type_id="opencv-matrix">
  <rows>3</rows>
  <cols>3</cols>
  <dt>d</dt>
  <data>
    -1.5849033574659747e-04 7.8512904736774795e-03
    -8.4075903620775428e-03 -3.2684528050434857e-03
    4.5147478344253391e-03 1.3474942349165506e-01 5.9065009605264946e-03
    -1.3467251362130539e-01 4.1643235559166167e-03</data></essential_matrix>
<fundamental_matrix type_id="opencv-matrix">
  <rows>3</rows>
  <cols>3</cols>
  <dt>d</dt>
  <data>
    -1.3164758940922297e-08 6.5553875575179965e-07
    -1.3463339432451182e-03 -2.6973551304241571e-07
    3.7452137606934829e-07 1.5949738191570150e-02 4.4011360948314457e-04
    -8.7161840358698835e-03 1.</data></fundamental_matrix>
<rectification_matrix type_id="opencv-matrix">
  <rows>3</rows>
  <cols>3</cols>
  <dt>d</dt>
  <data>
    9.9875038389741089e-01 4.4506431949795755e-02 2.2734295235888397e-02
    -4.4134733178051395e-02 9.9888769598857219e-01
    -1.6598076091636053e-02 -2.3447728932170495e-02
    1.5573962814252985e-02 9.9960374933779828e-01</data></rectification_matrix>
<projection_matrix type_id="opencv-matrix">
  <rows>3</rows>
  <cols>4</cols>
  <dt>d</dt>
  <data>
    1.0902325116015752e+03 0. 3.2922803878784180e+02 0. 0.
    1.0902325116015752e+03 3.5091096496582031e+02 0. 0. 0. 1. 0.</data></projection_matrix>
<disparity_matrix type_id="opencv-matrix">
  <rows>4</rows>
  <cols>4</cols>
  <dt>d</dt>
  <data>
    1. 0. 0. -3.2922803878784180e+02 0. 1. 0. -3.5091096496582031e+02 0.
    0. 0. 1.0902325116015752e+03 0. 0. 7.4013443835055224e+00 0.</data></disparity_matrix>
<sr_roi type_id="opencv-matrix">
  <rows>4</rows>
  <cols>1</cols>
  <dt>d</dt>
  <data>
    0. 0. 967. 764.</data></sr_roi>
</opencv_storage>



so your cameras are about 134.6 mm apart? says the calibration.
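
(That figure comes from the translation vector in the calibration, which is in the same units as square_size; a quick check of the full baseline length:)

    import numpy as np

    # Translation vector from the optical extrinsic file above (metres):
    T = np.array([-1.3461989391514784e-01, -8.1485232791034690e-03, -8.1213930673863996e-03])
    print(np.linalg.norm(T) * 1000)  # full baseline: roughly 135 mm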

in the epiline picture it looks like the alignment didn’t succeed. various keypoints from the scene don’t match up.

always hold the calibration pattern so its plane is not parallel to any of the image planes of the cameras.

also don’t position your cameras too perfectly. that will hide errors in the code/math like swapped pictures/cameras/matrices/… position them sloppily, so issues become apparent, and only the correct calibration makes them disappear.