StereoRectify : how to compensate a 4 pixel shift?

Hi,
I’m currently working on a stereo video recording with a rather bad calibration: once undistorted/rectified, the right image is shifted 4 (more precisely 3.8) pixels down (i.e. for all left-right feature matches, the feature in the right image is about 4 rows further down than in the left image).

Is there a simple way to modify the calibration data to compensate for this shift?

Thanks a lot in advance
Felix

PS: the relevant code:

std::vector<double> left_distortion_coeffs={0.1665669107651682, 0.575877741854227, 0.006379205470310661, -0.0017993439560661756};
std::vector<double> right_distortion_coeffs={0.16815689863584243, 0.5655897303367851, -1.5218607763842107e-05, 0.0035169653281324404};

//left intrinsic matrix
double l_fx=1757.703579674525;
double l_fy=1754.2733000318983;
double l_cx=1037.3544528564566;
double l_cy=754.8546950404783;

cv::Mat left_camera_intrinsics =(cv::Mat_<double>(3,3) << l_fx, 0   , l_cx,
                                                      0   , l_fy, l_cy,
                                                      0   , 0   ,  1.0);

//right intrinsic matrix
double r_fx=1757.1453747841483;
double r_fy=1752.4718777207768;
double r_cx=1023.6665754970669;
double r_cy=759.2776943745199;
cv::Mat right_camera_intrinsics=(cv::Mat_<double>(3,3) <<  r_fx, 0   , r_cx,
                                                        0   , r_fy, r_cy,
                                                        0   , 0   ,  1.0);

//transform between cameras (transformation from left frame (cnm1) to right frame(cn))
cv::Mat T_left_to_right=(cv::Mat_<double>(4,4)<<
          0.9993110889219513, 0.03710161562579457, 0.00090425415455681, -0.19492944401815318,
          -0.03711081358058976, 0.9992067184844753, 0.01444718833785609, 0.006302996103602742,
          -0.00036752279786606565, -0.014470793117122745, 0.9998952250478801, 0.0020179296721928395,
          0.0, 0.0, 0.0, 1.0);

//rotation and translation parts, used by stereoRectify below
cv::Mat R_left_to_right = T_left_to_right(cv::Rect(0, 0, 3, 3)).clone();
cv::Mat Translation_left_to_right = T_left_to_right(cv::Rect(3, 0, 1, 3)).clone();

double alpha=0; //alpha=0: all pixels in the resulting image are valid (but we lose some source pixels)
//im_size: cv::Size of the input images (defined elsewhere)
cv::Mat new_left_intrinsic_matrix = cv::getOptimalNewCameraMatrix(left_camera_intrinsics,
                              left_distortion_coeffs,
                              im_size,
                              alpha);
cv::Mat new_right_intrinsic_matrix = cv::getOptimalNewCameraMatrix(right_camera_intrinsics,
                              right_distortion_coeffs,
                              im_size,
                              alpha);

cv::Mat R1, R2, P1, P2, Q;
cv::stereoRectify(new_left_intrinsic_matrix,
                  left_distortion_coeffs,
                  new_right_intrinsic_matrix,
                  right_distortion_coeffs,
                  im_size,
                  R_left_to_right,
                  Translation_left_to_right,
                  R1,
                  R2,
                  P1,
                  P2,
                  Q,
                  cv::CALIB_ZERO_DISPARITY,
                  alpha);

cv::Mat Left_Stereo_Map1, Left_Stereo_Map2;
cv::Mat Right_Stereo_Map1, Right_Stereo_Map2;

int mapping_m1type = CV_32FC1;
cv::initUndistortRectifyMap(new_left_intrinsic_matrix,
                            left_distortion_coeffs,
                            R1,
                            P1,
                            im_size,
                            mapping_m1type,
                            Left_Stereo_Map1,
                            Left_Stereo_Map2);

cv::initUndistortRectifyMap(new_right_intrinsic_matrix,
                            right_distortion_coeffs,
                            R2,
                            P2,
                            im_size,
                            mapping_m1type,
                            Right_Stereo_Map1,
                            Right_Stereo_Map2);
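
(The maps are then applied to every frame with cv::remap, roughly as below; raw_left_frame / raw_right_frame are placeholder names for the untouched input images.)

cv::Mat rectified_left, rectified_right;
cv::remap(raw_left_frame,  rectified_left,  Left_Stereo_Map1,  Left_Stereo_Map2,  cv::INTER_LINEAR);
cv::remap(raw_right_frame, rectified_right, Right_Stereo_Map1, Right_Stereo_Map2, cv::INTER_LINEAR);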

You could try messing around with the cy, or the last column of the T matrix, or its rotation part (the top-left 3x3).
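
For a depth-independent shift like this one, the cy or the rotation part are the mathematically natural knobs: a constant vertical offset corresponds to a principal-point offset, or equivalently to a small pitch of one camera of roughly shift/fy radians, whereas changing the translation column gives a depth-dependent effect. A rough sketch of the rotation variant (the sign, and which camera to correct, may need flipping):

double dy = 3.8;                              // observed vertical shift in pixels
double pitch = std::atan(dy / r_fy);          // ~0.0022 rad with your fy of ~1752
cv::Mat rvec = (cv::Mat_<double>(3, 1) << pitch, 0.0, 0.0);
cv::Mat dR;
cv::Rodrigues(rvec, dR);                      // small rotation about the camera x axis
cv::Mat R_corrected = dR * R_left_to_right;   // feed this into stereoRectify instead of R_left_to_right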

Thanks, I will look into those.

If someone knows “mathematically” which one to modify, and by how much, I’m still interested.

Impossible to know or speculate sensibly without seeing your pictures at all steps of the process, including the untouched input images. These matrices only exist because it’s infeasible to align the cameras perfectly physically.

You say that all points are shifted by 3.8 pixels - if that’s really the case you probably can correct for it by changing parameters in your calibration data. My first thought is “why?” - why are the values so perfectly / uniformly shifted? How was the calibration obtained? What quality/scores did you get when you calibrated? Has anything physically changed since then?

I suspect that your positions aren’t all uniformly shifted by 3.8 pixels, and that there are other scaling/rotation/etc issues in the calibration. I think the best way forward is to calibrate the rig well, and use the calibration results directly. If you aren’t able to get good calibration, work on figuring out why. If you just fudge it to get something that seems better, can you trust the results? And when you inevitably run into another shift/error, will you tweak it again? It would be really helpful to have a reliable, accurate calibration process to go back to, and that shouldn’t be (too) hard to get sorted out.

Thanks for your answers.

As to why this shift: I think the calibration during the survey somehow went bad, so the calibration was redone after the survey (and after transporting the ROV). What’s more, it wasn’t performed with a single stereo camera, but with 2 distinct monocular cameras (with a synchronized trigger). So I suspect one camera moved a little bit between the data acquisition and the a-posteriori calibration.

The acquisition and calibration weren’t done by my company but by a partner, so I don’t know the calibration scores.

I will of course use our own data and calibration once we get it, but that won’t be for another couple of weeks for in-air data, and a few months for underwater data.

So until then, unless you know of a public underwater stereo dataset, I will have to work with this recording, and I would like to correct the bad calibration as well as possible. So if you have an idea how to correct such a shift, I’m still interested.

Or maybe an alternative option could be to calibrate from the stereo video itself (15 minutes). However, there are no chessboards/tags in the recording, and the features are not the easiest to work with (plants, shells, rocks, … rather than man-made geometrical shapes with nice corners).

Understood. The use case description is helpful.

A few questions / comments that come to mind.

  1. Do you have confidence in the intrinsic calibration of the cameras? If not, are the cameras available to you, and are the optics fixed (no zoom, focus locked), and if not, can they be configured as they were when the data was captured? I ask because it would be nice to have good intrinsic calibration, so calibrating the cameras after the fact might be worth pursuing.
  2. You mention the data was captured underwater. The intrinsics will have to be calibrated underwater, too. I suppose it’s possible to adjust intrinsic parameters to account for the difference in underwater vs air calibration, but I don’t have experience doing this.
  3. It sounds like you suspect the error is due to either poorly calibrated extrinsics or a camera that moved after it was calibrated. I would want to try to calibrate the stereo rig based on the 15 minutes of data that you do have. There are many factors to consider (does the cameras’ relative pose change at all during the 15 minutes? Are there enough high quality features to work with?), but maybe you can get enough correspondences from your 15 minute sequence to get a better calibration than you have now.

So to summarize, I’d probably approach it as:
Trust the intrinsics, or try to calibrate them after the fact (accounting for the water effect).
Gather high quality correspondences from the video sequence (the correspondences can come from different frames throughout the sequence, provided the cameras’ relative positions are fixed); a sketch of turning such correspondences into a refined extrinsic estimate follows below.
Compare your results to the current (4 pixel shift) results.
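
With trusted intrinsics, one way to turn such correspondences into a refined extrinsic estimate is the essential-matrix route. A sketch, assuming pts_left / pts_right hold matched pixel coordinates gathered from the sequence; note the essential matrix only gives the translation direction, so the existing baseline length has to be kept:

std::vector<cv::Point2f> pts_left, pts_right;          // filled by your matcher
std::vector<cv::Point2f> left_norm, right_norm;
// Normalize each point set with its own intrinsics so one camera matrix (identity) fits both.
cv::undistortPoints(pts_left,  left_norm,  left_camera_intrinsics,  left_distortion_coeffs);
cv::undistortPoints(pts_right, right_norm, right_camera_intrinsics, right_distortion_coeffs);

cv::Mat K_identity = cv::Mat::eye(3, 3, CV_64F);
cv::Mat inliers;
cv::Mat E = cv::findEssentialMat(left_norm, right_norm, K_identity,
                                 cv::RANSAC, 0.999, 1.0 / 1750.0, inliers);   // ~1 px threshold in normalized units

cv::Mat R_refined, t_direction;
cv::recoverPose(E, left_norm, right_norm, K_identity, R_refined, t_direction, inliers);

// Scale is unobservable from correspondences alone: keep the measured baseline length,
// and check that the result follows the same left-to-right convention as the existing T matrix.
cv::Mat t_refined = t_direction * cv::norm(Translation_left_to_right);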

Have you tried just adjusting the CY (or CX, depending on your shift) of one of the cameras by 3.8 pixels? Maybe you can fudge it and get something better, but I’m skeptical.
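
Concretely that would be something like this (the sign, and whether it is the left or the right camera’s matrix, may need to be swapped):

right_camera_intrinsics.at<double>(1, 2) += 3.8;   // cy sits at row 1, col 2 of the 3x3 matrix
// ...then redo getOptimalNewCameraMatrix / stereoRectify / initUndistortRectifyMap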

Another thought: If you are able to estimate the shift being 3.8 pixels, that suggests you have some way of getting correspondences. Can the data you used to estimate a 3.8 pixel shift be used as input to a calibration process?

Interesting problem.

Thanks for your suggestions.

  1. For the intrinsics, I believe I can trust them (fixed optics) because they are quite hard to modify (compared to moving the 2 cameras, which are attached separately to the ROV, with respect to each other).

  2. I will double check, but I believe the calibration was done in water

  3. Yes, my main suspicion is that the cameras moved with respect to each other between the acquisition and the calibration (in between, I think the ROV was retrieved from the sea back onto a ship, taken back to port, transported to the university doing the tests, and I suppose put in some test pool to do the calibration).

For the 15-minute video, I think the cameras didn’t move (I haven’t noticed any change in the shift, at least nothing I noticed just looking at the values, so if there was a move, it should be <0.5 pixels). There are no “nice” features in the footage (it’s mainly small algae and rocks, with some turbidity and back-scattering of light on particles making the image a bit noisy): so there are (nearly?) no features I could place to (sub-)pixel accuracy by hand just by zooming into the image (no sharp corners). On the other hand, there are hundreds (maybe even thousands) of “unsharp” features that could theoretically be matched between successive images. So maybe by averaging enough “not that good quality” features, it is still possible to get good results. Using goodFeaturesToTrack to find features and matching them with matchTemplate (TM_SQDIFF), I can still see this shift, so there should be enough information to correct it.
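
Roughly what this looks like (a sketch; the patch and search-window sizes are arbitrary choices, and rectified_left_gray / rectified_right_gray stand for one rectified grayscale pair):

std::vector<cv::Point2f> corners;
cv::goodFeaturesToTrack(rectified_left_gray, corners, 500, 0.01, 10);

std::vector<double> dys;
const int half_patch = 15, search_y = 20, search_x = 60;   // search wide in x (disparity), narrow in y
const cv::Rect image_rect(0, 0, rectified_left_gray.cols, rectified_left_gray.rows);
for (const auto& c : corners) {
    cv::Rect patch(cvRound(c.x) - half_patch, cvRound(c.y) - half_patch,
                   2 * half_patch + 1, 2 * half_patch + 1);
    cv::Rect search(patch.x - search_x, patch.y - search_y,
                    patch.width + 2 * search_x, patch.height + 2 * search_y);
    if ((patch & image_rect) != patch || (search & image_rect) != search) continue;

    cv::Mat score;
    cv::matchTemplate(rectified_right_gray(search), rectified_left_gray(patch), score, cv::TM_SQDIFF);
    cv::Point best;
    cv::minMaxLoc(score, nullptr, nullptr, &best, nullptr);   // TM_SQDIFF: minimum = best match
    dys.push_back((search.y + best.y) - patch.y);             // vertical offset of the match, right vs left
}
std::sort(dys.begin(), dys.end());
double median_dy = dys.empty() ? 0.0 : dys[dys.size() / 2];   // this is where the ~3.8 px shows up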

I will try playing with CY to see if I can improve things.

For the correspondences, for now it is goodFeaturesToTrack + matchTemplate. It should indeed be possible to use them to redo the calibration (with a good amount of RANSAC to eliminate the wrong matches).

EDIT: I tested adding 3.8 pixels to the CY of the right camera: it solved the vertical shift (at least down to a sub-pixel error that I no longer see “by hand”: I mainly get 0 shift, and from time to time +1 or -1 pixels, in similar proportions).