I’m getting obscenely large values for the 3D points estimated by the SFM module.
To start, I am attempting to reconstruct in 3D only the 2D track points in this tunnel video, which is about 5 seconds long. The video has a resolution of 1920x1080 pixels and a frame rate of 60 fps.
The 2D track points in the above image are user-defined points tracked with the Lucas-Kanade method. No camera information came with this shot. I am assuming the video was taken with a monocular camera setup.
Following the tutorial, I arranged my tracked 2D points as follows:
std::vector<cv::Mat_<double>> pointMatList;
int numTracks = (int)track2DList.size();
int numFrames = 300; // about 5 seconds at 60 fps
for(int i = 0; i < numFrames; ++i)
{
    // One 2 x numTracks matrix per frame: row 0 holds x, row 1 holds y
    cv::Mat_<double> frame(2, numTracks);
    for(int j = 0; j < numTracks; ++j)
    {
        frame(0, j) = track2DList[j][i][0];
        frame(1, j) = track2DList[j][i][1];
    }
    pointMatList.push_back(frame);
}
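Since the loop above indexes every track at every frame up to the hard-coded numFrames, a small guard can confirm that each track really has that many samples before the matrices are built. This is only a sketch: the Track alias and the tracksCoverAllFrames helper are hypothetical names, and I am assuming each track is stored as a vector of (x, y) pairs, which may not match the actual type of track2DList.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Assumed storage: one entry per frame, each entry an (x, y) pixel position.
using Track = std::vector<std::array<double, 2>>;

// Hypothetical helper: returns true only if the list is non-empty and every
// track has at least numFrames samples, so tracks[j][i] is valid for all
// i < numFrames.
bool tracksCoverAllFrames(const std::vector<Track>& tracks, std::size_t numFrames)
{
    if (tracks.empty())
        return false;
    for (const Track& track : tracks)
    {
        if (track.size() < numFrames)
            return false;
    }
    return true;
}
```

Calling tracksCoverAllFrames(tracks, 300) before the loop would catch a track that was lost partway through the shot instead of reading past its end.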
For the next step, I created a camera matrix K to serve as the initial guess. In theory, precalibrating the camera should not be needed, since the SFM module estimates/refines the camera’s intrinsics once you invoke the run() or reconstruct() routines. In addition, since I have no idea what camera was used for this shot, I had to guess what the focal length in pixels should be.
Code for the initial guess of the camera matrix K:
// Using a guess of 30-degrees for FOV
double initFocal = (_imageSize.width * 0.5) / tan(30 * 0.5 * (M_PI/180.0));
cv::Matx33d K = cv::Matx33d(initFocal, 0, _imageSize.width/2.,
                            0, initFocal, _imageSize.height/2.,
                            0, 0, 1);
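To make the focal-length guess easier to vary, the same formula, f = (width / 2) / tan(FOV / 2), can be wrapped in a small function. This is a minimal sketch rather than anything from the SFM module: focalLengthFromFov is a hypothetical name, the FOV is assumed to be the horizontal one, and the constant is spelled out because M_PI is not part of standard C++.

```cpp
#include <cmath>

// Hypothetical helper: focal length in pixels for a pinhole camera, given the
// image width in pixels and an assumed horizontal field of view in degrees.
double focalLengthFromFov(double imageWidthPx, double fovDegrees)
{
    const double kPi = 3.14159265358979323846;
    const double halfFovRad = 0.5 * fovDegrees * kPi / 180.0;
    return (0.5 * imageWidthPx) / std::tan(halfFovRad);
}
```

With a width of 1920 and a 30-degree guess this gives about 3582.77, the same value as initFocal above. A wider guess gives a shorter focal length, for example 960 for 90 degrees.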
I then set up the following properties for the SFM pipeline and ran it:
int keyframe1 = 0;
int keyframe2 = 299;
int select_keyframes = 1;
const double k1 = 0;
const double k2 = 0;
const int verbosity = 1;
int refine_intrinsics =
    cv::sfm::SFM_REFINE_FOCAL_LENGTH | cv::sfm::SFM_REFINE_PRINCIPAL_POINT |
    cv::sfm::SFM_REFINE_RADIAL_DISTORTION_K1 | cv::sfm::SFM_REFINE_RADIAL_DISTORTION_K2;
// Configuring reconstruction options
cv::sfm::libmv_ReconstructionOptions options(keyframe1, keyframe2, refine_intrinsics, select_keyframes, verbosity);
// Configuring initial camera intrinsics
cv::sfm::libmv_CameraIntrinsicsOptions camOptions(cv::sfm::SFM_DISTORTION_MODEL_POLYNOMIAL,
                                                  initFocal,
                                                  initFocal,
                                                  _imageSize.width/2.,
                                                  _imageSize.height/2.,
                                                  k1,
                                                  k2);
std::vector<cv::Mat> Rs_est;
std::vector<cv::Mat> Ts_est;
std::vector<cv::Mat_<double>> points3DEstList;
cv::Ptr<cv::sfm::BaseSFM> sfmObj = cv::sfm::SFMLibmvEuclideanReconstruction::create(camOptions, options);
// Running the reconstruction routine finally...
sfmObj->run(pointMatList, K, Rs_est, Ts_est, points3DEstList);
Upon extracting the contents of points3DEstList, I get some obscenely large values for the 3D positions of the 2D points. See the log below:
Original intrinsics: f=3582.77 cx=960 cy=540 w=1920 h=1080
Final intrinsics: f=3582.77 cx=960 cy=540 w=1920 h=1080 k1=5.6255e-12 k2=-1.11928e-09
Skipped 0 markers.
Reprojected 7800 markers.
Total error: 2.08411e-08
Average error: 2.67193e-12 [pixels].
-- Reconstructed 3D Points:
[
[ -397065034781.862366, -603793572962.432861, 10138045231494.660156],
[ -349920118787.534668, -526880176894.646667, 10144148487534.708984],
[ -305747836170.306641, -448685804277.752991, 10280912331627.828125],
[ 254597692276.514923, -421369493753.833679, 10037951348921.263672],
[ 283441495883.048035, -475484759494.170776, 9871476424112.546875],
[ 330541746370.687256, -582449140603.217285, 10089272406914.730469],
[ -199303184288.419708, -609771406686.366089, 10253631210867.880859],
[ 218634077882.103668, -619596055520.065674, 9779117081479.052734],
[ 192011968990.684753, -525239898537.542786, 10056182362791.714844],
[ -180972569526.857147, -478788694755.319580, 10155172883942.697266],
[ -111343811122.347504, 350259040943.077454, 10141705796722.130859],
[ 427378239.279521, 355859768222.684448, 10134145755062.722656],
[ 136181407490.486130, 349124303694.458984, 10128875211873.966797],
[ 255620375947.170013, 321927183042.141846, 10101467051070.068359],
[ 376595641930.277954, 359228424744.958252, 9961719359665.589844],
[ 452307044682.328979, 420915947370.653870, 10092332970681.763672],
[ 547344473075.463257, 487706757248.445312, 10089038909142.201172],
[ -578555531319.279785, 424974658088.160645, 10112222523170.773438],
[ -423484231093.825867, 456207438176.313171, 10144216920645.845703],
[ -401193858380.593872, 309267640100.370972, 10216747380118.269531],
[ -326494133679.072144, 342591403747.613953, 10188685478560.978516],
[ -320631261694.793701, 250194576078.970764, 10158513659603.068359],
[ -148806978507.747040, 631636630959.099365, 10178551130686.199219],
[ -39497276007.713699, 629822741799.091553, 10138166160031.388672],
[ 51115921539.170876, 633243295034.075806, 10136859945807.154297],
[ 135085196262.745728, 629711521038.763916, 10109250092209.322266],
]
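As a rough sanity check on these numbers, one of the points can be pushed back through an ideal pinhole model using the logged intrinsics. This is only a sketch under assumptions of mine: projectPinhole is a hypothetical helper, the tiny k1 and k2 are ignored, and the camera is assumed to sit at the origin looking down +Z, which the log does not state.

```cpp
#include <array>

// Hypothetical helper: ideal pinhole projection of a camera-space point
// (X, Y, Z) to pixel coordinates (u, v), ignoring lens distortion.
// Assumes Z is non-zero.
std::array<double, 2> projectPinhole(double X, double Y, double Z,
                                     double f, double cx, double cy)
{
    return std::array<double, 2>{{ f * X / Z + cx, f * Y / Z + cy }};
}
```

For the first point in the list, projectPinhole(-397065034781.862366, -603793572962.432861, 10138045231494.660156, 3582.77, 960, 540) gives a pixel of roughly (820, 327), which is inside the 1920x1080 frame. If the pose assumption holds, that would suggest the ray directions are plausible and that it is mainly the depth, or the overall scale, that has blown up.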
I don’t think the reconstructed 3D track points should be this big. If someone can shed some light on this, it would be appreciated. Thank you for reading.