SFM - Estimated 3D points from reconstruction are obscenely high

I’m getting obscenely high values for the 3D points estimated by the SFM module.

To start, I am trying to reconstruct in 3D only the 2D track points in this tunnel video, which is about 5 seconds long. The video is 1920x1080 pixels at 60 fps.

[Animated GIF: tunnel_gif_optim]

The 2D track points in the above image are user-defined and tracked with the Lucas-Kanade method. No camera information came with this shot. I am assuming it was taken with a monocular camera.

Following the tutorial, I packed my tracked 2D points as follows:

// One 2 x N matrix per frame: row 0 holds x, row 1 holds y for all N tracks
std::vector<cv::Mat_<double>> pointMatList;

int numTracks = (int)track2DList.size();
int numFrames = 300; // ~5 s at 60 fps

for(int i = 0; i < numFrames; ++i)
{
    cv::Mat_<double> frame(2, numTracks);

    for(int j = 0; j < numTracks; ++j)
    {
        // track2DList[j][i] is the (x, y) position of track j in frame i
        frame(0, j) = track2DList[j][i][0];
        frame(1, j) = track2DList[j][i][1];
    }

    pointMatList.push_back(frame);
}

Next, I created a camera matrix K to serve as the initial guess. In theory, precalibrating the camera should not be needed, since the SFM module estimates/refines the camera’s intrinsics when you invoke the run() or reconstruct() routines. And since I have no idea what camera was used for this shot, I had to make some assumptions about the focal length in pixels.

Code for the initial guess of camera matrix K:

// Initial guess: 30-degree horizontal FOV, so f = (width / 2) / tan(FOV / 2)
double initFocal = (_imageSize.width * 0.5) / tan(30 * 0.5 * (M_PI/180.0));
cv::Matx33d K = cv::Matx33d(initFocal,  0,  _imageSize.width/2.,
                            0,          initFocal,  _imageSize.height/2.,
                            0,          0,          1);

I then configured the SFM pipeline and ran it:

int keyframe1 = 0;
int keyframe2 = 299;
int select_keyframes = 1;

const double k1 = 0;
const double k2 = 0;
const int verbosity = 1;
int refine_intrinsics =
cv::sfm::SFM_REFINE_FOCAL_LENGTH | cv::sfm::SFM_REFINE_PRINCIPAL_POINT |
cv::sfm::SFM_REFINE_RADIAL_DISTORTION_K1 | cv::sfm::SFM_REFINE_RADIAL_DISTORTION_K2;

// Configuring reconstruction options
cv::sfm::libmv_ReconstructionOptions options(keyframe1, keyframe2, refine_intrinsics, select_keyframes, verbosity);

// Configuring initial camera intrinsics
cv::sfm::libmv_CameraIntrinsicsOptions camOptions(cv::sfm::SFM_DISTORTION_MODEL_POLYNOMIAL,
                                                  initFocal,
                                                  initFocal,
                                                  _imageSize.width/2.,
                                                  _imageSize.height/2.,
                                                  k1,
                                                  k2);

std::vector<cv::Mat> Rs_est;
std::vector<cv::Mat> Ts_est;
std::vector<cv::Mat_<double>> points3DEstList;
            
cv::Ptr<cv::sfm::BaseSFM> sfmObj = cv::sfm::SFMLibmvEuclideanReconstruction::create(camOptions, options);

// Running the reconstruction routine finally...
sfmObj->run(pointMatList, K, Rs_est, Ts_est, points3DEstList);

When I read back the contents of points3DEstList, the 3D positions of the 2D points are obscenely large (x and y up to about 6×10^11, z around 10^13). See the log below:

Original intrinsics: f=3582.77 cx=960 cy=540 w=1920 h=1080
Final intrinsics: f=3582.77 cx=960 cy=540 w=1920 h=1080 k1=5.6255e-12 k2=-1.11928e-09

Skipped 0 markers.
Reprojected 7800 markers.
Total error: 2.08411e-08
Average error: 2.67193e-12 [pixels].

-- Reconstructed 3D Points:

[
[ -397065034781.862366, -603793572962.432861, 10138045231494.660156],
[ -349920118787.534668, -526880176894.646667, 10144148487534.708984],
[ -305747836170.306641, -448685804277.752991, 10280912331627.828125],
[  254597692276.514923, -421369493753.833679, 10037951348921.263672],
[  283441495883.048035, -475484759494.170776,  9871476424112.546875],
[  330541746370.687256, -582449140603.217285, 10089272406914.730469],
[ -199303184288.419708, -609771406686.366089, 10253631210867.880859],
[  218634077882.103668, -619596055520.065674,  9779117081479.052734],
[  192011968990.684753, -525239898537.542786, 10056182362791.714844],
[ -180972569526.857147, -478788694755.319580, 10155172883942.697266],
[ -111343811122.347504,  350259040943.077454, 10141705796722.130859],
[     427378239.279521,  355859768222.684448, 10134145755062.722656],
[  136181407490.486130,  349124303694.458984, 10128875211873.966797],
[  255620375947.170013,  321927183042.141846, 10101467051070.068359],
[  376595641930.277954,  359228424744.958252,  9961719359665.589844],
[  452307044682.328979,  420915947370.653870, 10092332970681.763672],
[  547344473075.463257,  487706757248.445312, 10089038909142.201172],
[ -578555531319.279785,  424974658088.160645, 10112222523170.773438],
[ -423484231093.825867,  456207438176.313171, 10144216920645.845703],
[ -401193858380.593872,  309267640100.370972, 10216747380118.269531],
[ -326494133679.072144,  342591403747.613953, 10188685478560.978516],
[ -320631261694.793701,  250194576078.970764, 10158513659603.068359],
[ -148806978507.747040,  631636630959.099365, 10178551130686.199219],
[  -39497276007.713699,  629822741799.091553, 10138166160031.388672],
[   51115921539.170876,  633243295034.075806, 10136859945807.154297],
[  135085196262.745728,  629711521038.763916, 10109250092209.322266],
]

I don’t think the reconstructed 3D track points should be this large. If someone can shed some light on this, I’d appreciate it. Thank you for reading.

do they make sense if you perform Venezuelan Adjustment (moving the decimal point)?

I’m gonna edit your post so these numbers are more machine-readable.

here’s the data normalized (mean subtracted, scaled uniformly to have unity stddev):

>>> b = a - a.mean(axis=0); b / b.std()
array([[-1.12289, -1.99421,  0.08625],
       [-0.98208, -1.76448,  0.10448],
       [-0.85014, -1.53092,  0.51297],
       [ 0.82352, -1.44933, -0.21272],
       [ 0.90968, -1.61097, -0.70995],
       [ 1.05036, -1.93045, -0.05943],
       [-0.53221, -2.01206,  0.43148],
       [ 0.71611, -2.0414 , -0.98581],
       [ 0.63659, -1.75958, -0.15826],
       [-0.47746, -1.62084,  0.1374 ],
       [-0.26949,  0.8554 ,  0.09718],
       [ 0.06436,  0.87213,  0.0746 ],
       [ 0.46983,  0.85201,  0.05886],
       [ 0.82658,  0.77078, -0.02301],
       [ 1.18791,  0.88219, -0.44041],
       [ 1.41405,  1.06644, -0.05029],
       [ 1.69791,  1.26593, -0.06013],
       [-1.66497,  1.07856,  0.00912],
       [-1.2018 ,  1.17185,  0.10468],
       [-1.13522,  0.73297,  0.32132],
       [-0.91211,  0.8325 ,  0.2375 ],
       [-0.89459,  0.55652,  0.14738],
       [-0.38138,  1.69583,  0.20723],
       [-0.05489,  1.69041,  0.08661],
       [ 0.21576,  1.70063,  0.08271],
       [ 0.46656,  1.69008,  0.00024]])

I’m sorry, I have no idea what Venezuelan Adjustment is.

in SFM, you have an ambiguity of scale. did you notice that nowhere in the entire process did you have to give a known size measurement for anything you can see in the video?

as for “moving the decimal point”…

Thank you @crackwitz. I had assumed that the SFM module’s reconstruction pipeline already handled the scale ambiguity, since nothing in the referenced tutorial compensates for it after retrieving the 3D points. I’m going to download more test footage with varying resolutions and motions to check whether the estimated 3D points come out this inflated across the board.

the tutorial code doesn’t care. it’s displaying the data. the visualization does its own translation and scaling, to bring the data into view. the user notices none of that.

SFM’s scale ambiguity is a fact of the subject matter. do not expect a mere tutorial to teach you anything except how to copy code and run it, and how to use OpenCV’s APIs. to learn this stuff even superficially, you need a university-level course or book.

crosspost: c++ - OpenCV SFM - Estimated 3D points from reconstruction are obscenely high - Stack Overflow

I have been diligently reading my copy of Multiple View Geometry in Computer Vision by R. Hartley and A. Zisserman for a while now, though honestly half the material has gone over my head. I was hoping that learning to use the SFM module would speed up the process. Regardless, thank you for taking the time to answer my thread. Your point about the scale ambiguity has been really helpful.