Understanding extrinsic parameters dimensions

I’m learning stereo vision principles and trying to follow a tutorial. From the Camera Calibration and 3D Reconstruction tutorial:
[image: the projection equation from the tutorial]
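Roughly, the equation shown there is the pinhole projection model, with the extrinsics as a 3x4 block:

$$
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
=
\underbrace{\begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}}_{A\ (3 \times 3\ \text{intrinsics})}
\underbrace{\begin{bmatrix} r_{11} & r_{12} & r_{13} & t_1 \\ r_{21} & r_{22} & r_{23} & t_2 \\ r_{31} & r_{32} & r_{33} & t_3 \end{bmatrix}}_{[R|t]\ (3 \times 4\ \text{extrinsics})}
\begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}
$$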
It clearly shows the 3x4 [R|t] matrix; however, when I look at the results I get from running the calibration, I see

   extrinsic_parameters: !!opencv-matrix
   rows: 6
   cols: 6
   dt: d
   data: [ -4.9718549393950172e-02, 4.2817478218341143e-02,
       3.0913753044471628e+00, 6.0210377302743368e+01,
       2.8687654075865353e+01, 3.2274813234145904e+02,
       -2.1625258391757840e-01, 2.7522087409757982e-02,
       2.9934905599394401e+00, 3.1049613983689375e+02,
       -4.9865090266582804e+01, 6.0696119186638884e+02,
       -6.0534636789854754e-02, 7.9158290813564722e-02,
       3.0542890482926150e+00, -5.4252107270578541e+01,
       -4.8474594333297079e+00, 5.0464634537084254e+02,
       8.3481682705390048e-01, 3.6857143748893595e-01,
       2.9928554046392994e+00, 8.9940347819416431e+01,
       3.8558139273085608e+01, 3.2601583778763938e+02,
       -9.7268799590706700e-01, 5.5488999304730979e-01,
       2.7926458500566964e+00, 6.8899978633628621e+01,
       2.9218955536384442e+01, 5.2324882270561443e+02,
       2.3317399842268918e-01, -2.9775921477133793e-01,
       -3.0947983746483105e+00, 2.3329718007885967e+02,
       5.8152062747705592e+01, 4.3118344878395760e+02 ]

Could someone help me understand this?

Hi,
How did you get these results?

I’ve compiled a sample camera calibration C++ program (sample_code) and took 10 pictures of the chessboard pattern.
Here is how I run it:

   ./CameraCalibrator -op -oe -oo -w=9 -h=6 -pt=chessboard -s=25 -o="cam_config.yaml" ./config/calib_imgs_list.xml

It produces the config file with lots of matrices that I seem to understand, except for the extrinsic one.

It’s here in the code.
calibrateCameraRO is used in this program, so the extrinsic parameters are the rvec and tvec results.


Right, that’s not my question. The question is: why 6x6 and not 3x4?

Read the calibrateCameraRO doc.

Is this the RTFM kind of answer? Not really helping, TBH.

You are too funny:

rvecs Output vector of rotation vectors estimated for each pattern view. See calibrateCamera() for details.
tvecs Output vector of translation vectors estimated for each pattern view.

THAT is the answer. you just ran some sample code. sample code has no documentation. it might have a tutorial accompanying it, or it might not.

edit: this piece of sample code has one line of comment:

//cvWriteComment( *fs, "a set of 6-tuples (rotation vector + translation vector) for each view", 0 );

that sample code dumps a 6-column matrix in your lap.

please follow the link you were given and brood over the two dozen lines of code.

you are literally getting one rvec and tvec for every picture. and you seem to have 6 good pictures.

your only issue is converting an rvec into a 3x3 rotation matrix, and gluing the tvec onto that as a fourth column. and there’s a function for the first part, Rodrigues.
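something like this, in Python (just a sketch; I’m using the first 6-tuple from your extrinsic_parameters as the rvec/tvec):

    import cv2
    import numpy as np

    # first row of the extrinsic_parameters matrix: rvec (3 values) + tvec (3 values)
    rvec = np.array([-4.9718549393950172e-02, 4.2817478218341143e-02, 3.0913753044471628e+00])
    tvec = np.array([6.0210377302743368e+01, 2.8687654075865353e+01, 3.2274813234145904e+02])

    R, _ = cv2.Rodrigues(rvec)               # rotation vector -> 3x3 rotation matrix
    Rt = np.hstack([R, tvec.reshape(3, 1)])  # append tvec as the 4th column -> 3x4 [R|t]
    print(Rt.shape)                          # (3, 4)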

Alright, thanks for the answer. I was using 10 images, so “for each view” didn’t quite make sense to me. I didn’t realize it doesn’t use all the images.

Now I am getting really confused: all my pictures were taken with a fixed camera, so why am I getting different rotation matrices? From what I read, the extrinsic parameters should remain the same.

You can use this program; the tutorial is here.
In the result file you can get the intrinsic and extrinsic parameters.

“extrinsic” means a few things.

if you had a stereo camera, “extrinsic” describes the poses of the cameras relative to each other. that is supposed to be fixed.

if you talk about the poses of calibration patterns relative to a camera (or vice versa), those of course change when you move the pattern.

here’s your data. note the rvecs all have about a half turn of magnitude, and they all point mostly towards +z (rotation axis). that tells me your calibration target might be “upside down” (no issue) and it’s very much facing the camera. the 4th and 5th rows look like they have a bit of an (out-of-plane) angle to their rotation axis (x and y components).

the tvecs show a variety of distances (z) that should correspond with where you’ve been holding the target.
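(a below is your 6x6 extrinsic_parameters matrix. a sketch of loading it, assuming the cam_config.yaml name from your command:)

    import cv2
    import numpy as np  # used in the interactive session below

    # one row per view: rvec (3 values) followed by tvec (3 values)
    fs = cv2.FileStorage("cam_config.yaml", cv2.FILE_STORAGE_READ)
    a = fs.getNode("extrinsic_parameters").mat()
    fs.release()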

>>> angle = np.linalg.norm(a[:,0:3], axis=1)[:,np.newaxis] # length of rvec = angle
>>> # show first the normalized vectors, then the angles (last column)
>>> np.hstack([ a[:,0:3] / angle, angle / np.pi * 180])
array([[ -0.01608,   0.01385,   0.99977, 177.16265],
       [ -0.07205,   0.00917,   0.99736, 171.96857],
       [ -0.01981,   0.0259 ,   0.99947, 175.09099],
       [  0.26681,   0.1178 ,   0.95652, 179.27214],
       [ -0.32328,   0.18442,   0.92816, 172.39172],
       [  0.07479,  -0.0955 ,  -0.99262, 178.63798]])
>>> a[:,3:6] # tvecs
array([[ 60.21038,  28.68765, 322.74813],
       [310.49614, -49.86509, 606.96119],
       [-54.25211,  -4.84746, 504.64635],
       [ 89.94035,  38.55814, 326.01584],
       [ 68.89998,  29.21896, 523.24882],
       [233.29718,  58.15206, 431.18345]])

Thank you for the detailed explanation. Things are much clearer now. I’m glad this site isn’t like stackoverflow.

(off topic)

stackoverflow’s “quality” is partly a function of it focusing the whole world of ignorance into one stream of questions, which presents a “fire hose” to any altruists willing to waste some free time answering questions there, and partly a function of it being hit-and-run, with newbies not being sanctioned for being lazy or for ignoring the reward mechanisms that are supposed to keep the altruists from rioting.

this place has no reward mechanisms in the first place, and it’s niche, so the fire hose is just a garden hose.