Converting from image coordinates to camera coordinates using undistortPoints

Given the following figure taken from the Calib3d docs:

I want to convert pixel coordinates that I measured in the image (u, v) into coordinates relative to the principal point (x, y). If I understand the figure correctly, OpenCV does not flip the y-axis as some other SFM packages do, so the x- and y-axes of the image plane point in the same directions as the u- and v-axes. As a result, converting between (u, v) and (x, y) coordinates should merely be a matter of subtracting (c_x, c_y) (ignoring all distortions).

For concreteness, here’s a numerical example:

c_x = 5.0
c_y = 5.0
(u, v) = (8.22, 1.3)

Assuming my understanding is correct, I would expect
(x, y) = (u, v) - (c_x, c_y) = (8.22, 1.3) - (5, 5) = (3.22, -3.7)
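In plain Kotlin, without any OpenCV involvement, the conversion I have in mind is just this subtraction (the helper name is mine, purely for illustration):

    import kotlin.math.abs

    // Hypothetical helper: shift pixel coordinates so that the
    // principal point (cx, cy) becomes the origin.
    fun pixelToPrincipal(u: Double, v: Double, cx: Double, cy: Double): Pair<Double, Double> =
        Pair(u - cx, v - cy)

    fun main() {
        val (x, y) = pixelToPrincipal(8.22, 1.3, 5.0, 5.0)
        println("$x, $y") // approximately 3.22, -3.7 (up to floating-point rounding)
    }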

And according to my understanding of the Calib3d docs, undistortPoints or undistortImagePoints would be the go-to functions for this task. However, when I do the following:

fun undistortImagePointsGeogebraExample() = manageResources {
    val intrinsics = CameraIntrinsics(
        focalLength = Point(3.0, 3.0),
        opticalCenter = Point(5.0, 5.0),
        DistortionCoefficients.default // All zeros or null
    )

    val observedPoint = Point(8.22, 1.3)
    val expectedUndistortedPoint = Point(3.22, -3.7)

    val src = MatOfPoint2f(observedPoint).closeLater()
    val dst = MatOfPoint2f().closeLater()

    // CameraIntrinsics is my own small wrapper; it exposes the 3x3 camera
    // matrix and the distortion coefficients as Mats.
    Calib3d.undistortPoints(src, dst, intrinsics.cameraMatrix, intrinsics.distortionCoefficients)

    val actualUndistortedPoint = dst.toArray().single()

    assertEquals(expectedUndistortedPoint, actualUndistortedPoint)
}

actualUndistortedPoint evaluates to Point(x=1.0733333826065063, y=-1.2333333492279053)

If I replace undistortPoints with undistortImagePoints in the above test, I get Point(x=8.220000267028809, y=1.2999999523162842) instead.

Now, I have already heard multiple times that undistortPoints returns normalized coordinates and that one should use undistortImagePoints instead. However, since undistortImagePoints basically returned the same coordinates that I entered, I am wondering whether undistortImagePoints merely removes image distortion (which is non-existent in this case, as I supplied all-zero distortion coefficients) and the coordinates still have their origin in the top-left corner.

Also, given that undistortPoints returned a negative y-coordinate, I am wondering whether the normalized-coordinate thing is really the only difference between the two functions.

My question therefore is whether my reasoning is correct and if so, my usage of undistortImagePoints is correct.
Thanks and cheers :slight_smile:

Note that the Euclidean norm of (x=1.0733333826065063, y=-1.2333333492279053) is 1.6349788073657858, not 1, so calling those coordinates “normalized” is confusing to me.

Also, (1.0733333826065063, -1.2333333492279053) * 3 is approximately equal to my expected result, indicating that undistortPoints may be the better way to go, in which case an additional question is how this scale factor can be determined reliably (or how I can get rid of it).

I ended up finding the solution in plain sight: Yes, my understanding is correct and undistortPoints is the way to go. The term “normalized” may be confusing here, as in this context it means that the coordinates relative to the principal point are divided by the focal length (not that the resulting vector has unit length). So, given that I specified an arbitrary focal length of f_x = f_y = 3.0 in my code, my numerical example actually looks like this:

(x, y) = (u, v) - (c_x, c_y) = (8.22, 1.3) - (5, 5) = (3.22, -3.7) = (1.0733333826065063, -1.2333333492279053) * (f_x, f_y)
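The whole relationship can be sketched in plain Kotlin, without OpenCV (the function names are mine, not OpenCV’s; this only reproduces the zero-distortion arithmetic, not the iterative undistortion that undistortPoints performs for non-zero coefficients):

    import kotlin.math.abs

    // For zero distortion, undistortPoints effectively returns the pixel
    // offsets from the principal point, divided component-wise by the focal length.
    fun toNormalized(u: Double, v: Double, cx: Double, cy: Double, fx: Double, fy: Double): Pair<Double, Double> =
        Pair((u - cx) / fx, (v - cy) / fy)

    // Undo the normalization: multiply by the focal length to get back
    // coordinates relative to the principal point, in pixels.
    fun toPrincipalPointPixels(xn: Double, yn: Double, fx: Double, fy: Double): Pair<Double, Double> =
        Pair(xn * fx, yn * fy)

    fun main() {
        val (xn, yn) = toNormalized(8.22, 1.3, 5.0, 5.0, 3.0, 3.0)
        println("normalized: $xn, $yn")      // approximately 1.0733, -1.2333
        val (x, y) = toPrincipalPointPixels(xn, yn, 3.0, 3.0)
        println("pixels: $x, $y")            // approximately 3.22, -3.7
    }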