I am implementing a part of this paper: Optimizing Through Learned Errors for Accurate Sports Field Registration.
I am training a neural network that, given an image, estimates the coordinates of a fixed set of reference points after transformation by a homography matrix H. The goal is to recover the homography mapping a real-life broadcast image onto a generic template. To do so, I create a ground truth for each image (essentially the most accurate homography of the image I can produce), and then compute the transformation of the reference points.
To check that this data is accurate, I recreate the homography from the reference points and their transformed counterparts, and verify that the resulting matrix equals my “ground truth” homography. I thought this would be a quick thing, but I can't find why it does not work: every value matches except the last row. The computed homography closely resembles the ground truth, but with slight differences, especially in scaling, which I assume the differing last row is responsible for.
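Since a homography is only defined up to a global scale factor, the equality check itself should normalize both matrices first. A small helper I use for the comparison (name and tolerance are my own, not from the paper):

```python
import numpy as np

def homographies_equal(H1, H2, tol=1e-6):
    # A homography is defined only up to scale, so normalize each
    # matrix before comparing entrywise.
    H1n = H1 / np.linalg.norm(H1)
    H2n = H2 / np.linalg.norm(H2)
    # The overall sign is arbitrary as well, so accept either.
    return np.allclose(H1n, H2n, atol=tol) or np.allclose(H1n, -H2n, atol=tol)
```

Even with this normalization, the two matrices below do not match, so the discrepancy is not just a scale factor.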
# ground truth homography from correspondences between image and template
H, _ = cv2.findHomography(registered_image_points, registered_template_points, cv2.RANSAC)
# computing transformation of a reference vector of points
transformed_vector = self.get_transform(H, self.REFERENCE_POINTS)
H
...
[[ 3.22e+02 1.69e+02 5.31e+02]
[ 1.50e+02 1.13e+03 2.65e+02]
[-2.69e-02 6.20e-01 1.00e+00]]
# generated homography from reference vector and its transformation
h = cv2.getPerspectiveTransform(self.REFERENCE_POINTS, transformed_vector)
h
...
[[ 3.22e+02 1.69e+02 5.31e+02]
[ 1.50e+02 1.13e+03 2.65e+02]
[-2.98e-07 4.47e-07 1.00e+00]]
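For reference, recovering H from four exact correspondences is a small linear solve, so a perfect round trip should be possible. A pure-NumPy sketch of the direct linear transform (mirroring what `cv2.getPerspectiveTransform` computes; helper names are my own), fed with the printed ground-truth values:

```python
import numpy as np

def apply_homography(H, pts):
    # pts: (N, 2) array; append w=1, multiply, then do the perspective division.
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

def homography_from_4pts(src, dst):
    # Direct linear transform: 8 equations in the 8 unknowns of H,
    # with H[2,2] fixed to 1.
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

H_gt = np.array([[322., 169., 531.],
                 [150., 1130., 265.],
                 [-2.69e-2, 6.20e-1, 1.]])
corners = np.array([[-0.5, -0.5], [0.5, -0.5], [0.5, 0.5], [-0.5, 0.5]])
H_rec = homography_from_4pts(corners, apply_homography(H_gt, corners))
```

With an exact perspective division in the forward mapping, `H_rec` reproduces `H_gt` including the last row, so in principle the method itself is not lossy.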
The reference vector is supposed to be in a normalized image coordinate system (x, y) ∈ [-0.5, 0.5], and so is the projection space.
This discrepancy means that even with an accurate neural network giving a perfect regression of the transformed vector, it wouldn't matter, because I can't recover the ground truth homography, and I don't know why. Is this a “feature” intrinsic to the method and thus inescapable? Should I just ditch the paper's approach and train my model to estimate the values of the homography matrix directly? If the authors did not have this problem, that shouldn't be necessary, and I must have overlooked something important.