Better feature detection and/or improving matches between images

I’ve been working through some examples with OpenCV and feature matching and have hit a point where I’m frankly unsure of how to improve results.

My goal itself is pretty simple: given some game screenshots, I’d like to be able to extract meaningful information. There will be absolutely no rotation in the images, though there may be some scale variance if I try to scan for information using different-resolution images.

This project is done in C# through EmguCV, but as my questions are related to OpenCV and/or tuning the various detectors/extractors/matchers or picking other options, it should translate well enough.

These are the results I’ve been able to achieve so far:
imgur gallery - Flann
imgur gallery - Flann (blue matches)
imgur gallery - BFMatcher

I am working from these base images:
imgur gallery

This set consists of five images run against one model, each run for AGAST/FREAK, ORB, and STAR/BRIEF. Matches were detected via KnnMatch. In my initial testing, these combinations seemed to yield the best count of keypoints in the correct ROI in the main image based on those detected in the model image. In general, STAR/BRIEF seems to be the most consistent, though the calculated homography on all AGAST/FREAK and ORB examples leads me to believe something is very wrong with matching.

1: Given the goal - extraction of information from images which should have no need for rotation consideration - are there better options?

2: Given these should always be same-plane, is there a way to constrain results to same-plane? To clarify - in some examples, the computed homography differs wildly from the expected same-plane result.

The parameters I’m currently using are derived from a mix of examples and trial and error - I’ve found very little documentation on what these parameters actually do, and have mostly inferred their effects from EmguCV’s API docs or from examples.

I am using these parameters:

DescriptorMatcher matcher = new FlannBasedMatcher(indexParams: new Emgu.CV.Flann.LshIndexParams(20, 10, 2), search: new Emgu.CV.Flann.SearchParams(checks: 50));


Feature2D agastDetector = new AgastFeatureDetector(threshold: 15, nonmaxSuppression: true, type: AgastFeatureDetector.Type.AGAST_5_8);
Feature2D freakExtractor = new Freak();


Feature2D orbDetector = new ORB(numberOfFeatures: 1500, scaleFactor: 1.6f, nLevels: 12, fastThreshold: 15, edgeThreshold: 0);


Feature2D starDetector = new StarDetector(maxSize: 22, responseThreshold: 20, lineThresholdProjected: 15, lineThresholdBinarized: 8);
Feature2D briefExtractor = new BriefDescriptorExtractor();
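For context on what those LSH index parameters control (this is a toy illustration in Python, with names of my own, not the EmguCV API): the first parameter is the number of hash tables built, the second is how many descriptor bits form each table's hash key, and the third is how many neighboring buckets get probed at query time (probing is not shown below). A sketch of the idea for binary descriptors:

```python
import random

def build_lsh_tables(descriptors, table_number=20, key_size=10, seed=0):
    """Build LSH tables: each table hashes descriptors by a random subset of bits."""
    rng = random.Random(seed)
    n_bits = len(descriptors[0])
    tables = []
    for _ in range(table_number):
        bit_idx = rng.sample(range(n_bits), key_size)  # which bits form the key
        buckets = {}
        for i, d in enumerate(descriptors):
            key = tuple(d[b] for b in bit_idx)
            buckets.setdefault(key, []).append(i)
        tables.append((bit_idx, buckets))
    return tables

def lsh_query(tables, descriptor):
    """Collect candidate matches from every table's bucket for this descriptor.
    (multi_probe_level would additionally probe nearby buckets - omitted here.)"""
    candidates = set()
    for bit_idx, buckets in tables:
        key = tuple(descriptor[b] for b in bit_idx)
        candidates.update(buckets.get(key, []))
    return candidates

# Tiny demo with 8-bit "descriptors":
descs = [[0, 0, 0, 0, 1, 1, 1, 1], [0, 0, 0, 0, 1, 1, 1, 0], [1, 1, 1, 1, 0, 0, 0, 0]]
tables = build_lsh_tables(descs, table_number=4, key_size=3)
print(lsh_query(tables, descs[0]))  # candidates similar to descs[0]
```

More tables and shorter keys mean more (but less selective) candidates per query; longer keys mean fewer collisions and faster but less forgiving lookups.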

The logic for feature matching is fairly straightforward and is just a cleaned-up adaptation of an EmguCV example:

        /// <summary>
        /// Match the given images using the given detector, extractor, and matcher, calculating and returning homography.
        /// The given detector is used for detecting keypoints.
        /// The given extractor is used for extracting descriptors.
        /// The given matcher is used for computing matches.
        /// Detection and matching are done in two separate stages.
        /// The Mat and Vector... properties of this result are unmanaged - it is assumed the caller will dispose of them.
        /// </summary>
        /// <param name="featureDetector">Detector used to find keypoints in both images.</param>
        /// <param name="featureExtractor">Extractor used to compute descriptors at the detected keypoints.</param>
        /// <param name="matcher">Matcher used to compute descriptor matches.</param>
        /// <param name="observedImage">The image to search within.</param>
        /// <param name="modelImage">The model image to search for.</param>
        /// <returns>Keypoints, descriptors, matches, mask, and homography (null if too few matches survive filtering).</returns>
        public MatchFeaturesResult MatchFeatures(Feature2D featureDetector, Feature2D featureExtractor, DescriptorMatcher matcher, Mat observedImage, Mat modelImage)
        {
            using (UMat observedImageUmat = observedImage.GetUMat(AccessType.Read))
            using (UMat modelImageUmat = modelImage.GetUMat(AccessType.Read))
            {
                // Detect keypoints
                var observedImageKeypoints = featureDetector.Detect(observedImageUmat);
                var modelImageKeypoints = featureDetector.Detect(modelImageUmat);

                var observedDescriptors = new Mat();
                var modelDescriptors = new Mat();

                var observedKeypointVector = new VectorOfKeyPoint(observedImageKeypoints);
                var modelKeypointVector = new VectorOfKeyPoint(modelImageKeypoints);

                // Compute descriptors
                featureExtractor.Compute(observedImageUmat, observedKeypointVector, observedDescriptors);
                featureExtractor.Compute(modelImageUmat, modelKeypointVector, modelDescriptors);

                // Match descriptors: register the model descriptors as the train set,
                // then find the two nearest model descriptors for each observed descriptor
                matcher.Add(modelDescriptors);
                var matches = new VectorOfVectorOfDMatch();
                matcher.KnnMatch(observedDescriptors, matches, 2);

                // Filter matches based on ratio
                //matches = LowesFilter(matches);

                var mask = new Mat(matches.Size, 1, DepthType.Cv8U, 1);
                mask.SetTo(new MCvScalar(255));

                Features2DToolbox.VoteForUniqueness(matches, 0.8, mask);
                Mat homography = null;
                var nonZeroCount = CvInvoke.CountNonZero(mask);
                if (nonZeroCount >= 4)
                {
                    nonZeroCount = Features2DToolbox.VoteForSizeAndOrientation(modelKeypointVector, observedKeypointVector, matches, mask, 1.5, 20);
                    if (nonZeroCount >= 4)
                    {
                        homography = Features2DToolbox.GetHomographyMatrixFromMatchedFeatures(modelKeypointVector, observedKeypointVector, matches, mask, 2);
                    }
                }

                return new MatchFeaturesResult(observedKeypointVector, observedDescriptors, modelKeypointVector, modelDescriptors, matches, mask, homography);
            }
        }
There is a duplicate of this method which condenses detection and extraction into a single DetectAndCompute call for ORB.

I have tried applying Lowe’s ratio test, but it negatively impacted results - it’s entirely possible I implemented it incorrectly.
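For reference, a minimal sketch of Lowe's ratio test in Python (plain tuples stand in for DMatch objects, and the 0.7 ratio is just a commonly cited default, not anything from the code above). The key detail is that it needs the two nearest neighbors per query descriptor, i.e. k=2 from KnnMatch:

```python
def lowes_filter(knn_matches, ratio=0.7):
    """Keep a match only if its best distance is clearly smaller than its
    second-best distance (Lowe's ratio test).

    knn_matches: list of per-query lists, each sorted by ascending distance,
    where a match is (train_index, distance)."""
    good = []
    for pair in knn_matches:
        if len(pair) < 2:
            continue  # no second neighbor to compare against
        best, second = pair[0], pair[1]
        if best[1] < ratio * second[1]:
            good.append(best)
    return good

# The ambiguous match (distances 10 vs 11) is rejected; the distinct one is kept.
matches = [[(3, 10.0), (7, 11.0)], [(5, 4.0), (9, 20.0)]]
print(lowes_filter(matches))  # → [(5, 4.0)]
```

Worth noting: if I understand EmguCV correctly, VoteForUniqueness with 0.8 in the code above is already a ratio test of this same form, so applying a separate Lowe's filter on top of it would filter twice - which might explain the degraded results.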

3: Is there anything obvious I could do with the above code especially regarding my usage of OpenCV (via EmguCV, I know) components to improve results?

besides findHomography there are functions that estimate an affine transform (estimateAffine2D) and even a more restricted variant (estimateAffinePartial2D, which is limited to rotation, uniform scaling, and translation)

other approaches: a HOG classifier, which has detectMultiScale


Thank you for the suggestions.

I’ve reviewed documentation regarding OpenCV’s affine transformation and it seems to be somewhat the opposite of what I’m after. If I’m understanding this correctly, it provides facilities for applying rotation, translation, and scaling to an image, whereas I’m more-or-less trying to find keypoint matches specifically disallowing rotation, translation, and scaling. Am I mistaken?

Regarding rigid transforms, I’ve been able to find little useful information. Official documentation exists and seems to imply that this, too, is somewhat the opposite of what I’m after - given two images, it detects common features among them and uses those to estimate the transform that maps the first image to the second, whereas I’m trying to just detect feature matches under the assumption that no transform is present.

Am I just missing something obvious here?

Regarding HOG classification - I’ve found some discussion that seems to indicate it (and Haar / LBP) could work for what I’m after, though model training seems to be a bit overkill, no?

Yes, that stackoverflow post is my own post - I’m hoping that, between the two communities, someone can steer me in the right direction.

homography has more degrees of freedom than a rigid or even affine transformation. you used that to begin with.

fewer degrees of freedom, that goes in the right direction.

if you’re sure there is not even scaling, just translation, you can just use matchTemplate

Sure - so the benefit is just reduction of degrees of freedom? I’ll give affine/rigid a try and see what happens. Thanks!
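For anyone following along, here is a hedged numpy sketch of what the fewer-degrees-of-freedom fit looks like: estimating a similarity transform (rotation, uniform scale, translation - 4 DoF versus a homography's 8) from matched point pairs via least squares, Umeyama-style. This is only an illustration of what OpenCV's estimateAffinePartial2D computes; the real function additionally does RANSAC outlier rejection, which matters a lot with noisy matches:

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares similarity transform: dst ≈ s * R @ src + t.
    src, dst: (N, 2) arrays of matched points. Returns (s, R, t)."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    mu_s, mu_d = src.mean(0), dst.mean(0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / len(src)
    U, S, Vt = np.linalg.svd(cov)
    d = np.sign(np.linalg.det(U @ Vt))   # guard against a reflection solution
    D = np.diag([1.0, d])
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / src_c.var(0).sum()
    t = mu_d - s * R @ mu_s
    return s, R, t

# Points translated by (5, -3), with no rotation or scale:
src = np.array([[0, 0], [10, 0], [0, 10], [10, 10]])
dst = src + np.array([5, -3])
s, R, t = fit_similarity(src, dst)
print(round(s, 3), np.round(t, 3))  # scale ≈ 1, translation ≈ [5, -3]
```

With only 4 parameters to fit, a handful of bad matches distorts the result far less than with a homography, and the estimated scale/rotation can also be sanity-checked against the "no rotation" expectation.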

My first attempt was simple template matching - the results there were… entirely useless. As far as I can tell, using the example screenshots as a reference, the differences in backgrounds made a simple cropped template useless, and template matching really doesn’t seem to handle transparency.

I would have recommended posting clean data, not drawn over with drawMatches

Done - Clean images.

I’ll edit the original question to include them.

I’m trying to determine the best way to apply an affine transform here based on that same linked documentation. Forgive me, but I’m having a hard time seeing the application.

That documentation indicates we’d need three corresponding points (a triangle in each image) to compute the 2x3 transform matrix. This implies a known relationship between the images before going into the calculation.

The whole point of this exercise is to determine the relationship (if any) between the two images - or, to restate, the presence (and location) of components of the second (model) image in the first (observed) image.

If I’m following along, the matches calculated by FLANN / KnnMatch would be this relationship between the images. But the whole reason the homography approach is yielding unsatisfactory results is that the calculated matches used to determine that relationship are insufficient to establish a solid one - hence the unsatisfactory results I’m experiencing.

I think a fresh set of example images with drawn relationships - but better coloring - might illustrate: imgur gallery

Some illustrations of “false matches” skewing the results:
02 ORB - one of the detected common features is an overlap between the “S” text on the right and the bottom of our model.
03 AGAST/FREAK - the detected matches have nearly no true hit rate with the model.
and on, and on.
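One way to see how fewer degrees of freedom help with exactly this false-match problem: if the model truly appears un-rotated and un-scaled in the observed image, every true match implies the same displacement vector, so a one-sample consensus vote (a degenerate RANSAC - function name and tolerance here are my own illustration) can isolate the true matches before any transform is fit:

```python
def translation_consensus(model_pts, observed_pts, tol=3.0):
    """Each match proposes a displacement vector; keep the proposal with the
    most agreeing matches (within tol pixels per axis)."""
    best_inliers = []
    for dx, dy in ((ox - px, oy - py)
                   for (px, py), (ox, oy) in zip(model_pts, observed_pts)):
        inliers = [i for i, ((px, py), (ox, oy))
                   in enumerate(zip(model_pts, observed_pts))
                   if abs((ox - px) - dx) <= tol and abs((oy - py) - dy) <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers

# Three true matches all displaced by roughly (100, 50); one false match.
model = [(0, 0), (10, 0), (0, 10), (5, 5)]
obs   = [(100, 50), (110, 51), (101, 60), (400, 7)]
print(translation_consensus(model, obs))  # → [0, 1, 2]
```

False matches like the "S"-text overlap above would propose displacements nobody else agrees with, so they never win the vote.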

My understanding is that if this were a strong relationship to start with, I could use the points of my best-candidate matches to describe a triangle in either image, calculate a rigid/affine 2D transform, and have a better bounding area / relationship between the images.

However, that relationship has to be valid beforehand - which is what led me here.

I can’t seem to find a feature detection algorithm which works comparably well against the overall observed image and the various options for model image. STAR seems to be the best so far but even it has inconsistent detection of model features in the observed images.

I’m starting to think I’m going to have to go the Haar model training route.

just go with AKAZE or SIFT. I don’t know why you bother with any of those other algorithms. I’ve never heard of half of them (might be better, might not be), and the other half was only invented because SIFT was patented at the time (it’s been free to use for at least a year now).

The whole reason I went this route is due to inconsistent matches with AKAZE in conjunction with its high execution time.

I am in the middle of re-tooling my feature detection test harness to better handle comparing performance and inlier ratios across the various algorithms; I’ll be sure to include AKAZE and SIFT in these.

For the record, these three algorithms (AGAST, ORB, STAR) had the best initial raw performance and had decent results against my initial small sample set of images. My goal is to be able to extract information from these in a semi-realtime fashion - I’m concerned AKAZE will be unsuitable. Though, who knows, maybe it’s tuneable?

Edit: I don’t believe SIFT was available in EmguCV… I’ll verify this evening.

AKAZE did the trick.

I ended up starting from scratch for the sake of ruling out silly things done while learning, ran through everything again, and AKAZE did have some runs which performed outstandingly well.

More to the point, AKAZE had quality matches on all 5 test images.

In case anyone needs an easier means of tuning algorithm choice in the future, I built a GitHub project to handle crunching various parameters against an input, collecting metrics on the outputs, and producing a usable report with reference images: GitHub - jeremy-sylvis/OpenCv.FeatureDetection: A utility for image feature detection based on OpenCV.

It only has AKAZE in it for now but I’ll add the rest of what I was using over the next few days.


The matchTemplate() function does support a mask parameter. If you’re only searching for that circular symbol, could you pass in a mask of it?
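For illustration, a hedged numpy sketch of what masked matching computes: a sum of squared differences where masked-out (e.g. transparent-background) template pixels are simply ignored. This is conceptually what matchTemplate does with TM_SQDIFF plus a mask, though the real function is vastly faster; names here are my own:

```python
import numpy as np

def masked_ssd_match(image, template, mask):
    """Slide template over image; at each offset, sum squared differences
    only where mask is nonzero. Returns (row, col) of the best match."""
    ih, iw = image.shape
    th, tw = template.shape
    m = (mask > 0).astype(float)
    best, best_pos = float("inf"), (0, 0)
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            patch = image[r:r + th, c:c + tw]
            score = (((patch - template) * m) ** 2).sum()
            if score < best:
                best, best_pos = score, (r, c)
    return best_pos

# A 2x2 blob of ones at (2, 3); the template's bottom-right pixel is masked
# out, so whatever the image holds there cannot hurt the match.
img = np.zeros((6, 8)); img[2:4, 3:5] = 1.0; img[0, 0] = 9.0
tmpl = np.ones((2, 2)); msk = np.array([[1, 1], [1, 0]])
print(masked_ssd_match(img, tmpl, msk))  # → (2, 3)
```

The practical upshot: pixels belonging to the variable background can be zeroed in the mask so only the symbol itself is compared.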

I had tried using a black-and-white mask based on public examples with no success.

I would be happy to try again if you happen to know of better reference material I could learn from.

Alas, I’m a CV novice. Just about to try using matchTemplate() for the first time myself, which is why I happen to know about that mask parameter. :relaxed: Good luck!

No worries.

I’ll make a note to rebuild my matchTemplate() tests for review and comparison.