Recognizing a grid of letters from a phone display

Ralf_71 · April 16, 2025, 8:33am

Hello everybody!

I want to build myself a program to solve a HASHTAG riddle. It is like Wordle and Mastermind wrapped into one, and then you solve four of them together in a grid.

This is what it looks like:

Solving the puzzle is done already. If I enter the 16 letters and colors, my code solves the puzzle in a few seconds, referencing a dictionary to find legal words. The riddle shown here is in German, by the way, should you try to solve it yourself!

Of course I do not want to enter that information manually. I want to hold my phone in front of my laptop camera and see the magic. This is where OpenCV enters the stage.

The picture above was taken by said laptop camera with these lines of code (this is C#)

private VideoCapture _capture;
_capture = new VideoCapture(0);
_capture.Set(VideoCaptureProperties.FrameWidth, 2048);
_capture.Set(VideoCaptureProperties.FrameHeight, 1536);
and
public Mat frame = new Mat();
_capture.Read(frame);

I then process the Mat object using

Cv2.CvtColor(input, gray, ColorConversionCodes.BGR2GRAY);
Cv2.GaussianBlur(gray, gray, new Size(3, 3), 0);

I build thresholds using

Cv2.AdaptiveThreshold(gray, thresh, 255, AdaptiveThresholdTypes.GaussianC, ThresholdTypes.BinaryInv, 11, 2);

I can provide pictures of the gray blurred image and the thresholded one, but as a new user I am allowed only one media item per post. I might reply to myself or to your answers with those other pictures.

Finally, I find contours, simplify them with ApproxPolyDP, and filter for four-cornered shapes with an area of more than 500 pixels and an almost square aspect ratio.

        var contours = new Point[][] { };
        HierarchyIndex[] hierarchy;
        Cv2.FindContours(thresh, out contours, out hierarchy, RetrievalModes.External, ContourApproximationModes.ApproxSimple);

        var candidates = new List<Rect>();
        foreach (var contour in contours) {
            var approx = Cv2.ApproxPolyDP(contour, 0.02 * Cv2.ArcLength(contour, true), true);
            if (approx.Length == 4 && Cv2.ContourArea(approx) > 500) {
                var rect = Cv2.BoundingRect(approx);
                float ratio = (float)rect.Width / rect.Height;
                if (ratio > 0.75 && ratio < 1.25)
                    candidates.Add(rect);
            }
        }

This works somewhat. At best I got 12 of the 16 boxes recognized. With the example picture it caught only one of them, see the coloured picture with the green rectangle.

Don’t get me wrong - I am amazed at what OpenCV delivers here from just 10 commands. I am stunned by how powerful this is.

I need to point out that the code above is 99.9% ChatGPT-generated. I do have a good understanding of what it is doing, yet I am confident there are options in OpenCV that give better results.

Can you help me find those? What I thought of so far:

Increasing the resolution of the webcam picture (if possible, I need to check the hardware)
Lighting correction and colour amplification of the webcam picture
Fiddling with the parameters of the Gaussian blur or the adaptive threshold
Rotation or perspective transformation to get a flatter view of the smartphone screen
Different image processing methods between the steps I have so far
Running the complete image processing workflow continuously instead of once on a single picture, then merging the results: for example the top 8 boxes on the first try, the left 5 on the second, nothing on the third and fourth, the right 8 on the fifth, and so on
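
The last idea, merging detections from several frames, could be sketched roughly as follows. This is only a sketch under assumptions: the phone is held fairly still between frames, each detected box is reduced to its center and size, and `BoxAccumulator`, `AddFrame`, `Stable`, and the pixel tolerance are hypothetical names and values that are not part of OpenCvSharp.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of OpenCvSharp): merges box detections
// from several frames by clustering their centers.
public sealed class BoxAccumulator
{
    private sealed class Cluster
    {
        public double SumX, SumY, SumSize;
        public int Hits;
        public double X => SumX / Hits;       // running mean of center x
        public double Y => SumY / Hits;       // running mean of center y
        public double Size => SumSize / Hits; // running mean of box size
    }

    private readonly List<Cluster> _clusters = new List<Cluster>();
    private readonly double _tolerance;

    // tolerance: maximum center offset in pixels (assumed value, tune it)
    // for two detections to count as the same box.
    public BoxAccumulator(double tolerance)
    {
        _tolerance = tolerance;
    }

    // Adds the detections of one frame, each given as (centerX, centerY, size).
    public void AddFrame(IEnumerable<(double X, double Y, double Size)> boxes)
    {
        foreach (var b in boxes)
        {
            // Reuse the first cluster whose mean center is close enough.
            var match = _clusters.FirstOrDefault(c =>
                Math.Abs(c.X - b.X) <= _tolerance &&
                Math.Abs(c.Y - b.Y) <= _tolerance);
            if (match == null)
            {
                match = new Cluster();
                _clusters.Add(match);
            }
            match.SumX += b.X;
            match.SumY += b.Y;
            match.SumSize += b.Size;
            match.Hits++;
        }
    }

    // Returns the averaged boxes that were seen in at least minHits detections.
    public List<(double X, double Y, double Size)> Stable(int minHits)
    {
        return _clusters
            .Where(c => c.Hits >= minHits)
            .Select(c => (c.X, c.Y, c.Size))
            .ToList();
    }
}
```

Each frame's `candidates` would be converted to centers and passed to `AddFrame`. Once `Stable` returns enough boxes, the capture loop could stop. Requiring several hits also filters out boxes that show up in only one frame.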

The threshold picture looks so promising to me: it has clearly separated the 16 squares and the gaps between them. This is so close. I am confident this must be possible, right?

Thanks for any and all input - I love playing around with this toolkit!

Ralf_71 · April 16, 2025, 8:37am

In answer to myself, this is what I have after AdaptiveThreshold and before FindContours:

crackwitz · April 16, 2025, 9:41am

yeah forget all of that.

throw OCR at it directly, or at least a text detection model. opencv comes with a text detection model. they call it “EAST” for some reason.

it should give you bounding boxes, not necessarily recognitions yet. it should give you boxes for individual letters, not trying to group them into words.

then you only have to figure out how the detection boxes match to the expected grid. grid recovery from a set of detections takes a bit of programming. you’d need to determine the grid spacing, then the major axes. I have some ideas for that, didn’t try them out yet. grid recovery has applications in other situations, e.g. camera calibration. one of these days I might be moved to actually get that done as a proof of concept, maybe even contribute it to opencv (I hate C++).
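
As a rough illustration of the grid-spacing step only (not a tested method from this thread), the C# sketch below takes the median nearest-neighbour distance between detection centers as the spacing and rounds each center to a relative row and column. It assumes the grid is already roughly axis-aligned in the image, so it skips the major-axes estimation entirely. `GridRecovery`, `EstimateSpacing`, and `AssignCells` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: recovers relative grid indices from detection centers.
// Assumes an axis-aligned grid; a rotated or skewed grid needs axis estimation first.
public static class GridRecovery
{
    // Spacing estimate: median of each center's distance to its nearest neighbour.
    public static double EstimateSpacing(IReadOnlyList<(double X, double Y)> centers)
    {
        if (centers.Count < 2)
            throw new ArgumentException("Need at least two centers.", nameof(centers));

        var nearest = new List<double>();
        for (int i = 0; i < centers.Count; i++)
        {
            double best = double.MaxValue;
            for (int j = 0; j < centers.Count; j++)
            {
                if (i == j) continue;
                double dx = centers[i].X - centers[j].X;
                double dy = centers[i].Y - centers[j].Y;
                best = Math.Min(best, Math.Sqrt(dx * dx + dy * dy));
            }
            nearest.Add(best);
        }
        nearest.Sort();
        return nearest[nearest.Count / 2];
    }

    // Maps each center to (row, column), counted from the smallest x and y seen.
    // Indices are relative: a completely undetected first row or column shifts them.
    public static List<(int Row, int Col)> AssignCells(IReadOnlyList<(double X, double Y)> centers)
    {
        double spacing = EstimateSpacing(centers);
        double minX = centers.Min(c => c.X);
        double minY = centers.Min(c => c.Y);
        return centers
            .Select(c => ((int)Math.Round((c.Y - minY) / spacing),
                          (int)Math.Round((c.X - minX) / spacing)))
            .ToList();
    }
}
```

Cells that never receive a detection show up as gaps in the returned indices, which can then be compared against the expected layout of the puzzle.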
