How to detect the location of a grid of letters in an image with openCV and Python?

I’m writing a program that takes an image containing a 4 by 4 grid of letters somewhere in it.

E.g.

I want to read these letters into my program and for that I’m using pytesseract for the OCR.

Before feeding the image to pytesseract I do some preprocessing with openCV to increase the odds of pytesseract working correctly.

This is the code I use for this:

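import cv2

# Load the image and convert it to grayscale
img = cv2.imread('my_image.png')
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Binarize using Otsu's automatically chosen threshold
img_pre_processed = cv2.threshold(img_gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]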

Here is a sample output of img_pre_processed:
(https://i.stack.imgur.com/xuDXL.jpg)

Since the letters in the grid are spaced apart, pytesseract has a hard time reading them when I give it the entire image as input. It would help if I knew the coordinates of every letter, because then I could crop or rearrange the image so that pytesseract can recognise each one reliably.

I started to try and solve this problem on my own and the solution I’m coming up with might work but it’s getting rather complicated. So I’m wondering if there is a better way to do it.

At the moment I’m using the cv2.findContours() function to get the contours of all the objects in the image. For every contour I calculate the center coordinates and the area of its bounding box. I then sort these by area to get the largest contours. This is where it starts to get complicated. I can’t just take the 16 biggest contours, because there might be unwanted objects in the picture with a bigger area than the 16 letters I want. Also, some letters like O, P, Q,… have 2 contours, and their inner contour might even be bigger than another letter’s outer contour, for example that of the letter I.

E.g. This is an image with the 18 biggest contours marked with a green box.
(https://i.stack.imgur.com/QueGn.jpg)

So to continue with my way of attacking the problem I would have to write an algorithm that finds the contours that are most likely part of the grid while ignoring the contours that are unwanted and also the inner contours of letters that have 2 contours.

While this is possible, I’m wondering if there is a better way of doing this.

Somebody told me that if I blur the image enough that every letter becomes a blob, it might be possible to run a pattern detection for a 4x4 grid of blobs. But I don’t know how to do that or if that’s possible.

So if somebody knows a better way to tackle this problem, or knows how to carry out the blob idea I mentioned above, that would be most helpful.

Thanks in advance!

crosspost: How to detect the location of a grid of letters in an image with openCV and Python? - Stack Overflow