Difficulty using cv2.adaptiveThreshold with reflection problems

GOAL: I need a cleaned image so I can use pytesseract and get the text from it.
I convert the image to grayscale.
I use cv2.adaptiveThreshold to deal with the reflections, but it doesn't work well.
The text becomes less readable and pytesseract can't read it. I don't know how to improve my image.

import cv2

path = "path/to/image.jpg"

# Load the image in color and again as grayscale (flag 0 = IMREAD_GRAYSCALE).
rgb_img = cv2.imread(path)
gray_img = cv2.imread(path, 0)
# Adaptive threshold: local mean over a 19x19 block, minus a constant of 1,
# inverted so the text comes out white on a black background.
thresholded_img = cv2.adaptiveThreshold(gray_img, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY_INV, 19, 1)
cv2.imshow('rgb', rgb_img)
cv2.imshow('gray', gray_img)
cv2.imshow('thresholded', thresholded_img)
cv2.waitKey(0)

use a larger block size for the adaptive threshold, for example:
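a rough sketch of that tip (blockSize must be odd; 51 and the constant 10 here are only starting points to tune, not magic values):

import cv2

gray_img = cv2.imread("path/to/image.jpg", 0)

# a larger (odd) block size averages over a bigger neighborhood, so gradual
# illumination changes like reflections disturb the local threshold less
thresholded_img = cv2.adaptiveThreshold(
    gray_img, 255,
    cv2.ADAPTIVE_THRESH_MEAN_C,
    cv2.THRESH_BINARY_INV,
    51,  # block size: try 31, 51, 71, ...
    10,  # constant subtracted from the local mean; tune together with the block size
)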

the inverted text will always be tricky.

modern OCR should not require manual thresholding. it should work on grayscale and even color data directly. tesseract is not modern OCR.

maybe look at EasyOCR (GitHub - JaidedAI/EasyOCR: Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic and etc.)

Thank you for the advice :slight_smile:!

import easyocr

path = "path/to/image.jpg"

# French + English models; each result entry is (bbox, text, confidence).
reader = easyocr.Reader(['fr', 'en'])
result = reader.readtext(path)
print([l[1] for l in result])  # keep only the recognized text

#expected -> ['18/05', '08:47', 'Zone', '12', 'Adresse', '04011', '02/09', '19:04', 'Zone', '1', 'Adresse', '02106', 'Retour', 'Impr']
#rgb -> ['EI', '0s', 'Fer47', 'One', '1/', 'Hdresse', '04g1', 'ITH', 'A2', 'IEFA', 'Zone', 'Aoresse', '02106', 'Relqu', 'mPr']
#gray -> ['705', 'U8F 47', 'Zone', '1z', 'Adresse', '04011', '~17E9', 'A', 'SFu4', 'Zone', 'Adresse', 'Gz104', 'Relour', '(mPr']
#thresholded -> ['6497055 S68fTz', 'Zorie', 'I2', '04011', '@xr8sse', 'Js6E5', 'Zone', 'edresse', '62106', 'ZRetsur', 'mPr']

It's easier to use and gives better results! But I still have quality issues. I want to try training a custom model as proposed on the GitHub, but I don't know how many images I need to get better performance. 100? 1,000? 10,000? 100,000? (For the moment I have 100, so I don't know if it's worth trying.)

you can use craft to localize the text first; I used it in production and it works pretty well.
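for example with the craft-text-detector package from PyPI (one wrapper around CRAFT; this is a rough sketch based on its README, so check the result keys against your installed version):

from craft_text_detector import Craft

# crop_type="poly" exports each detected text region as its own image file
craft = Craft(output_dir="crops", crop_type="poly", cuda=False)
prediction = craft.detect_text("path/to/image.jpg")

# prediction["boxes"] holds the detected text regions; the cropped images
# are also written to the output directory, ready to feed into an OCR engine
print(len(prediction["boxes"]), "text regions found")

# free the models when done
craft.unload_craftnet_model()
craft.unload_refinenet_model()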

this gives you the opportunity to standardize the images before giving them to tesseract or any neural ocr.

what I mean by standardization is resizing the text to a constant size, making it always black text on a white background, giving the text some extra margin to make the ocr's job easier, etc. (see the sketch below).
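a minimal sketch of that idea with plain OpenCV (standardize_crop and its default values are just illustrative, not a standard API):

import cv2

def standardize_crop(crop_gray, target_height=48, margin=16):
    """Normalize one grayscale text crop before OCR (illustrative helper)."""
    # resize so the text always has roughly the same height
    scale = target_height / crop_gray.shape[0]
    resized = cv2.resize(crop_gray, None, fx=scale, fy=scale,
                         interpolation=cv2.INTER_CUBIC)
    # binarize with Otsu; if the result is mostly dark, invert it so we
    # always end up with black text on a white background
    _, binary = cv2.threshold(resized, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    if binary.mean() < 127:
        binary = cv2.bitwise_not(binary)
    # add an extra white margin around the text
    return cv2.copyMakeBorder(binary, margin, margin, margin, margin,
                              cv2.BORDER_CONSTANT, value=255)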

unless you always have the same specific types of images, don't train it yourself, as the result would probably be inferior. on the other hand, task-specific models usually outperform generic models easily (on that specific task, obviously).

CRAFT is really good, I use it now!
It helped me isolate the text regions and process them one by one.
I have good results on some parts but still big issues on others (and I don't know why).

Results:
['One']
[]
['Zone']
['Zohe']

I'm going to try adding a white margin just in case. The text size is already constant. White as background is sometimes better than black as background ^^.
Do you have any other ideas/tips? :slight_smile:
Thank you for the help guys!

Same problems for this one (picture attached).