Looking for adaptive kernel size detector for dilate

zoltansz · October 22, 2021, 1:07pm

Dear all, I am developing an adaptive image preprocessing pipeline for Tesseract OCR. It is working quite fine for very different documents and lighting conditions.

However, I found that for some images the optimal kernel size used for cv2.morphologyEx() is 3x3, for others 5x5 or even 7x7.

Is anyone aware of an algorithm that determines the optimal kernel size (and iteration count) for cv2.morphologyEx() (or erode()+dilate()) in advance? Or afterwards, but strictly before OCR, as OCR is a very expensive operation.

I am thinking of something similar to cv2.adaptiveThreshold(), or to the autoCanny() recipe that derives the upper and lower thresholds from the np.median() of the image.

Thanks,
Z

crackwitz · October 22, 2021, 5:02pm

what do you consider “optimal”?

zoltansz · October 22, 2021, 6:54pm

By an “optimal” set of parameters (kernel size etc.) I mean the set for which, given an image as OCR input, the OCR text output contains the fewest errors.

I use Tesseract and it has a dozen quality criteria regarding the image input. Contrast, vertical resolution, margins, low noise etc. All of these are fine in my case.

However, depending on the input, the strokes of the letters are sometimes just a bit too thin (the lines are not always continuous) and sometimes too thick (neighbouring letters touch each other).

Erode/Dilate can solve these, but I am looking for a solution that can choose this set of parameters specifically for each image.

dodo · October 24, 2021, 9:23pm

if speed is not an issue you can run 2n+1 predictions with different kernel sizes and accept the result that is predicted most often (a majority vote). this is a form of test-time augmentation.

also, if I remember correctly, tesseract works best with black text on a white background. if that’s not the case for you, you can binarize the image with an Otsu threshold and count black vs. white pixels; if there are more black pixels, invert the image using cv2.bitwise_not().

zoltansz · October 25, 2021, 7:21am

Since Tesseract takes a few seconds to finish, almost every other OpenCV function can be regarded as ‘fast’ by comparison.

Yes, I can run a couple of dilates using different kernel sizes. But then, how can I find out whether the strokes of the letters have the ‘proper’ thickness, that is, not too thin (still continuous) and not too thick (not touching each other)? The problem is that all letters are small, so searching for too many small islands would not work. Erode would erase valuable parts of the glyphs.
