Images sent to an OCR engine often contain dotted lines, which the engine misinterprets as text. How can we identify the dotted lines specifically and replace them with either blank space or solid lines?
berak
June 28, 2021, 6:21am
Please show what you have tried so far (code) and an example image. Thank you.
Hey, thank you. Let me share a link to the exact problem:
opened 05:15PM - 06 May 20 UTC
### Environment
* **Tesseract Version**: 4.1.1
* **Platform**: MacOS Catalina… 10.15.4
### Current Behavior:
A few issues with `tesseract -l eng --psm 1` on this image:

Some dots on lines ignored:
`Cheese on Pasta...`
Some dots on lines have strange letters, and are incorrectly capitalised:
`SAUCE ON PASTA... cccceces cece cesses ces seeses cesses c`
Numbers on both sides become strange text:
`OW AMNAURWNP`, `©M~MURDUNBWNHE`
Here's the full PSM 1 text:
```
MY PASTA RESTAURANT
DISHES
Cheese on Pasta...
Cheesy Spaghetti...
OW AMNAURWNP
SPAghe ttn... .ccccececcseseeceeseeceeceesoesee ses sessee ses eesseseesaesaesees eeeesees es
SAUCE ON PASTA... cccceces cece cesses ces seeses cesses ces seesescaeseeeeseuueeces anes
Mega value cheese o on some Spicy Sauce... esate eeeees
Fresh and Tasty handmade assortments (FATS) cscscsocne
ANTIP ASTI... eee cee cee cee coe coeese ses couse ses cesses aes cesses caeses ses caeeaescaeeees
FreSh SIAW......ccescsseecee cesses coecas cesses see see cusses sue ses aecaecas ces case eenaee sees
NOOC1@S... 20. .s. cesses co cesses see cuecoe ces cusses cou cesses ace sue seseas cuecaeens ease senses es
©M~MURDUNBWNHE
```
`tesseract -l eng --psm 12` is better:
- no random capitalisation, apart from numbers on both sides turning into capitalised words.
- 3 of 9 lines have a single ellipsis, the rest have no dots.
Here's the full PSM 12 text:
```
MY PASTA RESTAURANT
DISHES
Spaghetti
Sauce on Pasta
Cheese on Pasta...
Cheesy Spaghetti...
Mega value cheese o on some Spicy Sauce...
Fresh and Tasty handmade assortments (FATS)
Antipasti
Fresh slaw
WON DUN BWHN PR
Noodles
OMAN DY BWN PR
```
### Expected Behavior:
Expect dots to be OCR'd as dots, text output to look like text on input image.
This link contains the exact issue.
I have tried the code from the following thread:
How to convert dashed lines to solid? - OpenCV Q&A Forum
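One possible way to blank out dot leaders before OCR is to erase very small connected blobs of ink from the binarized image. The sketch below is not taken from the linked thread or the Tesseract issue. It is a minimal pure-Python illustration on a toy bitmap, and the function name `remove_small_blobs` and the `max_area` threshold are hypothetical choices made only for this example.

```python
from collections import deque


def remove_small_blobs(bitmap, max_area=4):
    """Return a copy of `bitmap` with small ink blobs erased.

    bitmap   : list of rows, 1 = ink (dark pixel), 0 = background.
    max_area : hypothetical threshold; blobs with at most this many
               pixels are treated as leader dots and set to 0.
    """
    height = len(bitmap)
    width = len(bitmap[0]) if height else 0
    cleaned = [row[:] for row in bitmap]          # do not mutate the input
    seen = [[False] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            if bitmap[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill collects one 8-connected blob.
            blob = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                blob.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and bitmap[ny][nx] == 1
                                and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Erase the blob only if it is small enough to be a dot.
            if len(blob) <= max_area:
                for by, bx in blob:
                    cleaned[by][bx] = 0
    return cleaned


# Toy example: a 3x3 ring standing in for a letter, then three 1-pixel dots.
toy = [
    [1, 1, 1, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 0, 1, 0, 1],
]
result = remove_small_blobs(toy, max_area=2)
for row in result:
    print("".join("#" if px else "." for px in row))
```

On a real scan the same idea is usually applied with a library routine such as OpenCV's `cv2.connectedComponentsWithStats`, which reports the area of each component. The area threshold has to be tuned per image, and an area-only filter will also remove full stops and the dots on `i` and `j`, so an extra check may be needed (for example, keeping only runs of several small blobs on the same row).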