How to remove those pixels around the dates and watermark

Hi,

I am new to OpenCV and have little or no experience in image processing, so I have been reading and learning from the solutions others have posted here.

I am trying to remove some “noise” surrounding the dates using Python. Could you please share the most effective way to do it? I tried adjusting contrast and brightness, but the surrounding area ended up lighter and some of the text disappeared.

The image can be found in the link below.

This is a medical certificate, and I would like to use OCR to extract the dates. The big round watermark surrounding the text is also causing the OCR to perform badly.

This is the code I have tried.

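import cv2
import numpy as np

# Load the scan and convert it to grayscale
img = cv2.imread("mc.jpeg")
if img is None:
    raise FileNotFoundError("mc.jpeg could not be read")
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Linear adjustment: new = alpha * img + beta
alpha = 3.5  # contrast gain
beta = -2    # brightness offset

new = alpha * img + beta
# Clamp to the valid 8-bit range before converting back to uint8
new = np.clip(new, 0, 255).astype(np.uint8)

cv2.imwrite("cleaned.png", new)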

Is there an adaptive way to adjust alpha and beta values for different images?

I have to use different alpha and beta values since not all images have the same contrast and/or brightness.
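
One possible way to pick alpha and beta per image is a percentile-based contrast stretch: map a low percentile of the grayscale values to 0 and a high percentile to 255. The sketch below is only an illustration. The function names `auto_alpha_beta` and `stretch_contrast` and the 1%/99% defaults are assumptions, not something from this thread, and this step alone may not remove the watermark.

```python
import numpy as np

def auto_alpha_beta(gray, low_pct=1.0, high_pct=99.0):
    """Pick alpha and beta so the low/high percentiles map to 0 and 255.

    The percentile defaults are illustrative assumptions, not tuned values.
    """
    lo, hi = np.percentile(gray, [low_pct, high_pct])
    if hi <= lo:
        # Flat image: nothing to stretch, so use the identity mapping
        return 1.0, 0.0
    alpha = 255.0 / (hi - lo)
    beta = -alpha * lo
    return float(alpha), float(beta)

def stretch_contrast(gray, low_pct=1.0, high_pct=99.0):
    """Apply new = alpha * gray + beta with per-image alpha and beta."""
    alpha, beta = auto_alpha_beta(gray, low_pct, high_pct)
    out = alpha * gray.astype(np.float32) + beta
    return np.clip(out, 0, 255).astype(np.uint8)
```

With the grayscale `img` from the code above, `new = stretch_contrast(img)` would take the place of the fixed alpha and beta values.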

Consider thresholding. One well-known automatic thresholding algorithm is “Otsu”. It might or might not catch just the letters and not the bothersome background.

Also give “MSER” a try. That might catch the letters regardless of absolute levels.

Hi,

Tried Otsu based on this article:

As you probably guessed, it didn’t catch the letters very well.

Can you also elaborate further on “MSER”? An example of “MSER” would be great for me to follow.

Thanks.

Hi there,

I tried “MSER”. It didn’t capture all the text unfortunately. Thanks.