Hello,
I am trying to extract and validate numbers from an LCD display using OpenCV and PyTesseract. However, for my particular use case, I need as close to 100% accuracy and consistency as possible. After attempting multiple approaches over the past few days, the best I have been able to achieve is around 98% accuracy (over roughly 1,000 extractions). In the remaining ~2% of cases, the extraction either fails outright or returns an incorrect character (for instance, confusing “8” and “3” with each other).
I will illustrate my entire process below, but my question is simple:
Is it possible to extract values from an LCD display with ~99.5% or higher accuracy? If so, what is the best approach to doing so?
My entire process thus far:
- In this picture, the top left, “Frame,” is the image captured by the VideoCapture object (USB webcam).
- The first step is locating the LCD display’s screen. A locateDisplayFunction() call, which runs before the code shown below, produces the “Frame 0” image. A green outline is drawn on “Frame” to visualize which part was cropped out.
- From here, a specified ROI is chosen. This ROI is the right side of one of the 8 data cells seen in the first two frames. The ROI is decided based on a hardcoded ID set explicitly in each of my test cases.
- After cropping to our specific ROI, a series of operations is performed: the image is converted to grayscale, all pixels except near-white RGBA values are converted to black (making the image truly black/white instead of grayscale), the image is enlarged roughly 3x, and a slight blur is applied to smooth the edges, since the image has been upscaled so heavily. These steps produce what is seen in “Frame 1” at the top right.
- I’ve seen in a few different places that PyTesseract prefers black text on a white background. Thus, “Frame 2” is an inverted copy of “Frame 1.”
- Finally, in order to search for text, I look at the contours. However, I want the entire number returned as a single string rather than each character extracted individually. To do this, I heavily dilate the text so that adjacent character contours merge into one (a stripped-down sketch of this step follows this list). This can be seen in “Frame 3” on the bottom right. The rectangle seen in “Frame 1” is drawn around the contour found in “Frame 3,” and the PyTesseract text extraction looks for text inside that rectangle.
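To make that last step concrete, here is a stripped-down sketch of just the dilate-and-merge idea, assuming a binary image bw with white digits on a black background (the full, messier function I actually run is further down):

import cv2
import pytesseract

def read_merged_number(bw):
    # heavy dilation glues the individual digit contours into one blob
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
    blob = cv2.dilate(bw, kernel, iterations=8)
    # external contours of the merged blob(s)
    cnts = cv2.findContours(blob, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    cnts = cnts[0] if len(cnts) == 2 else cnts[1]
    results = []
    for cnt in cnts:
        if cv2.contourArea(cnt) < 4000:  # skip small noise blobs
            continue
        x, y, w, h = cv2.boundingRect(cnt)
        # invert so Tesseract sees black text on a white background
        roi = cv2.bitwise_not(bw[y:y + h, x:x + w])
        results.append(pytesseract.image_to_string(
            roi, config="--psm 7 -c tessedit_char_whitelist=0123456789.-N").strip())
    return results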
A generic list of solutions I’ve attempted to improve the output quality & consistency:
- Limited ambient light by placing the entire setup (display and camera) inside a closed, dark container.
- Tried every --psm mode (0-13) extensively; --psm 6, 7, and 8 work best.
- Scaled the ROI up/down according to recommendations from other developers.
- Tested various --dpi values via the config option of PyTesseract’s image_to_string().
- Captured multiple images and combined them with a bitwise OR operation, so that any pixel lit in any capture stays lit (sketched after this list).
- Extracted a pandas DataFrame using PyTesseract’s image_to_data() from multiple images taken in rapid succession, compared the confidence values for each, and threw out anything below a set threshold X (somewhere in the 60-80% range); also sketched below.
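For the frame-stacking attempt, the combination step was essentially the following (a sketch; grab_binarized_roi() is just a stand-in for my capture + crop + threshold steps, not a real function in my code):

import cv2

def stack_frames(grab_binarized_roi, n=5):
    # OR the binarized ROI of n consecutive captures together, so any pixel
    # that is lit in at least one capture stays lit in the combined image
    combined = grab_binarized_roi()
    for _ in range(n - 1):
        combined = cv2.bitwise_or(combined, grab_binarized_roi())
    return combined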
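And the confidence-filtering attempt looked roughly like this per capture (again a sketch, not the exact code; the threshold was somewhere in the 60-80 range, and the config string shows the --psm/--dpi/whitelist values I have been testing):

import pytesseract
from pytesseract import Output

CONF_THRESHOLD = 70  # somewhere in the 60-80 range

def read_with_confidence(roi):
    # image_to_data returns word-level boxes plus a confidence value per word
    data = pytesseract.image_to_data(
        roi,
        config="--psm 7 --dpi 500 -c tessedit_char_whitelist=0123456789.-N",
        output_type=Output.DATAFRAME)
    # keep only rows that actually contain text and clear the threshold
    words = data[data.text.notna() & (data.conf > CONF_THRESHOLD)]
    return " ".join(str(t).strip() for t in words.text)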
My processing function, as it currently stands, is attached below. It is important to note that this is the result of attempting dozens of different solutions and applying a whole host of blurs, erosions & dilations, and thresholds; consequently, the code is fairly messy and extensive.
There is also a determineROICoords() function whose code I have not included. It determines which region of the image to focus on: the data is formatted as a 4x2 table, and this function simply selects which cell to look at.
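Conceptually, it just does grid arithmetic, something like the simplified version below; the real function uses hardcoded, camera-dependent offsets, so treat this as illustrative only:

def determineROICoords(dpid, cell_w, cell_h):
    # Illustrative only: map a cell ID (0-7) onto the 4x2 grid of data cells,
    # skipping the margin row at the top of the 6-row layout.
    row, col = divmod(dpid, 2)
    left = col * cell_w
    upper = (row + 1) * cell_h
    # PIL-style crop box: (left, upper, right, lower)
    return (left, upper, left + cell_w, upper + cell_h)

The actual processing function follows: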
import cv2
import numpy
import pytesseract
from PIL import Image

# function that takes in a frame (img) and extracts text from it
def processImage(img, dpid):
    # access tesseract's installed location on the Pi
    pytesseract.pytesseract.tesseract_cmd = r"/usr/bin/tesseract"
    # creates an empty list to store all extracted text
    extractedText = []
    # creates a list to store all frames returned by the function
    allFrames = [img]  # frame 0
    # convert numpy array image to PIL image in order to crop + scale it easier
    pil_img = Image.fromarray(img)
    width, height = pil_img.size
    # filter all colors except white
    pil_img = pil_img.convert("RGBA")
    pixdata = pil_img.load()
    # converts all pixels that aren't near-white (< RGBA(250, 250, 250, 255)) to black
    for y in range(pil_img.size[1]):
        for x in range(pil_img.size[0]):
            if pixdata[x, y] <= (250, 250, 250, 255):
                pixdata[x, y] = (0, 0, 0, 255)
    # update width & height. The display's 8 data cells are laid out as a 4x2 grid
    # with margins above and below, so we treat it as a 6x2 grid (rows x cols).
    width_scale = (1 / 2)
    height_scale = (1 / 6)
    width = int(width * width_scale)
    height = int(height * height_scale)
    # calls function that crops the image depending on what zone (first parameter)
    # we're looking for. This heavily depends on camera position.
    crop_coords = determineROICoords(dpid, width, height)
    pil_cropped = pil_img.crop(crop_coords)
    # resize so that it is larger for display/development purposes. Also helps
    # maintain a similar aspect ratio when extracting text.
    resize_w_scale = 1
    resize_h_scale = 3
    scaled_coords = (resize_w_scale * width, resize_h_scale * height)
    pil_cropped = pil_cropped.resize(scaled_coords)
    # convert cropped, resized PIL image back into a grayscale numpy array (uint8)
    img_cropped = numpy.array(pil_cropped)
    img_cropped = cv2.cvtColor(img_cropped, cv2.COLOR_BGR2GRAY)
    img_cropped = numpy.uint8(img_cropped)
    # eliminate any extra white/gray "dots" (noise) scattered throughout
    otsu = cv2.threshold(img_cropped, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
    kernel = numpy.ones((2, 1), dtype=numpy.uint8)
    erosion = cv2.erode(otsu, kernel, iterations=1)
    kernel = numpy.ones((1, 2), dtype=numpy.uint8)
    erosion = erosion + cv2.erode(otsu, kernel, iterations=1)
    kernel = numpy.ones((3, 3), dtype=numpy.uint8)
    dilated = cv2.dilate(erosion, kernel, iterations=1)
    mask = dilated / 255
    img_cropped = otsu * mask
    # converts back to numpy array (uint8) again and applies a small blur to blend edges
    img_cropped = numpy.uint8(img_cropped)
    img_cropped = cv2.blur(img_cropped, (2, 2))
    # adds an initial simple thresh that makes all light gray values white (eliminates some noise)
    _, img_cropped_thresh = cv2.threshold(img_cropped, 10, 255, cv2.THRESH_BINARY)
    img_cropped = cv2.blur(img_cropped_thresh, (2, 2))
    # apply simple threshold to img_cropped_thresh (2nd threshold that is a pure black/white split down the middle)
    _, thresh = cv2.threshold(img_cropped_thresh, 127, 255, cv2.THRESH_BINARY)
    gray = img_cropped
    # dilate the image to combine adjacent text contours (prevents every single
    # individual contour from creating a new rectangle)
    kernel = numpy.ones((2, 2), numpy.uint8)
    thresh = cv2.dilate(gray, kernel, iterations=2)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
    dilate = cv2.dilate(thresh, kernel, iterations=8)
    # find the contours, highlight the text areas, and extract our ROIs
    cnts = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    cnts = cnts[0] if len(cnts) == 2 else cnts[1]
    # invert the frame that text extraction runs against (so it's black text on a white background)
    _, thresh = cv2.threshold(thresh, 127, 255, cv2.THRESH_BINARY_INV)
    # loop through all of the contours
    for cnt in cnts:
        # get the total area of the current contour
        area = cv2.contourArea(cnt)
        # ignore contours below a certain size
        minContourSize = 4000
        if area > minContourSize:
            # get borders of current contour
            x, y, w, h = cv2.boundingRect(cnt)
            # draw a rectangle around the current contour on the inputted img (BGR)
            rect = cv2.rectangle(img_cropped_thresh, (x, y), (x + w, y + h), (255, 255, 255), 1)
            # targets or sets the ROI so we only extract text from this contour
            roi = thresh[y:y + h, x:x + w]
            # extract text. --psm 6 works best; 8, 7, & 3 work decently well
            #text = pytesseract.image_to_string(roi, lang='eng', config="--psm 6 -c tessedit_char_whitelist=0123456789.-N", timeout=5.0)
            text = pytesseract.image_to_string(roi, lang='eng', config="--psm 7 --dpi 500 -c tessedit_char_whitelist=0123456789.-N", timeout=5.0)  # testing
            text = text[:-1]  # the [:-1] gets rid of the newline character that is added automatically
            if not text:  # image_to_string returns an empty string (not None) when nothing is recognized
                processFailedFrame(extractedText, roi)  # calls function which processes failed frames
            else:
                # ensures that the resulting text contains alpha-numeric characters
                if any(c.isalpha() for c in text) or any(c.isnumeric() for c in text):
                    extractedText.append(text)
                else:
                    #print("No alpha-numeric characters were extracted.")
                    processFailedFrame(extractedText, thresh)  # calls function which processes failed frames
    # add all frames to list
    allFrames.append(img_cropped_thresh)  # frame 1
    allFrames.append(thresh)  # frame 2
    allFrames.append(dilate)  # frame 3
    # return all frames and extracted text
    return allFrames, extractedText
# end of processImage() function
Lastly, for some reason I am not able to extract “0” at all. The ~98% success rate quoted above excludes zeros: whenever the display shows a value of “0,” the extraction fails 100% of the time, without exception.
Any help that can be provided is greatly appreciated. If more information is required, please let me know and I will do my best to get it to you.
Thanks