How to skip differences in the image and recognize the identical part as the same

Snow · July 5, 2021, 2:04am

I have two images of article text.
For example:
First one: I wandered lonely as a cloud
Second one: I walked lonely as a cloud
The result I’d like is for only the word “walked” to be highlighted.
But right now, because “wandered” and “walked” have different widths, the words after them are shifted and are also flagged as differences. Could you help with this, please? Many thanks.

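    # alignImages is defined elsewhere (not shown in this post)
    import cv2
    import numpy as np

    # Read reference image
    refFilename = "images/html2library.png"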
    print("Reading reference image : ", refFilename)
    imReference = cv2.imread(refFilename, cv2.IMREAD_COLOR)
    hh, ww = imReference.shape[:2]

    # Read image to be aligned
    imFilename = "images/html2x.png"
    print("Reading image to align : ", imFilename)
    im = cv2.imread(imFilename, cv2.IMREAD_COLOR)

    # Aligned image will be stored in imReg.
    # The estimated homography will be stored in h.
    imReg, h = alignImages(im, imReference)

    # Extract the saturation channel of both images for comparison
    refSat = cv2.cvtColor(imReference, cv2.COLOR_BGR2HSV)[:, :, 1]
    imSat = cv2.cvtColor(imReg, cv2.COLOR_BGR2HSV)[:, :, 1]


    # Otsu threshold
    refThresh = cv2.threshold(refSat, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
    imThresh = cv2.threshold(imSat, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

    # apply morphology open and close 
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    refThresh = cv2.morphologyEx(refThresh, cv2.MORPH_OPEN, kernel, iterations=1)
    refThresh = cv2.morphologyEx(refThresh, cv2.MORPH_CLOSE, kernel, iterations=1)
    imThresh = cv2.morphologyEx(imThresh, cv2.MORPH_OPEN, kernel, iterations=1)
    imThresh = cv2.morphologyEx(imThresh, cv2.MORPH_CLOSE, kernel, iterations=1)

    # get absolute difference between the two thresholded images
    # (cv2.absdiff works directly on uint8, so no float conversion is needed)
    diff = cv2.absdiff(imThresh, refThresh)

    # apply morphology open to remove small regions caused by slight misalignment of the two images
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (10, 10))
    diff_cleaned = cv2.morphologyEx(diff, cv2.MORPH_OPEN, kernel, iterations=1).astype(np.uint8)

    # Filter using contour area and draw bounding boxes that do not touch the sides of the image
    cnts = cv2.findContours(diff_cleaned, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    cnts = cnts[0] if len(cnts) == 2 else cnts[1]
    print('difference amount is {}'.format(len(cnts)))
    result = imReference.copy()
    result2 = imReg.copy()
    for c in cnts:
        # bw/bh are used so that h (the homography from alignImages) is not overwritten
        x, y, bw, bh = cv2.boundingRect(c)
        if x > 0 and y > 0 and x + bw < ww - 1 and y + bh < hh - 1:
            cv2.rectangle(result, (x, y), (x + bw, y + bh), (0, 0, 255), 2)

    cv2.imwrite('result/circuit_result.png', result)

berak · July 6, 2021, 6:40am

well, you would actually need to DO this, not only draw rectangles …

in the end, your approach seems far too brittle.
opencv has some methods to detect text, ranging from simple MSER to deep neural networks; it’s probably better to look at those for your problem
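
To illustrate why working at the text level is less brittle than a pixel difference: once the words of each image are available as a sequence (for instance from a text detector or OCR step, which is assumed here and not shown), the two sequences can be aligned so that a word of different length does not shift everything after it. The following is only a rough sketch using Python's standard `difflib`; the helper name `changed_words` is made up for illustration, and the word lists are typed in by hand rather than extracted from the images.

```python
from difflib import SequenceMatcher


def changed_words(ref_words, new_words):
    """Return (tag, ref_run, new_run) for every run of words that differs."""
    # autojunk=False disables difflib's popularity heuristic, which is only
    # meant for long sequences and is not wanted for short word lists.
    matcher = SequenceMatcher(a=ref_words, b=new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, ref_words[i1:i2], new_words[j1:j2]))
    return changes


# Hand-typed stand-ins for words that would come from text detection / OCR.
ref = "I wandered lonely as a cloud".split()
new = "I walked lonely as a cloud".split()

print(changed_words(ref, new))
# → [('replace', ['wandered'], ['walked'])]
```

If each word also carries its bounding box from the detection step, only the boxes of the words reported here would need to be drawn, instead of every region where the pixels differ.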

Snow · July 9, 2021, 1:33am

Hi Berak,

I really appreciate your answer. I’m a complete beginner with OpenCV. Could you name some methods or theories that I could research, please? Thank you very much.
