Dynamic Preprocessing for Captcha Image Segmentation

azooz4 · March 16, 2025, 12:28pm

Problem Description:
I am working on automating the solution for a specific type of captcha. The captcha consists of a header image that always contains four words, and I need to segment these words accurately. My current challenge is in preprocessing the header image so that it works correctly across all images without manual parameter tuning.

Details:

• Header Image: The width of the header image varies but its height is always 24px.
• The header image always contains four words.

Goal:
The goal is to detect the correct positions for splitting the header image into four words by identifying gaps between the words. However, the preprocessing steps are not consistently effective across different images.
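Identifying gaps between words usually comes down to a column projection profile: count the ink pixels in each column and look for runs of empty columns. A minimal sketch, assuming a binary image where text pixels are nonzero; `column_gaps` and its `min_width` knob are hypothetical names, not part of the poster's code:

```python
import numpy as np


def column_gaps(ink, min_width=1):
    """Return (start, end) column ranges with no ink, end exclusive.

    `ink` is a 2-D array where nonzero pixels are text. Only columns between
    the first and last inked column are scanned, so the left and right margins
    are not reported as gaps. `min_width` (hypothetical) skips narrow gaps,
    e.g. the space between letters inside a word.
    """
    profile = np.count_nonzero(ink, axis=0)  # ink pixels per column
    inked = np.flatnonzero(profile)
    if inked.size == 0:
        return []
    gaps = []
    start = None
    for x in range(inked[0], inked[-1] + 1):
        if profile[x] == 0 and start is None:
            start = x  # a blank run begins
        elif profile[x] != 0 and start is not None:
            if x - start >= min_width:
                gaps.append((start, x))  # the blank run ends at this inked column
            start = None
    return gaps


ink = np.zeros((24, 20), np.uint8)
ink[5:15, 2:5] = 255
ink[5:15, 8:10] = 255
ink[5:15, 16:18] = 255
print(column_gaps(ink))  # [(5, 8), (10, 16)]
```

The weak point is still `min_width`: it has to separate letter spacing from word spacing, which is exactly the kind of per-image tuning the question is trying to avoid.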

Current Approach:
Here is my current code for preprocessing and segmenting the header image:

import numpy as np
import cv2

image_paths = [
    "C:/path/to/images/antibot_header_1/header_antibot_img.png",
    "C:/path/to/images/antibot_header_181/header_antibot_img.png",
    "C:/path/to/images/antibot_header_3/header_antibot_img.png",
    "C:/path/to/images/antibot_header_4/header_antibot_img.png",
    "C:/path/to/images/antibot_header_5/header_antibot_img.png"
]

for image_path in image_paths:
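    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if gray is None:  # imread returns None instead of raising on a bad path
        print(f"Could not read {image_path}")
        continue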

    # Adaptive (Gaussian-weighted) threshold so binarization follows local brightness.
    # blockSize must be odd. 199 and C=0 are hand-tuned; blockSize=255 with C=2 was
    # also tried, and 201 and 191 fit the first two images best.
    thresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY, 199, 0)

    # Apply median blur to smooth noise
    blurred_image = cv2.medianBlur(thresh, 9)  # ksize 9 or 11 fits best

    # Optional dilation with a 2x3 kernel (2 rows fits best)
    kernel = np.ones((2, 3), np.uint8)
    blurred_image = cv2.dilate(blurred_image, kernel, iterations=3)

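    # Morphological opening to remove small noise.
    # cv2.MORPH_RECT is a kernel-shape constant; passed as the operation it equals
    # cv2.MORPH_ERODE (both are 0), so it performed an erosion instead of an opening.
    kernel = np.ones((3, 3), np.uint8)  # 3x3 here; sizes 2 and 6 were also tried
    opening = cv2.morphologyEx(blurred_image, cv2.MORPH_OPEN, kernel, iterations=3)  # 3 iterations fit best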

    # Dilate to make text regions more solid and rectangular
    dilated = cv2.dilate(opening, kernel, iterations=1)

    # Find contours and draw bounding rectangles on a mask
    contours, _ = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    word_mask = np.zeros_like(dilated)

    for contour in contours:
        x, y, w, h = cv2.boundingRect(contour)
        cv2.rectangle(word_mask, (x, y), (x + w, y + h), 255, thickness=cv2.FILLED)

    name = image_path.replace("C:/path/to/images/", "").replace("/header_antibot_img.png", "")
    cv2.imshow(name, gray)
    cv2.imshow("Thresholded", thresh)
    cv2.imshow("Blurred", blurred_image)
    cv2.imshow("Opening (Noise Removed)", opening)
    cv2.imshow("Dilated (Text Merged)", dilated)
    cv2.imshow("Final Word Rectangles", word_mask)
    cv2.waitKey(0)
cv2.destroyAllWindows()

Issue:
The parameters used in the preprocessing steps (e.g., blockSize, C in adaptive thresholding, kernel sizes) need to be manually adjusted for each set of images to achieve accurate segmentation. This makes the solution non-dynamic and unreliable for new images.

Question:
How can I dynamically preprocess the header image so that the segmentation works correctly across all images without needing to manually adjust parameters? Are there any techniques or algorithms that can automatically determine the best preprocessing parameters based on the image content?

Additional Notes:

• The width of the header image changes every time, but its height is always 24px.
• The header image always contains four words.
• All images are in PNG format.
• I know how to split the image based on black pixel density once the preprocessing is done correctly.

Sample images used in this code:
Below are the header images used in the code. Each contains four words, but the preprocessing parameters have to be adjusted manually to segment them accurately.

Image 1

Image 2

Image 3

Image 4

Image 5

Output Sample:
antibot_header_1

crackwitz · March 16, 2025, 5:38pm

crosspost:

azooz4 · March 16, 2025, 9:11pm

Sorry, I didn’t know about this.
The first place I posted my question was StackOverflow.
Should I delete my post from here?

crackwitz · March 17, 2025, 4:30pm

I don’t care beyond the presence of the crosslinks.

As for the captchas: you’ll need AI/DL, and you’ll likely have to train your own model. It’s basically “challenging” OCR. OCR should not be performed by hacking the picture into pieces (glyphs). No preprocessing of any kind. Let the model run on the entire picture at once.
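crackwitz does not name an architecture. One widely used way for a model to read a whole text line without segmenting it is to emit class scores for each horizontal position and decode them with CTC (Connectionist Temporal Classification). Training such a model is beyond a short example; this sketch only shows greedy CTC decoding, assuming a hypothetical model output `probs` of shape (T, C) in which class 0 is the blank:

```python
import numpy as np


def ctc_greedy_decode(probs, alphabet, blank=0):
    """Decode per-position class scores of shape (T, C) into a string.

    Take the best class at each position, collapse consecutive repeats,
    then drop the blank class. alphabet[k] is the character for class k.
    """
    best = np.argmax(probs, axis=1)
    chars = []
    prev = blank
    for k in best:
        if k != blank and k != prev:
            chars.append(alphabet[int(k)])
        prev = k
    return "".join(chars)


alphabet = "-abc"  # index 0 is the blank
path = [1, 1, 0, 1, 2, 2, 0, 0, 3]
print(ctc_greedy_decode(np.eye(4)[path], alphabet))  # aabc
```

The blank between the two `a` runs is what lets a doubled letter survive the collapse step, so the model never needs to know where one glyph ends and the next begins.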
