Hoping for a code / logic review


I have images of rugs whose background has been removed with the Photoroom API. However, the edges of the rugs are not straight; they are wavy and squiggly (see image below). I want to use OpenCV to turn the rug into a perfect rectangle. The rug was lying flat on the floor and was photographed from an angle. I have already warped it with cv2.warpPerspective(), but that left me with these wavy edges.


Segmenting these images with the Photoroom API leaves the pixels near the rug's edges with alpha values anywhere from 0 to 255. Because the foreground/background segmentation is not perfect, I assume the API deliberately uses partial transparency near the edges so that the background looks cleanly removed. This is hard to handle with my current approach, as you can see below.
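To see how wide the soft edge actually is, one quick diagnostic is to count the semi-transparent pixels (0 < alpha < 255) in every row. A minimal NumPy sketch, assuming the alpha channel has already been extracted as a 2-D array (e.g. image[:, :, 3]); `transition_band_widths` is a hypothetical helper name:

```python
import numpy as np

def transition_band_widths(alpha: np.ndarray) -> np.ndarray:
    """Count the semi-transparent pixels (0 < alpha < 255) in each row.

    `alpha` is assumed to be the 2-D alpha channel, e.g. image[:, :, 3]
    from an image loaded with cv2.IMREAD_UNCHANGED.
    """
    semi = (alpha > 0) & (alpha < 255)
    return semi.sum(axis=1)

# Tiny synthetic example: soft edges around an opaque core
alpha = np.array([
    [0, 40, 200, 255, 255, 255, 180, 0],
    [0, 0, 90, 255, 255, 255, 255, 0],
], dtype=np.uint8)
print(transition_band_widths(alpha))  # [3 1]
```

On a real rug, a histogram of these counts shows whether the transition band is a few pixels wide (easy to threshold away) or tens of pixels wide.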

Current Approach

  1. Divide the rug into 3 zones: Left Transition Zone (LTZ), Opaque Zone, and Right Transition Zone (RTZ).
  2. For each row, scan from the left; the first pixel with alpha strictly between 0 and 255 marks the start of the LTZ.
  3. When we encounter 5 consecutive fully opaque (255) pixels, we treat that as the start of the opaque zone and the end of the LTZ.
  4. The end of the opaque zone is the start of the RTZ; the last pixel with alpha between 0 and 255 marks the end of the RTZ.
  5. Append all values to arrays.
  6. Create a new canvas with padding.
  7. Calculate the width of each row's opaque zone and find the maximum opaque width.
  8. Copy all LTZ data to the new canvas for each row.
  9. Add the opaque pixels, spreading them out by inserting transparent pixels in between so that each row's opaque width equals the maximum opaque width.
  10. Copy all RTZ data.
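Steps 2 to 4 can also be computed per row without a pixel-by-pixel state machine. A minimal NumPy sketch, assuming the same rules as above, plus one assumption the steps leave open: by symmetry, the opaque zone ends at the last run of 5 fully opaque pixels. `find_zones` is a hypothetical helper name:

```python
import numpy as np

def find_zones(alpha_row: np.ndarray, opaque_run: int = 5):
    """Return (ltz_start, opaque_start, opaque_end, rtz_end) for one row, or None.

    The opaque zone starts at the first run of `opaque_run` consecutive 255
    pixels and (assumption) ends at the last such run. LTZ/RTZ bounds come
    from semi-transparent pixels outside it; missing zones are None.
    """
    opaque = (alpha_row == 255).astype(np.int32)
    # Sliding-window sum: window i covers pixels i .. i + opaque_run - 1
    window = np.convolve(opaque, np.ones(opaque_run, dtype=np.int32), mode="valid")
    full = np.flatnonzero(window == opaque_run)
    if full.size == 0:
        return None  # No opaque zone in this row
    opaque_start = int(full[0])
    opaque_end = int(full[-1]) + opaque_run - 1

    semi = np.flatnonzero((alpha_row > 0) & (alpha_row < 255))
    left_semi = semi[semi < opaque_start]
    right_semi = semi[semi > opaque_end]
    ltz_start = int(left_semi[0]) if left_semi.size else None
    rtz_end = int(right_semi[-1]) if right_semi.size else None
    return ltz_start, opaque_start, opaque_end, rtz_end

row = np.array([0, 30, 120, 255, 200, 255, 255, 255, 255, 255, 90, 0], dtype=np.uint8)
print(find_zones(row))  # (1, 5, 9, 10)
```

The LTZ end and RTZ start follow directly as opaque_start - 1 and opaque_end + 1, so they do not need to be tracked separately.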

Question to this forum

Is this approach even remotely feasible? I have already spent about a week getting this far, and I estimate it will take another 15 days or so to make it work properly, if I can get it to work at all.

Is there something else I can try to get what I want? Perhaps better segmentation would remove the complications caused by the 0-255 alpha values, but every segmentation technique I tried left me with values between 0 and 255 at the edges.

Also, this only handles the left and right edges so far. After this, I will have to repeat the process for the top and bottom edges.
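Top and bottom do not necessarily need a second implementation: transposing the array turns columns into rows, so any row-wise routine can be reused. A minimal sketch, assuming "inside the rug" means alpha above a threshold; `row_extents` is a hypothetical helper name:

```python
import numpy as np

def row_extents(alpha: np.ndarray, thresh: int = 127):
    """For each row, return (left, right): the first/last column with
    alpha > thresh, or -1 for rows that contain no such pixel."""
    inside = alpha > thresh
    has_any = inside.any(axis=1)
    left = np.where(has_any, inside.argmax(axis=1), -1)
    # argmax on the reversed rows finds the last True counted from the right
    right = np.where(has_any, alpha.shape[1] - 1 - inside[:, ::-1].argmax(axis=1), -1)
    return left, right

alpha = np.zeros((4, 6), dtype=np.uint8)
alpha[1, 2:5] = 255
alpha[2, 1:4] = 255

left, right = row_extents(alpha)      # left/right edge of every row
top, bottom = row_extents(alpha.T)    # top/bottom edge of every column, same code
print(left.tolist(), right.tolist())  # [-1, 2, 1, -1] [-1, 4, 3, -1]
print(top.tolist(), bottom.tolist())  # [-1, 2, 1, 1, 1, -1] [-1, 2, 2, 2, 1, -1]
```

For a full BGRA image the equivalent transpose is np.transpose(image, (1, 0, 2)), which swaps rows and columns while keeping the channel axis last.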

This function completely ignores the "Keep it simple, silly" (KISS) principle. I suspect there must be a better way to do this, but I couldn't find anything by googling and can't figure it out on my own. Any guidance would be highly appreciated. Thanks.


Steps 7 and 9 are currently missing. I tried them without dividing rows into LTZ/RTZ zones (just treating every pixel with alpha above 0 as part of the row) and it worked, but it got too complicated to combine with the LTZ and RTZ handling, so I will add them later.

import cv2
import numpy as np

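# Load the image, keeping the alpha channel
image = cv2.imread('2.png', cv2.IMREAD_UNCHANGED)
if image is None or image.ndim != 3 or image.shape[2] != 4:
    raise SystemExit("Could not load '2.png' as a 4-channel (BGRA) image")

# Get image dimensions
height, width, channels = image.shape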

xArray_left = []
xArray_right = []
xArray_LTZ_start = []
xArray_LTZ_end = []
xArray_RTZ_start = []
xArray_RTZ_end = []
yArray = []
LTZ_size = []
RTZ_size = []

OPAQUE_THRESHOLD = 5  # Number of consecutive opaque pixels to confirm the zone

# Iterate over each row to find the zone bounds.
# Zones that a row does not have are stored as None.
for y in range(height):
    alpha_row = image[y, :, 3]
    ltz_start = ltz_end = rtz_start = rtz_end = None
    first_opaque_left = first_opaque_right = None

    # Left to right: LTZ start, then the first run of OPAQUE_THRESHOLD opaque pixels
    consecutive_opaque_count = 0  # Reset for every row
    for x in range(width):
        alpha = alpha_row[x]
        if alpha == 255:
            consecutive_opaque_count += 1
            if consecutive_opaque_count >= OPAQUE_THRESHOLD:
                first_opaque_left = x - (OPAQUE_THRESHOLD - 1)
                break
        else:
            consecutive_opaque_count = 0  # Reset only when the opaque run is broken
            if alpha > 0 and ltz_start is None:  # First semi-transparent pixel
                ltz_start = x

    if first_opaque_left is None:
        continue  # No opaque zone in this row (e.g. above or below the rug)
    if ltz_start is not None:
        ltz_end = first_opaque_left - 1

    # Right to left: the last run of OPAQUE_THRESHOLD opaque pixels ends the opaque zone
    consecutive_opaque_count = 0
    for x in range(width - 1, first_opaque_left - 1, -1):
        if alpha_row[x] == 255:
            consecutive_opaque_count += 1
            if consecutive_opaque_count >= OPAQUE_THRESHOLD:
                first_opaque_right = x + (OPAQUE_THRESHOLD - 1)
                break
        else:
            consecutive_opaque_count = 0

    # RTZ: from just after the opaque zone to the last semi-transparent pixel
    for x in range(width - 1, first_opaque_right, -1):
        if 0 < alpha_row[x] < 255:
            rtz_start, rtz_end = first_opaque_right + 1, x
            break

    # Step 5: store this row's results
    yArray.append(y)
    xArray_left.append(first_opaque_left)
    xArray_right.append(first_opaque_right)
    xArray_LTZ_start.append(ltz_start)
    xArray_LTZ_end.append(ltz_end)
    xArray_RTZ_start.append(rtz_start)
    xArray_RTZ_end.append(rtz_end)
    LTZ_size.append(ltz_end - ltz_start + 1 if ltz_start is not None else 0)
    RTZ_size.append(rtz_end - rtz_start + 1 if rtz_start is not None else 0)

# Print the first 10 values of each array
print("First Opaque Left:", xArray_left[:10])
print("First Opaque Right:", xArray_right[:10])
print("LTZ Start:", xArray_LTZ_start[:10])
print("LTZ End:", xArray_LTZ_end[:10])
print("RTZ Start:", xArray_RTZ_start[:10])
print("RTZ End:", xArray_RTZ_end[:10])
print("Y array:", yArray[:10])
print("LTZ Size:", LTZ_size[:10])
print("RTZ Size:", RTZ_size[:10])

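# Calculate the new image dimensions with padding on every side
PADDING = 125
if not yArray:
    raise SystemExit("No row contains an opaque zone; nothing to adjust")
# Widest LTZ + widest opaque zone + widest RTZ, so every row fits (also after step 9)
max_opaque_width = max(xArray_right[i] - xArray_left[i] + 1 for i in range(len(yArray)))
new_width = max(LTZ_size) + max_opaque_width + max(RTZ_size) + 2 * PADDING
new_height = height + 2 * PADDING
# np.zeros already makes every pixel fully transparent (alpha = 0)
new_image = np.zeros((new_height, new_width, channels), dtype=np.uint8)

offset = PADDING  # Shift rows and columns by the padding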

# Adjust each row to have the new maximum width
for i in range(len(yArray)):
    y = yArray[i] + offset
    x_left = xArray_left[i]
    x_right = xArray_right[i]
    ltz_start = xArray_LTZ_start[i]
    ltz_end = xArray_LTZ_end[i]
    rtz_start = xArray_RTZ_start[i]
    rtz_end = xArray_RTZ_end[i]

    # Copy the LTZ, checking bounds and initialization
    if ltz_start is not None and ltz_end is not None:
        for x in range(ltz_start, ltz_end + 1):
            new_x = offset + x - ltz_start
            if new_x < new_width:  # Ensure within bounds
                new_image[y, new_x] = image[yArray[i], x]

    # Copy the opaque pixels, checking bounds
    start_opaque = offset + (ltz_end - ltz_start + 1) if ltz_end is not None else offset
    for x in range(x_left, x_right + 1):
        new_x = start_opaque + x - x_left
        if new_x < new_width:  # Ensure within bounds
            new_image[y, new_x] = image[yArray[i], x]

    # Copy the RTZ, checking bounds and initialization
    if rtz_start is not None and rtz_end is not None:
        start_rtz = start_opaque + (x_right - x_left + 1)
        for x in range(rtz_start, rtz_end + 1):
            new_x = start_rtz + x - rtz_start
            if new_x < new_width:  # Ensure within bounds
                new_image[y, new_x] = image[yArray[i], x]

# Save the adjusted image
cv2.imwrite('adjusted_image_with_transitions.png', new_image)
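For the missing steps 7 and 9, a minimal sketch of the idea as described: take the maximum opaque width over all rows, then place each row's opaque pixels at evenly spaced positions across that width, leaving the gaps fully transparent. The helper names `max_opaque_width` and `spread_segment` are hypothetical, and each segment is assumed to be an (n, 4) BGRA slice such as image[y, x_left:x_right + 1]:

```python
import numpy as np

def max_opaque_width(x_left, x_right) -> int:
    """Step 7: widest opaque zone over all rows (bounds are inclusive)."""
    return max(r - l + 1 for l, r in zip(x_left, x_right))

def spread_segment(segment: np.ndarray, target_width: int) -> np.ndarray:
    """Step 9 as described: spread `segment` (n <= target_width pixels) over
    `target_width` columns, with transparent pixels filling the gaps."""
    n = segment.shape[0]
    out = np.zeros((target_width, segment.shape[1]), dtype=segment.dtype)
    # Evenly spaced destination columns; the first and last pixels stay at the ends
    positions = np.round(np.linspace(0, target_width - 1, n)).astype(int)
    out[positions] = segment
    return out

x_left, x_right = [10, 12], [19, 17]
target = max_opaque_width(x_left, x_right)       # 10
segment = np.full((6, 4), 255, dtype=np.uint8)   # a 6-pixel opaque row segment
spread = spread_segment(segment, target)
print(spread[:, 3].tolist())  # [255, 0, 255, 0, 255, 255, 0, 255, 0, 255]
```

As the output shows, literal spreading leaves visible transparent holes inside the rug. Resizing the segment with interpolation instead (for example cv2.resize(segment[None], (target, 1), interpolation=cv2.INTER_LINEAR)[0]) fills those columns with blended colours.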