[Q] Translate a set of coordinates into the same translated image space?

Hello Everyone,
I applied a random translation of up to 20% of the original image size to an image, using a translation matrix T = [[1, 0, t_x], [0, 1, t_y]], where t_x and t_y are sampled from a uniform distribution whose upper and lower bounds correspond to 20% of the original image size. I use this matrix to perform a translation transformation on a square image of size N by N by 3.
As I am dealing with an object detection problem, each image comes with a set of annotations that define the bounding box location: the x, y coordinates of the bounding box midpoint within the image plane, plus the height and width of the bounding box.
Since a translation is a shift by a constant, I figured that I could simply add t_x and t_y to the corresponding x, y coordinate annotations to obtain the location of these coordinates in the translated image space. Unfortunately, this did not work. Therefore, I was wondering how I can translate a set of points, e.g. [x=0.6, y=0.3], so that they end up in the correct location in the translated image as well. How could I achieve this?

Please find my code snippet for the image and point translation below:

import numpy as np
import cv2 as cv

image = cv.imread(image_path)
height, width = image.shape[:2]
translate_upper_bound = float(height * factor/100)
translate_lower_bound = float(height * factor/100) * -1
    
# uniform vals to translate into x coord t_x and y coord t_y
t_x = np.random.uniform(low=translate_lower_bound, high=translate_upper_bound)
t_y = np.random.uniform(low=translate_lower_bound, high=translate_upper_bound)
    
# Translation matrix T
T = np.float32([[1, 0, t_x], [0, 1, t_y]])
img_translation = cv.warpAffine(image, T, (width, height))

and what exactly is the problem now?

did you add when you should have subtracted, or the other way round? don’t just say “it didn’t work”. show what actually happened, and how you did it.

How to transform a set of x, y coordinates into the corresponding translated image plane.

you need to multiply with image size, before adding the translation:

p.x = p.x * width + t_x
p.y = p.y * height + t_y

then note that you can only use the same upper/lower bounds for square images
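A minimal sketch of that suggested fix (names are illustrative; it assumes YOLO-style annotations normalized to [0, 1] and pixel-valued offsets t_x, t_y):

```python
def translate_point(x_norm, y_norm, t_x, t_y, width, height):
    """Map a normalized (x, y) annotation into the translated image, in pixels.

    The annotation is scaled to pixel space first, and only then are the
    pixel offsets added.
    """
    px = x_norm * width + t_x
    py = y_norm * height + t_y
    return px, py

# e.g. a point at (0.6, 0.3) in a 400x400 image, shifted by (40, -20) pixels
print(translate_point(0.6, 0.3, 40, -20, 400, 400))  # (280.0, 100.0)
```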

Sure, I tried adding t_x and t_y to the respective x and y coordinates and also subtracting them. I did not change the signs, which I also don't believe matters, since t_x and t_y already carry their sign: whether I do c + (-2) or c - 2 makes no difference.

Result when performing x + t_x, y + t_y

Result when performing x - t_x, y - t_y

in your code, there is no factor defined.

show me the values you get at every line of your code. don’t just show a plot. in the second subplot, I can’t even see the rectangle, if there was supposed to be one.

Sure, please find the entire code snippet below. You won't be able to see a bounding box in the second image, because the translation offsets it out of the image or makes it so large that it lies outside the image plane.

import numpy as np
import cv2 as cv
import matplotlib.pyplot as plt

def random_translation(image_path, bounding_box, factor = 20):
    colors = [[255, 0, 0], [0, 255, 0], [0, 0, 255],
              [255, 255, 0], [255, 127, 127], [255, 165, 0],
              [255, 105, 180], [64, 224, 208], [134, 1, 175],
              [127, 127, 127], [116, 86, 74], [0, 0, 0],
              [128, 0, 128], [0, 128, 128], [128, 0, 0],
              [0, 255, 255], [128, 128, 128], [255, 0, 255],
              [0, 0, 128], [255, 105, 180], [128, 128, 0]]
    
    class_names = ["aeroplane","bicycle","bird","boat","bottle","bus","car",
        "cat","chair","cow","diningtable","dog","horse","motorbike","person",
        "pottedplant","sheep","sofa","train","tvmonitor"]
    
    image = cv.imread(image_path)
    height, width = image.shape[:2]
    
    # 0. Translation Variables
    translate_upper_bound = float(height / 100 * factor)
    translate_lower_bound = float(height / 100 * factor) * -1
    
    # uniform vals to translate into x coord t_x and y coord t_y
    t_x = np.random.uniform(low=translate_lower_bound, high=translate_upper_bound)
    t_y = np.random.uniform(low=translate_lower_bound, high=translate_upper_bound)
    
    # Translation matrix T
    T = np.float32([[1, 0, t_x], [0, 1, t_y]])
    img_translation = cv.warpAffine(image, T, (width, height)) 
    
    # 1. Original Image and Bounding Box
    height, width, _  = image.shape
    
    class_pred = int(bounding_box[0])
    bounding_box = bounding_box[1:]
    assert len(bounding_box) == 4, "Bounding box must contain exactly x, y, w, h."
    # extract x midpoint, y midpoint, w width and h height
    x = bounding_box[0] 
    y = bounding_box[1]
    w = bounding_box[2]
    h = bounding_box[3]
    l = int((x - w / 2) * width) 
    r = int((x + w / 2) * width)
    t = int((y - h / 2) * height)
    b = int((y + h / 2) * height)
    
    if l < 0:
        l = 0
    if r > width - 1:
        r = width - 1
    if t < 0:
        t = 0
    if b > height - 1:
        b = height - 1

    image = cv.rectangle(image, (l, t), (r, b), colors[class_pred], 2)
    (width, height), _ = cv.getTextSize(class_names[class_pred], cv.FONT_HERSHEY_SIMPLEX, 0.6, 2)
         
    image = cv.rectangle(image, (l, t + 20), (l + width, t), colors[class_pred], -1)
    image = cv.putText(image, class_names[class_pred], (l, t + 15),
                      cv.FONT_HERSHEY_SIMPLEX, 0.6, [255, 255, 255], 2)
    
    #2. Translated Image and Bounding Box

    height, width, _  = img_translation.shape
    x = bounding_box[0] 
    y = bounding_box[1]
    w = bounding_box[2] 
    h = bounding_box[3] 
    
    # this works but is incorrect, as it doesn't transform the annotated x, y labels;
    # it draws the box where it would be and then shifts the entire bounding box
    #l = int((x - w / 2) * width + t_x)
    #r = int((x + w / 2) * width + t_x)
    #t = int((y - h / 2) * height + t_y)
    #b = int((y + h / 2) * height + t_y)
    l = int((x - w / 2) * width)
    r = int((x + w / 2) * width)
    t = int((y - h / 2) * height)
    b = int((y + h / 2) * height)
    if l < 0:
        l = 0
    if r > width - 1:
        r = width - 1
    if t < 0:
        t = 0
    if b > height - 1:
        b = height - 1

    img_translation = cv.rectangle(img_translation, (l, t), (r, b), colors[class_pred], 2)
    (width, height), _ = cv.getTextSize(class_names[class_pred], cv.FONT_HERSHEY_SIMPLEX, 0.6, 2)
         
    img_translation = cv.rectangle(img_translation, (l, t + 20), (l + width, t), colors[class_pred], -1)
    img_translation = cv.putText(img_translation, class_names[class_pred], (l, t + 15),
                      cv.FONT_HERSHEY_SIMPLEX, 0.6, [255, 255, 255], 2)
    
    # 3. Plot results
    plt.subplot(1,2,1)
    plt.imshow(image)
    plt.subplot(1,2,2)
    plt.imshow(img_translation)

bbox = [11, 0.34419263456090654, 0.611, 0.4164305949008499, 0.262]
random_translation(image_path = 'C:/Users/username/anaconda3/envs/yolo/yolo/data/images/000001.jpg', bounding_box = bbox, factor= 20)

p.x = p.x * width + t_x
p.y = p.y * height + t_y

Results in:

My code is below. I would be happy if you could have a look. In the comments there is a method that gives the correct result visually, but it is not correct, as it does not transform the x, y coordinates and instead just shifts the resulting bounding box. The idea is to use translation as a data augmentation technique that augments both the input images and the corresponding x, y annotations, so that the neural network does not simply memorize the input data and its annotations.

import numpy as np
import cv2 as cv
import matplotlib.pyplot as plt

def random_translation(image_path, bounding_box, factor = 20):
    colors = [[255, 0, 0], [0, 255, 0], [0, 0, 255],
              [255, 255, 0], [255, 127, 127], [255, 165, 0],
              [255, 105, 180], [64, 224, 208], [134, 1, 175],
              [127, 127, 127], [116, 86, 74], [0, 0, 0],
              [128, 0, 128], [0, 128, 128], [128, 0, 0],
              [0, 255, 255], [128, 128, 128], [255, 0, 255],
              [0, 0, 128], [255, 105, 180], [128, 128, 0]]
    
    class_names = ["aeroplane","bicycle","bird","boat","bottle","bus","car",
        "cat","chair","cow","diningtable","dog","horse","motorbike","person",
        "pottedplant","sheep","sofa","train","tvmonitor"]
    
    image = cv.imread(image_path)
    height, width = image.shape[:2]
    
    # 0. Translation Variables
    translate_upper_bound = float(height / 100 * factor)
    translate_lower_bound = float(height / 100 * factor) * -1
    
    # uniform vals to translate into x coord t_x and y coord t_y
    t_x = np.random.uniform(low=translate_lower_bound, high=translate_upper_bound)
    t_y = np.random.uniform(low=translate_lower_bound, high=translate_upper_bound)
    
    # Translation matrix T
    T = np.float32([[1, 0, t_x], [0, 1, t_y]])
    img_translation = cv.warpAffine(image, T, (width, height)) 
    
    # 1. Original Image and Bounding Box
    height, width, _  = image.shape
    
    class_pred = int(bounding_box[0])
    bounding_box = bounding_box[1:]
    assert len(bounding_box) == 4, "Bounding box must contain exactly x, y, w, h."
    # extract x midpoint, y midpoint, w width and h height
    x = bounding_box[0] 
    y = bounding_box[1]
    w = bounding_box[2]
    h = bounding_box[3]
    l = int((x - w / 2) * width) 
    r = int((x + w / 2) * width)
    t = int((y - h / 2) * height)
    b = int((y + h / 2) * height)
    
    if l < 0:
        l = 0
    if r > width - 1:
        r = width - 1
    if t < 0:
        t = 0
    if b > height - 1:
        b = height - 1

    image = cv.rectangle(image, (l, t), (r, b), colors[class_pred], 2)
    (width, height), _ = cv.getTextSize(class_names[class_pred], cv.FONT_HERSHEY_SIMPLEX, 0.6, 2)
         
    image = cv.rectangle(image, (l, t + 20), (l + width, t), colors[class_pred], -1)
    image = cv.putText(image, class_names[class_pred], (l, t + 15),
                      cv.FONT_HERSHEY_SIMPLEX, 0.6, [255, 255, 255], 2)
    
    #2. Translated Image and Bounding Box

    height, width, _  = img_translation.shape
    x = bounding_box[0] * width + t_x
    y = bounding_box[1] * height + t_y
    w = bounding_box[2] 
    h = bounding_box[3] 
    
    # this works but is incorrect, as it doesn't transform the annotated x, y labels;
    # it draws the box where it would be and then shifts the entire bounding box
    #l = int((x - w / 2) * width + t_x)
    #r = int((x + w / 2) * width + t_x)
    #t = int((y - h / 2) * height + t_y)
    #b = int((y + h / 2) * height + t_y)
    l = int((x - w / 2) * width)
    r = int((x + w / 2) * width)
    t = int((y - h / 2) * height)
    b = int((y + h / 2) * height)
    if l < 0:
        l = 0
    if r > width - 1:
        r = width - 1
    if t < 0:
        t = 0
    if b > height - 1:
        b = height - 1

    img_translation = cv.rectangle(img_translation, (l, t), (r, b), colors[class_pred], 2)
    (width, height), _ = cv.getTextSize(class_names[class_pred], cv.FONT_HERSHEY_SIMPLEX, 0.6, 2)
         
    img_translation = cv.rectangle(img_translation, (l, t + 20), (l + width, t), colors[class_pred], -1)
    img_translation = cv.putText(img_translation, class_names[class_pred], (l, t + 15),
                      cv.FONT_HERSHEY_SIMPLEX, 0.6, [255, 255, 255], 2)
    
    # 3. Plot Results
    plt.subplot(1,2,1)
    plt.imshow(image)
    plt.subplot(1,2,2)
    plt.imshow(img_translation)

bbox = [11, 0.34419263456090654, 0.611, 0.4164305949008499, 0.262]
random_translation(image_path = 'C:/Users/username/anaconda3/envs/yolo/yolo/data/images/000001.jpg', bounding_box = bbox, factor= 20)

Scaling by height and width was indeed the correct approach.
The solution was to use division instead of multiplication, so my mistake was only off by a constant scale: since the annotations are normalized, t_x and t_y have to be divided by the image width and height before being added, so that the coordinates stay normalized. Thank you.
The question can be marked as solved.
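For completeness, a sketch of that final fix (illustrative names; assumes annotations normalized to [0, 1] and pixel-valued offsets t_x, t_y). Dividing the offsets by the image size keeps the result normalized, and is just the pixel-space version (x * width + t_x) / width rewritten:

```python
def translate_point_normalized(x_norm, y_norm, t_x, t_y, width, height):
    """Shift a normalized (x, y) annotation by a pixel offset (t_x, t_y),
    keeping the result in normalized coordinates:
    (x * width + t_x) / width == x + t_x / width."""
    return x_norm + t_x / width, y_norm + t_y / height

# e.g. a point at (0.6, 0.3) in a 400x400 image, shifted by (40, -20) pixels
x_new, y_new = translate_point_normalized(0.6, 0.3, 40, -20, 400, 400)
print(round(x_new, 6), round(y_new, 6))  # 0.7 0.25
```

With the annotations kept normalized like this, the existing drawing code that multiplies by width and height afterwards works unchanged.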