# [Q] Translate a set of coordinates into the same translated image space?

Hello Everyone,
I applied a random translation of up to 20% of the original image size to an image, using a translation matrix `T = [[1, 0, t_x], [0, 1, t_y]]`, where `t_x` and `t_y` are sampled from a uniform distribution whose upper and lower bounds correspond to 20% of the original image size. I use this matrix to perform a translation transformation on a square image of size `N x N x 3`.
As I am dealing with an object detection problem, each image comes with a set of annotations that define the bounding box midpoint as x, y coordinates within the image plane, along with the height and width of the bounding box.
Since a translation is a shift by a constant, I figured that I could simply add `t_x` and `t_y` to the corresponding `x, y` annotations to obtain the location of these coordinates in the translated image space. Unfortunately, this did not work. Therefore, I was wondering how I can translate a set of points, e.g. `[x=0.6, y=0.3]`, so that they end up in the correct location in the translated image as well. How could I achieve this?

Please find my code snippet for the image and point translation below:

``````
import numpy as np
import cv2 as cv

height, width = image.shape[:2]
translate_upper_bound = float(height * factor/100)
translate_lower_bound = float(height * factor/100) * -1

# uniform vals to translate into x coord t_x and y coord t_y
t_x = np.random.uniform(low=translate_lower_bound, high=translate_upper_bound)
t_y = np.random.uniform(low=translate_lower_bound, high=translate_upper_bound)

# Translation matrix T
T = np.float32([[1, 0, t_x], [0, 1, t_y]])
img_translation = cv.warpAffine(image, T, (width, height))
``````

and what exactly is the problem now?

did you add when you should have subtracted, or vice versa? don’t just say “didn’t work” — show what did happen, and how you did it.

How can I transform a set of x, y coordinates into the corresponding translated image plane?

you need to multiply with the image size before adding the translation:

``````
p.x = p.x * width + t_x
p.y = p.y * height + t_y
``````

then note that you can only use the same upper / lower bounds for both axes on square images
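For non-square images, each offset would be sampled from a range proportional to its own axis. A minimal sketch of this (the name `sample_translation` is illustrative, not from the thread):

```python
import numpy as np

def sample_translation(width, height, factor=20):
    # sample each pixel offset from a range proportional to its own axis,
    # so "20%" means 20% of the width horizontally and 20% of the height vertically
    max_tx = width * factor / 100
    max_ty = height * factor / 100
    t_x = np.random.uniform(low=-max_tx, high=max_tx)
    t_y = np.random.uniform(low=-max_ty, high=max_ty)
    return t_x, t_y
```

With `height / 100 * factor` as the single bound for both axes, as in the code above, a wide image gets a horizontal shift range that is too small relative to its width.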

Sure, I tried adding t_x and t_y to the respective x and y coordinates and also subtracting them. I did not change the signs, which I also don’t believe matters, since t_x and t_y perform the corresponding change depending on their sign: whether I do c + (-2) or c - 2 would not matter.

Result when performing `x + t_x, y + t_y`:

Result when performing `x - t_x, y - t_y`:

in your code, there is no `factor` defined.

show me the values you get at every line of your code. don’t just show a plot. in the second subplot, I can’t even see the rectangle, if there was supposed to be one.

Sure, please find the entire code snippet below. You won’t be able to see a bounding box in the second image, because the translation offsets the boxes out of the image or makes them so large that they fall outside the image plane.

``````
import numpy as np
import cv2 as cv
import matplotlib.pyplot as plt

def random_translation(image_path, bounding_box, factor=20):
    colors = [[255, 0, 0], [0, 255, 0], [0, 0, 255],
              [255, 255, 0], [255, 127, 127], [255, 165, 0],
              [255, 105, 180], [64, 224, 208], [134, 1, 175],
              [127, 127, 127], [116, 86, 74], [0, 0, 0],
              [128, 0, 128], [0, 128, 128], [128, 0, 0],
              [0, 255, 255], [128, 128, 128], [255, 0, 255],
              [0, 0, 128], [255, 105, 180], [128, 128, 0]]

    class_names = ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car",
                   "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
                   "pottedplant", "sheep", "sofa", "train", "tvmonitor"]

    # read the image from disk
    image = cv.imread(image_path)
    height, width = image.shape[:2]

    # 0. Translation Variables
    translate_upper_bound = float(height / 100 * factor)
    translate_lower_bound = float(height / 100 * factor) * -1

    # uniform vals to translate into x coord t_x and y coord t_y
    t_x = np.random.uniform(low=translate_lower_bound, high=translate_upper_bound)
    t_y = np.random.uniform(low=translate_lower_bound, high=translate_upper_bound)

    # Translation matrix T
    T = np.float32([[1, 0, t_x], [0, 1, t_y]])
    img_translation = cv.warpAffine(image, T, (width, height))

    # 1. Original Image and Bounding Box
    height, width, _ = image.shape

    class_pred = int(bounding_box[0])
    bounding_box = bounding_box[1:]
    assert len(bounding_box) == 4, "Bounding box prediction exceeds x, y, w, h."
    # extract x midpoint, y midpoint, w width and h height
    x, y, w, h = bounding_box
    l = int((x - w / 2) * width)
    r = int((x + w / 2) * width)
    t = int((y - h / 2) * height)
    b = int((y + h / 2) * height)

    if l < 0:
        l = 0
    if r > width - 1:
        r = width - 1
    if t < 0:
        t = 0
    if b > height - 1:
        b = height - 1

    image = cv.rectangle(image, (l, t), (r, b), colors[class_pred], 2)
    (width, height), _ = cv.getTextSize(class_names[class_pred], cv.FONT_HERSHEY_SIMPLEX, 0.6, 2)

    image = cv.rectangle(image, (l, t + 20), (l + width, t), colors[class_pred], -1)
    image = cv.putText(image, class_names[class_pred], (l, t + 15),
                       cv.FONT_HERSHEY_SIMPLEX, 0.6, [255, 255, 255], 2)

    # 2. Translated Image and Bounding Box
    height, width, _ = img_translation.shape
    x, y, w, h = bounding_box

    # this works visually but is incorrect, as it doesn't transform the annotated x, y labels;
    # it draws the box where it would be and then shifts the entire bounding box
    # l = int((x - w / 2) * width + t_x)
    # r = int((x + w / 2) * width + t_x)
    # t = int((y - h / 2) * height + t_y)
    # b = int((y + h / 2) * height + t_y)
    l = int((x - w / 2) * width)
    r = int((x + w / 2) * width)
    t = int((y - h / 2) * height)
    b = int((y + h / 2) * height)
    if l < 0:
        l = 0
    if r > width - 1:
        r = width - 1
    if t < 0:
        t = 0
    if b > height - 1:
        b = height - 1

    img_translation = cv.rectangle(img_translation, (l, t), (r, b), colors[class_pred], 2)
    (width, height), _ = cv.getTextSize(class_names[class_pred], cv.FONT_HERSHEY_SIMPLEX, 0.6, 2)

    img_translation = cv.rectangle(img_translation, (l, t + 20), (l + width, t), colors[class_pred], -1)
    img_translation = cv.putText(img_translation, class_names[class_pred], (l, t + 15),
                                 cv.FONT_HERSHEY_SIMPLEX, 0.6, [255, 255, 255], 2)

    # 3. Plot results
    plt.subplot(1, 2, 1)
    plt.imshow(image)
    plt.subplot(1, 2, 2)
    plt.imshow(img_translation)

bbox = [11, 0.34419263456090654, 0.611, 0.4164305949008499, 0.262]
random_translation(image_path = 'C:/Users/username/anaconda3/envs/yolo/yolo/data/images/000001.jpg', bounding_box = bbox, factor = 20)
``````

``````
p.x = p.x * width + t_x
p.y = p.y * height + t_y
``````

Results in:

My code is below; I would be happy if you could have a look. In the comments there is a method that gives the correct result visually, but it is not actually correct, as it does not transform the x, y coordinates but just shifts the resulting bounding box. The idea is to use translation as a data augmentation technique that augments both the input images and the corresponding x, y annotations, so that the neural network does not simply memorize the input data and its annotations.

``````
import numpy as np
import cv2 as cv
import matplotlib.pyplot as plt

def random_translation(image_path, bounding_box, factor=20):
    colors = [[255, 0, 0], [0, 255, 0], [0, 0, 255],
              [255, 255, 0], [255, 127, 127], [255, 165, 0],
              [255, 105, 180], [64, 224, 208], [134, 1, 175],
              [127, 127, 127], [116, 86, 74], [0, 0, 0],
              [128, 0, 128], [0, 128, 128], [128, 0, 0],
              [0, 255, 255], [128, 128, 128], [255, 0, 255],
              [0, 0, 128], [255, 105, 180], [128, 128, 0]]

    class_names = ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car",
                   "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
                   "pottedplant", "sheep", "sofa", "train", "tvmonitor"]

    # read the image from disk
    image = cv.imread(image_path)
    height, width = image.shape[:2]

    # 0. Translation Variables
    translate_upper_bound = float(height / 100 * factor)
    translate_lower_bound = float(height / 100 * factor) * -1

    # uniform vals to translate into x coord t_x and y coord t_y
    t_x = np.random.uniform(low=translate_lower_bound, high=translate_upper_bound)
    t_y = np.random.uniform(low=translate_lower_bound, high=translate_upper_bound)

    # Translation matrix T
    T = np.float32([[1, 0, t_x], [0, 1, t_y]])
    img_translation = cv.warpAffine(image, T, (width, height))

    # 1. Original Image and Bounding Box
    height, width, _ = image.shape

    class_pred = int(bounding_box[0])
    bounding_box = bounding_box[1:]
    assert len(bounding_box) == 4, "Bounding box prediction exceeds x, y, w, h."
    # extract x midpoint, y midpoint, w width and h height
    x, y, w, h = bounding_box
    l = int((x - w / 2) * width)
    r = int((x + w / 2) * width)
    t = int((y - h / 2) * height)
    b = int((y + h / 2) * height)

    if l < 0:
        l = 0
    if r > width - 1:
        r = width - 1
    if t < 0:
        t = 0
    if b > height - 1:
        b = height - 1

    image = cv.rectangle(image, (l, t), (r, b), colors[class_pred], 2)
    (width, height), _ = cv.getTextSize(class_names[class_pred], cv.FONT_HERSHEY_SIMPLEX, 0.6, 2)

    image = cv.rectangle(image, (l, t + 20), (l + width, t), colors[class_pred], -1)
    image = cv.putText(image, class_names[class_pred], (l, t + 15),
                       cv.FONT_HERSHEY_SIMPLEX, 0.6, [255, 255, 255], 2)

    # 2. Translated Image and Bounding Box
    height, width, _ = img_translation.shape
    x = bounding_box[0] * width + t_x
    y = bounding_box[1] * height + t_y
    w = bounding_box[2]
    h = bounding_box[3]

    # this works visually but is incorrect, as it doesn't transform the annotated x, y labels;
    # it draws the box where it would be and then shifts the entire bounding box
    # l = int((x - w / 2) * width + t_x)
    # r = int((x + w / 2) * width + t_x)
    # t = int((y - h / 2) * height + t_y)
    # b = int((y + h / 2) * height + t_y)
    l = int((x - w / 2) * width)
    r = int((x + w / 2) * width)
    t = int((y - h / 2) * height)
    b = int((y + h / 2) * height)
    if l < 0:
        l = 0
    if r > width - 1:
        r = width - 1
    if t < 0:
        t = 0
    if b > height - 1:
        b = height - 1

    img_translation = cv.rectangle(img_translation, (l, t), (r, b), colors[class_pred], 2)
    (width, height), _ = cv.getTextSize(class_names[class_pred], cv.FONT_HERSHEY_SIMPLEX, 0.6, 2)

    img_translation = cv.rectangle(img_translation, (l, t + 20), (l + width, t), colors[class_pred], -1)
    img_translation = cv.putText(img_translation, class_names[class_pred], (l, t + 15),
                                 cv.FONT_HERSHEY_SIMPLEX, 0.6, [255, 255, 255], 2)

    # 3. Plot Results
    plt.subplot(1, 2, 1)
    plt.imshow(image)
    plt.subplot(1, 2, 2)
    plt.imshow(img_translation)

bbox = [11, 0.34419263456090654, 0.611, 0.4164305949008499, 0.262]
random_translation(image_path = 'C:/Users/username/anaconda3/envs/yolo/yolo/data/images/000001.jpg', bounding_box = bbox, factor = 20)
``````

Scaling by height and width was indeed the correct approach.
The solution was to use division instead of multiplication: since the annotations are normalized, the pixel offsets have to be divided by the image size before being added to the coordinates. So it was a mistake up to a constant scale. Thank you.
The question can be marked as solved.
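For anyone landing here later, a minimal sketch of the transform this resolution describes (the name `translate_normalized_box` is illustrative): because the annotations live in normalized [0, 1] coordinates while `t_x`, `t_y` are pixel offsets, the offsets are divided by the image width and height before being added.

```python
def translate_normalized_box(box, t_x, t_y, width, height):
    # box = (x_mid, y_mid, w, h) in normalized [0, 1] coordinates;
    # t_x, t_y are the pixel offsets used in warpAffine, so divide them
    # by the image size to get the equivalent shift in normalized units
    x, y, w, h = box
    return (x + t_x / width, y + t_y / height, w, h)

# e.g. shifting a point at x = 0.6 by t_x = 50 px in a 500 px wide image
# moves it to x = 0.6 + 50/500 = 0.7; width and height are unchanged
```

Boxes shifted partly out of the image would still need clipping afterwards, as in the drawing code above.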