Need help with Matrix calculations - dynamic grasp point for robot

Hi guys,

I need your help in figuring out how to add a dynamic grasp point, given the camera pose of an object, for a robotic operation.

Assume I have a calibrated camera to robot setup

I have the camera pose of an object stored in a 4 x 4 matrix (rotation and translation of the object with respect to the camera). Before the robot picks the object I need to report the pose of the object with respect to the camera frame coordinate system, but I need to apply an offset in the local coordinate system before sending the pose to the robot.

So for example, let's say I have found the pose of an object with respect to the camera coordinate system. Now let's assume I need to apply a translation of 50 mm along the x axis in the local object pose coordinate system (not the camera coordinate system), and then recalculate the camera pose so I can pass it to the robot along with the camera-to-robot calibration (calculated earlier). How can I achieve this?

So in summary, every time I get a pose of the object in the camera, I want the reported pose to take this local offset of 50 mm in the x direction of the local object coordinate system into account.

I hope this makes sense!

you multiply that translation matrix from the right onto the object-to-camera frame transformation matrix. the transformation goes:

camera frame <- object frame <- target frame

Thanks for the inputs. Still having some problems with this approach. Possibly the implementation?

OK so the code I have implemented is the following

cv::Mat Pose::CalculateRobotPose(cv::Mat calibrationMat, cv::Mat offsetMat, cv::Mat cameraPose)
{
    cv::Mat r = calibrationMat * offsetMat * cameraPose;
    return r.clone();
}

calibrationMat is the robot to camera calibration file and this is correct (example below)
offsetMat is the offset that I would like to change the pick point in (example below)
cameraPose is the pose of the object without any offset applied as viewed through the camera (example below)

So what seems to be happening is that it's not taking into account the rotation of the object when I apply the offset in x. If I put an offset of 50 into the x part of the 4 x 4 translation like below, it's always moving the overall robot translation by 50 along the robot axis. If I rotate the object by 90° for example, the translation doesn't change and gives the same result. I need the translation to move with the rotation of the object so that I always move the robot to the same point on the object.

Here are my 4 x 4 matrices for each of the above

calibrationMat Robot to camera calibration:
-0.0080938 -0.999588 -0.0275391 -0.0499879
-0.9998 0.00758635 0.0184814 -0.565342
-0.0182649 0.0276832 -0.99945 0.801481
0. 0. 0. 1

cameraPose (pose of object when viewed through the camera)
-0.67101473, -0.009799351, -0.74137926, 0.032839321;
-0.7407226, -0.035238907, 0.67088616, 0.030549031;
-0.032699645, 0.99933088, 0.016387224, 0.79935551;
0, 0, 0, 1

offsetMat (what offset we want to offset the picking point on the local object)
1 0. 0. 0.05
0. 1 0. 0.
0. 0. 1 0.
0. 0. 0. 1.
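The symptom described above (the offset always moving along the same axis, regardless of object rotation) can be reproduced with a toy numpy example; the pose values here are made up purely for illustration:

```python
import numpy as np

def translation(x, y, z):
    # homogeneous 4x4 pure-translation matrix
    T = np.eye(4)
    T[:3, 3] = [x, y, z]
    return T

def rot_z(deg):
    # homogeneous 4x4 rotation about z
    a = np.radians(deg)
    R = np.eye(4)
    R[:2, :2] = [[np.cos(a), -np.sin(a)],
                 [np.sin(a),  np.cos(a)]]
    return R

offset = translation(0.05, 0.0, 0.0)   # the +50 mm local shift

# hypothetical object pose in the camera frame, rotated 90 deg about z
obj_pose = translation(0.3, 0.2, 0.8) @ rot_z(90)

left  = offset @ obj_pose    # offset applied in the *camera* frame
right = obj_pose @ offset    # offset applied in the *object's own* frame

print(left[:3, 3])    # shifted along camera x; the object's rotation is ignored
print(right[:3, 3])   # shifted along the object's rotated x axis
```

Multiplying the offset from the left gives the same shifted translation no matter how the object is rotated; multiplying from the right makes the shift follow the object's rotation, which is the behaviour being asked for.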

I believe the order of your matrix product is the issue.

given a matrix M that maps vectors in some frame A into another frame B,

v’ = M v

maps v from A into B.

the right side of a matrix product is the “input”, the left side is the “output”.

if you want to chain transformations from frame A to B to C, you multiply

v’ = M₂ M₁ v

which maps v from A to B, then from B to C.

you need to put the proposed shift before (to the right of) the object-to-camera transformation.

further, if you have transformations defining robot-to-camera ᶜMᵣ, and object-to-camera ᶜMₒ, and you wanted object-to-robot ʳMₒ, you need to invert one of the matrices, because you need ʳM꜀ which is (ᶜMᵣ)⁻¹

vᵣ = (ʳMₒ) vₒ
vᵣ = (ʳM꜀ ᶜMₒ) vₒ
vᵣ = ((ᶜMᵣ)⁻¹ ᶜMₒ) vₒ

the relationship can be illustrated as a triangle between object, camera, and robot. the camera is the common point, and you already have transformations from either to camera. if you need to transform from camera frame to another frame, you’ll need an inverted matrix for that leg.

when you have ᵒMₜ specifying the x+50mm shift that translates from target frame to object frame, your whole transformation is this:

vᵣ = ((ᶜMᵣ)⁻¹ ᶜMₒ ᵒMₜ) vₜ
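As a minimal numpy sketch of that chain (identity rotations and made-up translations, chosen only so the arithmetic is easy to follow):

```python
import numpy as np

# made-up stand-ins for the three matrices
cam_M_base = np.eye(4); cam_M_base[:3, 3] = [0.0, -0.5, 0.8]    # ᶜMᵣ (base -> cam)
cam_M_obj  = np.eye(4); cam_M_obj[:3, 3]  = [0.03, 0.03, 0.8]   # ᶜMₒ (obj -> cam)
obj_M_tgt  = np.eye(4); obj_M_tgt[0, 3]   = 0.05                # ᵒMₜ (+50 mm on object x)

# vᵣ = ((ᶜMᵣ)⁻¹ ᶜMₒ ᵒMₜ) vₜ
base_M_tgt = np.linalg.inv(cam_M_base) @ cam_M_obj @ obj_M_tgt

v_t = np.array([0.0, 0.0, 0.0, 1.0])   # the target-frame origin
v_r = base_M_tgt @ v_t                  # the same point, in the robot base frame
```

Reading the product right to left: the point goes target → object (the 50 mm shift), object → camera (the measured pose), then camera → base (the inverted calibration).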

Ok nearly there!! - it’s going to the right point now but the gripper is not going to the right orientation around the point any more

so what i did is…

I moved the offset that I wanted to the rhs like this

cv::Mat Pose::CalculateRobotPose(cv::Mat calibrationMat, cv::Mat offsetMat, cv::Mat cameraPose)
{
    cv::Mat r = calibrationMat * (cameraPose * offsetMat);
    return r.clone();
}


I also adjusted the offset matrix so that the rotation part of the cameraPose is now part of this matrix.

But now the problem is the gripper of the robot is not going to the correct orientation around the new offset point. Is that because I have left the rotation in the cameraPose mat as well as in the offset mat?

Should the cameraPose have all 0’s for the 3x3 rotation part and the only place there should be a rotation is in the object pose?

My head is wrecked but I think we are close!!

Sorry should have said an identity matrix for the rotation part of the camera pose - not all 0’s

no, do not just null the rotation part. that will cause the transformation to ignore input positions and simply output its own translation.

the new offset point should only have moved the gripper’s target pose (compared to without the offset), not added any rotation to it. if that was not the observed behavior, something else is going on.

if you need additional rotation, yes, add it there.
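Why nulling the rotation block is destructive can be seen in two lines; the translation values here are arbitrary toy numbers:

```python
import numpy as np

M = np.eye(4)
M[:3, :3] = 0.0             # rotation block nulled (what NOT to do)
M[:3, 3] = [0.1, 0.2, 0.3]  # some translation

p1 = M @ np.array([0.0, 0.0, 0.0, 1.0])
p2 = M @ np.array([5.0, -3.0, 7.0, 1.0])
# both outputs are [0.1, 0.2, 0.3, 1]: the input position is ignored entirely
```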

Ok - I actually tried it and you were correct - that didn’t work

So where I am now is that I can go to the correct position, but somehow I have managed to mess up how the gripper aligns to the new offset point. It's rotating 'randomly' by a certain amount, so I'm assuming it's related in some way to the object's rotation and how it's placed - just not sure what part of the maths is causing this to happen or what I need to add / take away to get it all to work.

Like I said, I didn't have this issue previously when I didn't apply an object offset, so I'm a little confused as to what to try next to resolve it.

to clarify: you use calibrationMat as if it defines camera-to-robot pose. is that the right sense? you said previously that this matrix was robot-to-camera.

what values for your matrices do you get now? does at least the translation part of each line up with physical observations?

does that robot have any documentation you can share?

if calibrationMat * cameraPose moves the manipulator correctly, multiplying a pure translation onto the right side shouldn’t make it add rotation at all…

you could try adding/inserting various matrices in that product and tweaking each slightly, a little translation, a little rotation,… see what each does.
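The claim that a pure translation on the right cannot change the rotation part is easy to verify; the pose below is a made-up example:

```python
import numpy as np

def rot_z(deg):
    # homogeneous 4x4 rotation about z
    a = np.radians(deg)
    R = np.eye(4)
    R[:2, :2] = [[np.cos(a), -np.sin(a)],
                 [np.sin(a),  np.cos(a)]]
    return R

pose = rot_z(37.0)             # hypothetical object-to-camera pose
pose[:3, 3] = [0.1, 0.2, 0.8]

shift = np.eye(4)
shift[0, 3] = 0.05             # pure translation, identity rotation

shifted = pose @ shift
# the 3x3 rotation block is unchanged; only the translation column moves
```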

Ok hope this clarifies

calibrationMat holds the calibration between the camera and the robot base - it's basically the base-to-camera 4x4 matrix I believe - am I describing it incorrectly? There is not much rotation in this matrix - mostly translation - I have it down below

the new pose we send to the robot is correct now for translation. It's the gripper rotation that is wrong now as a result (it was originally correct when I was going to the centre of the object without the offsetMat applied)

The robot is a UR5 with a Schunk two-finger gripper

In relation to your question

if calibrationMat * cameraPose moves the manipulator correctly, multiplying a pure translation onto the right side shouldn’t make it add rotation at all…

I am adding both a rotation and a translation - not just translation. The translation is the amount I want to move in translation terms, but the rotation is basically the 3x3 rotation part of cameraPose. If I only added translation, it wasn't moving in relation to the object, only in relation to the cameraPose

Here are the real world numbers I am getting (cameraPose, then offsetMat, then calibrationMat):

0.33009827, -0.94358641, -0.026074145, 0.01672394;
0.94252592, 0.33099061, -0.045718044, 0.023499032;
0.051769223, -0.0094841085, 0.99861401, 0.79964912;
0, 0, 0, 1

0.33009827, -0.94358641, -0.026074145, 0.0436;
0.94252592, 0.33099061, -0.045718044, -0.0285;
0.051769223, -0.0094841085, 0.99861401, 0;
0, 0, 0, 1

-0.0080938 -0.999588 -0.0275391 -0.0499879
-0.9998 0.00758635 0.0184814 -0.565342
-0.0182649 0.0276832 -0.99945 0.801481
0. 0. 0. 1.

we should come to agree on some conventions. correct me if you think your situation uses different conventions.

  • everything right-handed
  • camera/picture: x right, y “down”, z outwards into scene (into screen).
  • robot base/table/world: x right, y away, z up.
  • let’s say it’s like a desk.

calibrationMat hold the calibration between the camera and the robot base . -its’ basically the base to camera 4x4 matrix I believe - am I describing it incorrectly?

unclear anyway. “between” doesn’t tell the direction of the transformation, which is vital. base-to-camera describes the situation precisely.

I compare that matrix to its inverse… almost the same, except translation x and y are swapped. if you use that matrix wrongly, you will get slightly wrong results, instead of catastrophically wrong results, which makes wrong use easy to miss. I believe that is the case here. you used the matrix in the wrong sense. you need matrix inversion. more on that below.

I’ll assume you arrived at both base-to-cam and object-to-cam matrices by measuring patterns using the camera.

>>> print(base2cam) # or cam_base, (cam <- base)
[[-0.00809 -0.99959 -0.02754 -0.04999]
 [-0.9998   0.00759  0.01848 -0.56534]
 [-0.01826  0.02768 -0.99945  0.80148]
 [ 0.       0.       0.       1.     ]]
>>> print(np.linalg.inv(base2cam)) # (base <- cam)
[[-0.00809 -0.9998  -0.01826 -0.551  ]
 [-0.99959  0.00759  0.02768 -0.06787]
 [-0.02754  0.01848 -0.99945  0.81011]
 [ 0.       0.       0.       1.     ]]

calibrationMat (base-to-cam) looks like… the camera sits 0.81 m above the table, 0.55m to the left of the origin, facing down, and the camera’s right side (+x, first column) is facing towards you (base -y).

is all that correct?

>>> print(obj2cam) # or cam_obj, (cam <- obj)
[[-0.67101 -0.0098  -0.74138  0.03284]
 [-0.74072 -0.03524  0.67089  0.03055]
 [-0.0327   0.99933  0.01639  0.79936]
 [ 0.       0.       0.       1.     ]]

object sits 0.8m away from camera and pretty close to the optical axis (~4 cm away), so likely on the table’s surface.

object has some rotation, around 45 degrees? object’s +x points to the top left (in camera frame), y pointing away from the camera, i.e. downwards into the table, z pointing to the bottom left in camera frame.

>>> obj2base = np.linalg.inv(base2cam) @ obj2cam # (base <- cam) * (cam <- obj)
>>> print(obj2base) # (base <- obj)
[[ 0.7466   0.01706 -0.66505 -0.5964 ]
 [ 0.66421  0.03719  0.74662 -0.07833]
 [ 0.03747 -0.99916  0.01644  0.01086]
 [ 0.       0.       0.       1.     ]]

this is object pose in base frame (obj to base). it sits on the table (z+0.01086), at 60 cm to the left and ~8 cm towards you. rotation agrees, +x goes to away and to the right on the table, +y into the table, +z away and to the left. that agrees with the view of the camera onto the table.

your formulation for offsetMat confuses me greatly. it appears to mix a simple offset with cameraPose's rotation… let's not do that, but build the transformation up by parts.

ok so you need x+0.05 on the object. let’s say that’s the target. so you have

>>> print(target2obj) # (obj <- target)
[[1.   0.   0.   0.05]
 [0.   1.   0.   0.  ]
 [0.   0.   1.   0.  ]
 [0.   0.   0.   1.  ]]

you have obj2base from above. you want target2base. just multiply the offset from the right:

>>> target2base = obj2base @ target2obj # (base <- obj) * (obj <- target)
>>> print(target2base) # (base <- target)
[[ 0.7466   0.01706 -0.66505 -0.55907]
 [ 0.66421  0.03719  0.74662 -0.04512]
 [ 0.03747 -0.99916  0.01644  0.01273]
 [ 0.       0.       0.       1.     ]]

compare to obj2base, which should be ~5 cm away from these coordinates, except the object sits at an angle, so it's a little hard to see. we shifted from [-0.5964, -0.07833] to [-0.55907, -0.04512] on the table. the difference is [0.03733, 0.03321], whose length is the desired 50 mm.
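That 50 mm figure can be checked numerically from the two translation columns above (ignoring z, which barely changes):

```python
import numpy as np

before = np.array([-0.5964,  -0.07833])   # obj2base table position (x, y)
after  = np.array([-0.55907, -0.04512])   # target2base table position (x, y)

shift = after - before
print(np.linalg.norm(shift))   # ~0.05 m, i.e. the requested 50 mm
```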

I believe that matrix is what you need.

my variables could be chosen to reflect matrix multiplication better. one could say

>>> base_target = base_obj @ obj_target

where the first part (before _) is the reference frame (“output” frame) and the second part is the described frame (or “input” frame).

OK let me digest all this and I'll reply properly. From briefly looking at your reply, your assumptions & statements are correct

(super reply btw - really appreciated)

These conventions are what I would expect so we’ll assume these going forward

I agree - my understanding is what I have is base-to-camera. Would you agree?

I think I am using it correctly, as the x and y coordinates are very different: x is -0.049 and y is -0.565. Would you agree? (I'm assuming if x & y were swapped I would see a major offset, not a small one, but maybe I'm reading this wrong)


Correct in it being 0.8m away, 4cm off axis, sitting on a table. Not 100% sure on the rotation as I have moved the object since, but I think you are correct

I'd agree - I thought I could apply this offset in a simpler manner, so let's break it down into smaller parts, as I seem to have sorted the offset at the expense of rotation, so I'm doing something stupid here

Yeah - makes sense

OK I think I see what you are doing, but bear with me if I have it wrong!

So the inverse of base2cam matrix is basically obj2base which is basically going from the object to the base of the robot - so basically the reverse? Is that correct?

I then take obj2base and multiply it by the offset and get target2base

If I get the inverse of target2base and send that to the robot in theory I should get the robot to go to the correct position? Is that correct?

uh, no, those aren’t related. inverting the matrix for (cam<-base) gives you a matrix for (base<-cam). neither of these involves the object. it’s not even there conceptually, just the robot/base and the camera.

calculating (base<-obj) requires going “around the corner” that is the camera. the camera is the common frame by virtue of being able to calibrate object and base poses. you have transformations between either leg and the camera. to go between legs (object <> base), you’ll compose the individual transformations. it’s somewhat like adding and subtracting vectors on a triangle, except multiplying and inverting matrices.

(base<-obj) = (base<-cam) * (cam<-obj)

you initially have (cam<-base) and get (base<-cam) by inverting it.

here's a crude sketch: [image of the object-camera-base triangle, with transformations along the edges and the camera as the common frame]

I’m not sure what the robot wants. the robot probably wants the target-to-base matrix as is, because that describes where the target is relative to the robot’s base frame. i gather that from what has worked for you so far.

if you invert the target-to-base matrix, you get the base-to-target matrix, i.e. something that describes the base’s origin relative to the object/target. that should be a somewhat useless matrix I think, at least to the robot.

OK so I managed to get this all working with your help!!! Really appreciate the help & your patience with all this.

I had the following problems

My naming convention was totally wrong for the calibrationMat. What I gave you was camera2base, but I previously said it was base2camera. I didn't realise that camera2base means it's in relation to the robot coordinate frame. By following your logic and breaking down each transformation I could get it to work and also understand what each section was doing

Just on naming conventions is there a good tutorial / explanation on all this that you would recommend. For example when you write the following

it looks like there are 3 different ways of saying the same thing which I am not familiar with
1: obj2cam
2: cam_obj
3: (cam <- obj)

Thanks again!!

the most established notation I know is for matrices, with “indices” on the right and left side, as shown. those variable names and comments were my ad hoc attempts to approximate that.

Many thanks again. Great answers and explanation. Take care