I see that the dataset description mentions a rotation matrix and translation vector for the cameras that took the images. However, I can’t find any information on where the ground-truth poses are for each object in each image. Are we meant to derive them from other, existing information?
Excuse my ignorance; I’ve never done 6DoF pose estimation before, so this may be well understood by others.