Difficulty trying to find the orientation of detected objects using YoloV5

I just noticed that you linked a PDF in your first post.

for objects of several different types, you might want the network to infer an oriented bounding box consisting of (a sketch of one possible encoding follows the list):

  • center point (2 scalars)
  • orientation of the major axis (in radians, -π to +π, or as sin and cos values?)
  • width (minor axis length), at minimum, so your gripper knows how far to open
  • length, optional

(and the encoding of the object classes)
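
something like this, as a rough sketch of one possible per-object target layout (not YoloV5's native label format; the function name and layout are just illustrative). encoding the angle as a sin/cos pair sidesteps the wrap-around at ±π:

```python
import math
import numpy as np

def encode_obb_target(cx, cy, theta, width, length, class_id, num_classes):
    # hypothetical per-object target vector:
    # [cx, cy, sin(theta), cos(theta), width, length, one-hot classes]
    t = np.zeros(6 + num_classes, dtype=np.float32)
    t[0], t[1] = cx, cy                          # center point (e.g. normalized 0..1)
    t[2], t[3] = math.sin(theta), math.cos(theta)  # angle of the major axis
    t[4] = width                                 # minor axis -> gripper opening
    t[5] = length                                # major axis, optional
    t[6 + class_id] = 1.0                        # class encoding
    return t
```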

this only requires changing the last layer and whatever post-processes that layer's outputs, and the training data (labels) needs adapting to the same format.
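roughly along these lines. this is a minimal sketch, not YoloV5's actual Detect head (which also handles anchors, strides, and objectness); it only shows where the channel count would change and how to recover the angle afterwards. NUM_CLASSES, the class name, and decode_one are assumptions:

```python
import torch
import torch.nn as nn

NUM_CLASSES = 5                       # assumption: set to your number of classes
PER_ANCHOR = 6 + NUM_CLASSES          # cx, cy, sin, cos, w, l + class scores

class OrientedHead(nn.Module):
    """illustrative replacement for the final 1x1 conv of a detection head"""
    def __init__(self, in_channels, num_anchors=3):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, num_anchors * PER_ANCHOR, kernel_size=1)

    def forward(self, x):
        return self.conv(x)           # raw per-anchor predictions, decoded below

def decode_one(pred):
    """turn one raw prediction vector back into usable box parameters"""
    cx, cy, s, c, w, l = pred[:6]
    theta = torch.atan2(s, c)         # recover the angle in (-pi, pi]
    class_id = pred[6:].argmax()
    return cx, cy, theta, w, l, class_id
```

the loss/target-assignment code would need the matching change, e.g. regressing the sin/cos pair directly alongside the usual classification loss.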

where objects touch, the gripper’s jaws would have to reach in and hit the gap exactly, so the items aren’t damaged.

perhaps the ability to shake the box would help: shake and look again until at least one object is free-standing. that should also help with resolving occlusions.