Thanks for the reply!
I agree that a real-world, production-grade implementation would provide the co-ordinates directly, which would basically solve everything here.
Unfortunately, that isn’t the case.
Background:
i) This isn’t an academic or corporate problem; it comes from my own project, where I’m experimenting with a closed-loop AI system.
ii) The bounding box is generated by the AI (an object detection model), which I can’t modify to output the co-ordinates because it’s a proprietary system. All I receive from it are images with bounding boxes drawn on the detected objects (in this case, a person).
iii) I’d of course love to discuss the approach further. More data is shared below.
Progress so far:
i) From the image with the bounding box, detected the yellow color and created a mask
ii) Detected the edges on the mask
iii) After computing contours and a polygon approximation, I’m able to extract the ROI from the source image
This approach is not reliable because its results are inconsistent.
It works on maybe 1 or 2 of the test images I’ve shared in the link.
Source:
I’m limited to one embedded media item here. The code for my current method and the test data are in the link below.