Extracting region with a person - CV approach

What would be a better CV approach here?

The goal: Extract the person or the area within (or with) the bounding box that includes the person.

The object detection isn’t done by OpenCV, and I have no access to the object coordinates. The image with the detected person (and its drawn bounding box) is the only data available.

One of the sample images:

My method:

  1. Detect yellow in the input image and create a binary mask - As the image always has a yellow bounding box for the person detected.
  2. Detect edges in the binary mask.
  3. Compute contours on the detected edges and approximate polygon.
  4. Use these values to extract the region from input image.

This approach isn’t reliable: sometimes the contours computed from the detected edges fail to form a closed polygon, so the rectangular bounding box isn’t extracted.
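One conventional alternative that does not need a closed contour is a projection profile: count the yellow pixels in each row and column of the mask and keep the rows and columns where the count is high, since the box sides show up as long straight runs. The sketch below assumes a single axis-aligned box per image; the function name `box_from_projections` and the `min_fraction` threshold are hypothetical and would need tuning.

```python
import numpy as np


def box_from_projections(mask, min_fraction=0.5):
    """Estimate (x0, y0, x1, y1) of one axis-aligned box outline in a binary mask.

    Returns None if the mask contains no foreground pixels.
    """
    mask = np.asarray(mask) > 0
    if not mask.any():
        return None

    # Number of foreground pixels in every column and every row.
    col_counts = mask.sum(axis=0)
    row_counts = mask.sum(axis=1)

    # Vertical box sides give columns with many pixels; horizontal sides
    # give rows with many pixels. Keep those close to the strongest one.
    cols = np.flatnonzero(col_counts >= min_fraction * col_counts.max())
    rows = np.flatnonzero(row_counts >= min_fraction * row_counts.max())

    return int(cols[0]), int(rows[0]), int(cols[-1]), int(rows[-1])
```

Because each side is judged by its total pixel count, short gaps in the outline (for example where a label or compression artifacts interrupt the line) do not prevent the box from being found, which is the failure mode described above. Other large yellow objects in the scene would still confuse it.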

My ask:
1) Suggestions or advice on other approaches.
2) Ways to improve the accuracy of the current method.
3) Anything else that helps.


Constraints:

  1. Can’t perform person / object detection again.
  2. Have to stick with conventional image processing.

a real-world solution would be to figure out who drew those boxes and how to make them give you the coordinates properly.

this sounds like either an academic exercise, where the instructor needs to be told they’re being unreasonable, or it’s some corporate bureaucratic issue where someone mistakenly put “in charge” needs to be told they’re being unreasonable.

if you need to discuss the problem as you’ve posed it, you need to provide proper source data. that highly compressed thumbnail is impossible to experiment on.

Thanks for the reply!
I’d agree that a real-world solution / production grade implementation would directly provide the co-ordinates. That would basically solve everything here.

Unfortunately, that isn’t the case.

i) It isn’t an academic or corporate issue; it’s from my own project, where I’m experimenting with a closed-loop AI system.

ii) The bounding box is generated by the AI (object detection model), which I can’t modify to output coordinates because it’s a proprietary system. All I receive from it are images with bounding boxes drawn on the detected objects (in this case, a person).

iii) I’d of course love to discuss more regarding the approach. More data is shared below.

Progress so far:
i) From the image with the bounding box, detected the yellow color and created a mask.
ii) Detected the edges on the mask.
iii) After computing contours and a polygon approximation, I’m able to extract the ROI from the source image.

This approach is unreliable because it isn’t consistent.
It works on only one or two of the test images I’ve shared in the link.

I’m limited to one embedded image per post here, so the code for my current method and the test data are in the link below.

Code for the current method and test data

details please. meaningful, substantial details.

the output pictures are one cropped screenshot of an OpenCV imshow window, and two random binary masks that are of little interest.

I see no reason to spend more time on this.

i) It is proprietary software built into a Hikvision NVR that uses object detection and outputs the images in which something was detected. In this scenario, those are images of a detected person.

ii) If by details you mean details of the object detection models Hikvision uses in their NVR, there’s absolutely no information available about them.

iii) I receive images from my NVR at home, and I’m using OpenCV to extract the person from each received image with the conventional image processing approach I’ve described above.

iv) My ask is: what other conventional image processing approaches could be taken here apart from the one I’m trying? Or can my general approach itself be improved?

v) If these aren’t the details you presumed, kindly mention what other “details” are required. Is it more details regarding the code? Approach? Overall flow?

Yes, they are just output images from the approach I’m currently using. They are there in case anyone wants a quick look at the method I’ve used.

I don’t see how that conflicts with what is being discussed, since the data for anyone who wants to experiment is in the test_images directory.

The code and the data have been shared in the link in my previous reply.
The test_images folder has the relevant data to experiment with.

Kindly state what further information you’d like me to share, as I can’t guess the exact “meaningful details” you need for more clarity.