Break image down to bottles only

Hello all.

I’m working on a project where I have a series of images of bar walls, and what I’m trying to do is identify the bottles. Initially I was thinking of using a semantic segmentation algorithm to strip out all the extraneous data, as if the bottles were buildings and I was trying to remove the sky + sidewalk + people. Do you think that’s a good approach?

Secondly, after I have regions of the image that I know are just bottles, I was considering using YOLOv3 to train a detection/classification model. The goal is to isolate each bottle individually and identify its brand + product. Is this viable?
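If you do go the YOLOv3 route, each training image needs a label file with normalized box coordinates. A minimal sketch of that conversion (the function name and example numbers are mine, not from any particular tool):

```python
def to_yolo_label(cls_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space box to YOLO's normalized
    "class x_center y_center width height" label line."""
    cx = (x_min + x_max) / 2 / img_w
    cy = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{cls_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"

# e.g. a bottle occupying x 100-200, y 200-400 in a 1000x1000 photo:
print(to_yolo_label(3, 100, 200, 200, 400, 1000, 1000))
# -> 3 0.150000 0.300000 0.100000 0.200000
```

One line per object, one `.txt` file per image, is the usual darknet-style convention.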

I’m not afraid of hard work, and I know I’ve probably got a lot to do here (an API to fetch the different images, etc.). I’m just wondering if I’m heading in the right direction.

I’ve also done some object classification projects before, but here the object will be the same while the image varies. I also thought about reading the label text on the bottles, but the lighting could be terrible, so I leaned toward whole-image matching instead.

I’m not new to development or Python, but I’m fairly new to ML, so thank you for any help.

Links and references to papers are all welcome.


Example image (low-quality, but high quality will be an option also):


Yes, YOLO should be a good solution to detect and identify the bottles. You just need a good training dataset.

As this is primarily a machine learning problem, you could start by choosing a DNN framework (like TensorFlow or PyTorch), follow the tutorials for training a YOLO network, and apply it to your images. OpenCV is optional for this project.

Thank you for the advice.

I’ve collected about 100k images of bottles for the most common spirits on sale at bars in this area. Most are manufacturer images, so they’re high resolution.

Correct me if I’m wrong here, as this is my first real (non-tutorial) ML project: I need to annotate all of the images as my training dataset (holding some out for validation), then use photos of a whole bar as the test set, check the results, and feed the errors back into the model for correction.
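Roughly, yes: you annotate, hold out a validation split from the annotated data, and keep the real bar photos as a separate test set. The hold-out split itself could look like this (a minimal sketch; the names and the 10% fraction are my own choices):

```python
import random

def split_dataset(paths, val_fraction=0.1, seed=42):
    """Deterministically shuffle file paths and split them into
    (train, val) lists, so reruns produce the same split."""
    paths = sorted(paths)        # make the split independent of input order
    rng = random.Random(seed)
    rng.shuffle(paths)
    n_val = int(len(paths) * val_fraction)
    return paths[n_val:], paths[:n_val]

train, val = split_dataset([f"img_{i:05d}.jpg" for i in range(1000)])
print(len(train), len(val))  # 900 100
```

Fixing the seed matters: if the split changes between runs, validation images leak into training and your metrics become meaningless.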


Hmmm… I’m afraid I didn’t think my advice through. In fact, using YOLO directly may not work: by default YOLOv3 resizes the input down to 416x416 pixels, which is too low a resolution to identify individual bottles on a crowded shelf.

Maybe you can train a YOLO network to detect the bottles, then crop every detection and use a classifier network to identify each bottle.
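The crop step of that two-stage pipeline is simple array slicing. A minimal NumPy sketch (the `(x_min, y_min, x_max, y_max)` box format is an assumption; adapt it to whatever your detector emits):

```python
import numpy as np

def crop_detections(image, boxes):
    """Crop each (x_min, y_min, x_max, y_max) detection out of an
    H x W x 3 image array, clamping boxes to the image bounds and
    dropping any box that ends up empty."""
    h, w = image.shape[:2]
    crops = []
    for x0, y0, x1, y1 in boxes:
        x0, y0 = max(0, int(x0)), max(0, int(y0))
        x1, y1 = min(w, int(x1)), min(h, int(y1))
        if x1 > x0 and y1 > y0:
            crops.append(image[y0:y1, x0:x1])
    return crops

img = np.zeros((480, 640, 3), dtype=np.uint8)
crops = crop_detections(img, [(10, 20, 110, 220), (600, 400, 700, 500)])
print([c.shape for c in crops])  # [(200, 100, 3), (80, 40, 3)]
```

Each crop can then be resized and fed to the classifier at full bottle resolution, which sidesteps the whole-image downscaling problem.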

One other thing: these networks “learn” what a bottle looks like, so the training data should resemble the test data. Ideally you’d build your training set from photos taken in actual pubs (I know, that’s difficult) rather than clean manufacturer shots, which won’t look like your real photos.
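One way to narrow that domain gap without photographing every pub is to augment the clean manufacturer shots toward bar-like conditions. A rough NumPy sketch (the specific degradations and their ranges are my guesses, not tuned values):

```python
import numpy as np

def simulate_bar_lighting(image, rng):
    """Degrade a clean product photo toward bar-like conditions:
    random dimming, a warm color cast, and additive sensor noise."""
    img = image.astype(np.float32)
    img *= rng.uniform(0.3, 0.8)                         # dim lighting
    img *= np.array([1.1, 1.0, 0.8], dtype=np.float32)   # warm RGB cast
    img += rng.normal(0.0, 8.0, img.shape)               # sensor noise
    return np.clip(img, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
clean = np.full((64, 64, 3), 200, dtype=np.uint8)
dark = simulate_bar_lighting(clean, rng)
print(dark.mean() < clean.mean())  # True
```

A library like albumentations offers richer versions of this (blur, shadows, JPEG artifacts), but even crude augmentation usually beats training on pristine catalog images alone.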

Instance segmentation, not just semantic segmentation.

Any segmentation network is usually fully convolutional anyway.

I’ve heard that the Ultralytics family of YOLOs can also take arbitrarily large inputs for detection (with some granularity: dimensions get rounded to a multiple of the network stride).
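That granularity is the network stride, 32 for most YOLO variants: inputs get resized or padded to a multiple of it. A quick sketch of the rounding (a standalone illustration, not Ultralytics’ own code):

```python
def round_to_stride(size, stride=32):
    """Round a requested input dimension up to the nearest multiple
    of the network stride, as stride-constrained detectors require."""
    return ((size + stride - 1) // stride) * stride

print(round_to_stride(1080))  # 1088
print(round_to_stride(1920))  # 1920
```

So a 1920x1080 bar photo would run at roughly 1920x1088, keeping far more bottle detail than a fixed 416x416 resize.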