Break image down to bottles only

Hello all.

I’m working on a project where I have a series of images of bar walls, and I’m trying to identify the bottles in them. My initial idea was to use a semantic segmentation algorithm to get rid of all the extraneous data, treating the bottles the way you would treat buildings when removing the sky, sidewalk and people from a street scene. Do you think that’s a good approach?
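
If a segmentation model is used as the first step, its output is typically a per-pixel label map, and removing the "sky and sidewalk" comes down to masking out every pixel whose label is not the bottle class. Below is a minimal, dependency-free sketch. It assumes the image is a list of rows of (R, G, B) tuples and the mask is a same-sized list of rows of integer labels. The class id `BOTTLE = 1` and the helper names `keep_class` and `class_bbox` are hypothetical, not part of any library.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B)
Image = List[List[Pixel]]      # rows of pixels
Mask = List[List[int]]         # rows of per-pixel class labels

BOTTLE = 1  # hypothetical class id the segmentation model uses for "bottle"


def keep_class(image: Image, mask: Mask, class_id: int = BOTTLE) -> Image:
    """Return a copy of image with every pixel not labelled class_id set to black."""
    if len(image) != len(mask) or any(len(r) != len(m) for r, m in zip(image, mask)):
        raise ValueError("image and mask must have the same shape")
    return [
        [px if label == class_id else (0, 0, 0) for px, label in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]


def class_bbox(mask: Mask, class_id: int = BOTTLE) -> Optional[Tuple[int, int, int, int]]:
    """Return (x_min, y_min, x_max, y_max) of pixels labelled class_id, or None if absent."""
    points = [(x, y) for y, row in enumerate(mask) for x, label in enumerate(row) if label == class_id]
    if not points:
        return None
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


# Example: a 2x2 image where only the right-hand column is "bottle".
image = [[(10, 10, 10), (200, 50, 50)], [(30, 30, 30), (40, 40, 40)]]
mask = [[0, 1], [0, 1]]
print(keep_class(image, mask))  # [[(0, 0, 0), (200, 50, 50)], [(0, 0, 0), (40, 40, 40)]]
print(class_bbox(mask))         # (1, 0, 1, 1)
```

With real photos you would do the same thing on NumPy arrays (boolean indexing with the mask) rather than nested lists; the logic is identical.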

Secondly, once I have rows of data that I know are just bottles, I was considering using YOLOv3 to train a model for detection and classification. The goal would be to pick out each bottle individually and identify its brand and product. Is this viable?
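
The two-stage idea (find every bottle, then identify the brand of each one) can be sketched as: crop each detected box out of the image and pass the crop to a separate classifier. This assumes the detector returns boxes as `(x1, y1, x2, y2)` pixel coordinates with a confidence score; `crop`, `classify_detections` and the `classify` callable are hypothetical placeholders for whatever detector and brand classifier you end up using.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels; x2 and y2 are exclusive
Grid = List[list]                # image as rows of pixel values


def crop(image: Grid, box: Box) -> Grid:
    """Cut box out of image, clamping it to the image borders."""
    height = len(image)
    width = len(image[0]) if image else 0
    x1, y1, x2, y2 = box
    x1, x2 = max(0, x1), min(width, x2)
    y1, y2 = max(0, y1), min(height, y2)
    return [row[x1:x2] for row in image[y1:y2]]


def classify_detections(
    image: Grid,
    detections: Sequence[Tuple[Box, float]],
    classify: Callable[[Grid], str],
    min_score: float = 0.5,
) -> List[Tuple[Box, str]]:
    """Crop every confident detection and label it with a (hypothetical) brand classifier."""
    results = []
    for box, score in detections:
        if score < min_score:
            continue  # drop low-confidence detections
        patch = crop(image, box)
        if not patch or not patch[0]:
            continue  # box fell entirely outside the image
        results.append((box, classify(patch)))
    return results


# Toy 4x4 "image" whose pixel value is 10*row + column.
image = [[10 * y + x for x in range(4)] for y in range(4)]
detections = [((0, 0, 2, 2), 0.9), ((2, 2, 4, 4), 0.8), ((1, 1, 3, 3), 0.3)]


def toy_classifier(patch: Grid) -> str:
    # stand-in for a real brand classifier
    return "brand_a" if patch[0][0] > 15 else "brand_b"


print(classify_detections(image, detections, toy_classifier))
# [((0, 0, 2, 2), 'brand_b'), ((2, 2, 4, 4), 'brand_a')]
```

Alternatively, YOLO itself can be trained with one class per brand/product, which folds both stages into a single model.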

I’m not afraid of hard work, and I know I’ve probably got a lot to do here (an API to get different images, etc.). I’m just wondering whether I’m heading in the right direction.

I’ve also done some object classification projects before, but here the objects (the bottles) will be the same from one image to the next, while the images themselves differ. I also thought about image-to-text extraction (reading the labels), but the lighting could be terrible, so I was leaning towards matching the whole image.
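
Whole-image matching can be prototyped with a perceptual hash before committing to a trained model. The sketch below uses an average hash (aHash), a technique swapped in here for illustration rather than anything proposed above: it shrinks a grayscale crop to an 8x8 grid, sets each bit by comparing a cell to the grid's mean, and picks the reference whose hash has the smallest Hamming distance. Because every bit is relative to the mean, a uniform brightness offset leaves the hash unchanged, which partly addresses the lighting concern; glare, perspective and occlusion will still break it. Images are assumed to be lists of rows of grayscale integers, at least 8x8, and all function names are hypothetical.

```python
from typing import Dict, List

Gray = List[List[int]]  # grayscale image as rows of intensities


def average_hash(gray: Gray, size: int = 8) -> List[int]:
    """Shrink to size x size by block averaging, then set a bit per cell: 1 if above the mean."""
    height, width = len(gray), len(gray[0])
    if height < size or width < size:
        raise ValueError("image must be at least size x size pixels")
    cells = []
    for i in range(size):
        r0, r1 = i * height // size, (i + 1) * height // size
        for j in range(size):
            c0, c1 = j * width // size, (j + 1) * width // size
            block = [gray[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    return [1 if v > mean else 0 for v in cells]


def hamming(a: List[int], b: List[int]) -> int:
    """Number of positions where two equal-length hashes differ."""
    return sum(x != y for x, y in zip(a, b))


def best_match(query: Gray, references: Dict[str, Gray], size: int = 8) -> str:
    """Name of the reference image whose hash is closest to the query's."""
    query_hash = average_hash(query, size)
    return min(references, key=lambda name: hamming(query_hash, average_hash(references[name], size)))


# Two toy 16x16 "labels": one dark on the left, one dark on the right.
left_dark = [[x * 16 for x in range(16)] for _ in range(16)]
right_dark = [[(15 - x) * 16 for x in range(16)] for _ in range(16)]
# The query is left_dark photographed under brighter light (every pixel +40).
query = [[v + 40 for v in row] for row in left_dark]
print(best_match(query, {"left_dark": left_dark, "right_dark": right_dark}))  # left_dark
```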

I’m not new to development or Python, but I’m fairly new to ML, so thank you for any help.

Links, references to papers: all are welcome.


Example image (low quality, but higher-quality images will also be available):

Yes, YOLO should be a good solution for detecting and identifying the bottles. The main thing you need is a good, well-labeled training dataset.
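
The training data YOLO expects is usually one text file per image with one line per object, in the form `class_id x_center y_center width height`, where the coordinates are normalized by the image width and height (the Darknet convention, also used by several PyTorch YOLO implementations). Here is a small converter from pixel bounding boxes, assuming boxes are given as `(x_min, y_min, x_max, y_max)`; the function names and the class mapping (one id per brand/product) are hypothetical.

```python
from typing import Iterable, Tuple

PixelBox = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def to_yolo_line(class_id: int, box: PixelBox, img_w: int, img_h: int) -> str:
    """Format one object as 'class x_center y_center width height', normalized to [0, 1]."""
    x_min, y_min, x_max, y_max = box
    if not (0 <= x_min < x_max <= img_w and 0 <= y_min < y_max <= img_h):
        raise ValueError(f"box {box} is empty or outside a {img_w}x{img_h} image")
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def label_file_text(objects: Iterable[Tuple[int, PixelBox]], img_w: int, img_h: int) -> str:
    """Contents of the .txt label file for one image: one line per annotated bottle."""
    return "".join(to_yolo_line(cid, box, img_w, img_h) + "\n" for cid, box in objects)


# Hypothetical class mapping: one id per brand/product you want to recognise.
CLASSES = {"brand_a_gin": 0, "brand_b_vodka": 1}
print(to_yolo_line(CLASSES["brand_b_vodka"], (100, 50, 200, 250), 400, 500))
# 1 0.375000 0.300000 0.250000 0.400000
```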

Since this is mainly a machine learning problem, you could start by choosing a DNN framework (such as TensorFlow or PyTorch), follow its tutorials for training a YOLO network, and then apply the network to your images. Using OpenCV is optional for this project.
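
Whichever framework you choose, applying a trained YOLO network produces many overlapping candidate boxes, which are normally pruned with non-maximum suppression (NMS). Frameworks usually ship this (for example `torchvision.ops.nms` in PyTorch), but a plain-Python sketch shows what it does. Boxes are assumed to be `(x1, y1, x2, y2)` tuples, and the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # keep a box only if it doesn't overlap too much with any already-kept (higher-scoring) box
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

On a densely packed shelf, the boxes of neighboring bottles can overlap noticeably, so the IoU threshold is worth tuning: set it too low and genuine neighbors get suppressed.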