Which is the best approach to build a dataset to detect object that in real situations are together?

Hi, i’m currently trying to detect papaya fruit before it’s harvested, to put an example, this is the kind of photos that i will get in real situations:

So, i need to detect at least all fruits in the photo, to do so i’m trying a machine learning approach training using a 300 dataset that i found.

this was used in order of classificate each fruit in 3 categories: unmature, partially and mature. The problem is that this dataset is compose of individual photos of the fruit so after training a cascade model it doesn’t detect any papaya of my stock images (not the one posted here). Do i need to expand my dataset of individual papaya photos or use pictures with groups of papayas?

It’s always better to work with photos that are similar of what you’ll use when building datasets. So for detecting fruits before harvest, it’s better to use photos of fruits on trees than photos of fruits in supermarket, or photos of individual fruits.

For detecting fruits on trees, I saw some articles that used Yolo V4 network. You can also define different classes for each ripeness category.

Thanks! I will look for those articles right now