Hi, I’m working on a project in which I would like to localise and classify icons on smartphone screens (so that a robotic arm holding a stylus can press them). The range of icons will be expanded over time. What approach should I take? I’ve been thinking about the following:
- train a network for semantic segmentation to localise icons on the screen (they will sit in certain positions in a grid, but the position of a given icon can differ from phone to phone and from user to user)
- use one-shot learning: train Siamese networks to classify app icons, using an older or distorted version of the original icon as an anchor and other icons as negatives
- use classical image processing to detect icons on the screen, exploiting a priori information such as their square-shaped boundaries, or borrow methods from text detection (project the binarised image onto the vertical and horizontal axes to find the bands in which the icons are located)
- classify the icons with one-shot learning as described above, or train one simple CNN, which would then have to be retrained each time a new icon is added.
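To make the projection idea in the third bullet concrete, here is a rough sketch of what I have in mind (numpy only, on a synthetic binarised screen; function names are my own, and a real screen would need thresholding and a noise-tolerant band threshold first):

```python
import numpy as np

def find_bands(profile, thresh=0):
    """Return (start, end) index pairs of contiguous runs where profile > thresh."""
    mask = profile > thresh
    bands, start = [], None
    for i, on in enumerate(mask):
        if on and start is None:
            start = i
        elif not on and start is not None:
            bands.append((start, i))
            start = None
    if start is not None:
        bands.append((start, len(mask)))
    return bands

def detect_icon_cells(binary):
    """Project a binarised screen onto both axes and intersect the bands.

    Assumes icons form a grid, so the cross product of horizontal and
    vertical bands gives candidate icon bounding boxes (r0, c0, r1, c1)."""
    rows = binary.sum(axis=1)   # horizontal projection
    cols = binary.sum(axis=0)   # vertical projection
    return [(r0, c0, r1, c1)
            for r0, r1 in find_bands(rows)
            for c0, c1 in find_bands(cols)]

# Synthetic 2x2 "icon grid": four bright squares on a dark background.
screen = np.zeros((20, 20), dtype=np.uint8)
for r in (2, 12):
    for c in (2, 12):
        screen[r:r + 6, c:c + 6] = 1

boxes = detect_icon_cells(screen)  # four candidate boxes, one per square
```

The obvious caveat is that the cross product produces a box for every band intersection, so empty grid cells would need a second pass that rejects boxes with too little foreground.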
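The appeal of the one-shot route, as I understand it, is that adding a new icon only means enrolling one anchor embedding instead of retraining. A minimal sketch of that matching logic (the `embed` function below is a stand-in for the shared CNN branch of a trained Siamese network, not a real model; all names are illustrative):

```python
import numpy as np

def embed(image):
    """Stand-in for the trained Siamese embedding branch: here just a
    unit-normalised flattened image, so dot product = cosine similarity."""
    v = image.astype(np.float32).ravel()
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

class IconGallery:
    """Open-set one-shot classifier: one anchor embedding per known icon.

    Enrolling a new icon is a dictionary insert -- no retraining, which is
    the point of the Siamese / metric-learning approach."""
    def __init__(self):
        self.anchors = {}  # label -> anchor embedding

    def enroll(self, label, image):
        self.anchors[label] = embed(image)

    def classify(self, image, min_sim=0.8):
        q = embed(image)
        best_label, best_sim = None, -1.0
        for label, a in self.anchors.items():
            sim = float(q @ a)  # cosine similarity of unit vectors
            if sim > best_sim:
                best_label, best_sim = label, sim
        # Below the threshold, report "unknown icon" instead of guessing.
        return (best_label, best_sim) if best_sim >= min_sim else (None, best_sim)

# Enroll two synthetic icons, then classify a slightly distorted query.
gallery = IconGallery()
icon_a = np.zeros((8, 8)); icon_a[:4, :] = 1
icon_b = np.zeros((8, 8)); icon_b[4:, :] = 1
gallery.enroll("A", icon_a)
gallery.enroll("B", icon_b)

query = icon_a.copy()
query[0, 0] = 0  # simulate a small distortion
label, sim = gallery.classify(query)
```

The `min_sim` threshold is what would let the system flag icons it has never seen, which a plain softmax CNN (the retraining option) cannot do without extra machinery.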