Hello there,
I’ve just started learning about computer vision, so I’m also discovering the language most commonly associated with CV, which is Python. I’m a total Python beginner (I only learnt C and Pascal in my youth), so sorry in advance for the noobish question.
I’ve found countless tutorials about object detection in pictures, videos, and even real-time streams (using web workers), but there’s nothing about doing it on static web pages.
What I’d like to do is: launch a Chrome/Firefox session, browse to a site, run object detection on the page, and maybe interact with it. For example, I’d go to an animal care website, detect a dog among the pictures of different animals, then click the first picture detected as a dog to follow its link.
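One possible way to do all of this inside the browser, without desktop screenshots, is to let Selenium control Chrome and screenshot each picture element individually, then hand those bytes to a detector. This is a different technique from the screen-capture approach described further down. The sketch assumes Selenium 4 with Chrome, opencv-python and numpy; the URL and `my_dog_detector` are hypothetical placeholders, and hidden or lazy-loaded images may need extra handling.

```python
# Sketch: let Selenium drive the browser and run a detector on each <img>.
# Assumes Selenium 4, a local Chrome/chromedriver, and opencv-python + numpy.
from typing import Callable, Iterable, Optional


def decode_png(png_bytes: bytes):
    """Decode PNG bytes into a BGR NumPy array, like cv2.imread but from memory."""
    import cv2  # imported lazily so first_matching() works without OpenCV
    import numpy as np

    buffer = np.frombuffer(png_bytes, dtype=np.uint8)
    return cv2.imdecode(buffer, cv2.IMREAD_COLOR)


def first_matching(elements: Iterable, is_target: Callable[[bytes], bool]) -> Optional[object]:
    """Return the first element whose PNG screenshot is accepted by is_target."""
    for element in elements:
        # WebElement.screenshot_as_png captures only this element, as PNG bytes.
        if is_target(element.screenshot_as_png):
            return element
    return None


def click_first_dog(driver, looks_like_dog: Callable[[bytes], bool]) -> bool:
    """Click the first visible <img> on the current page that the detector accepts."""
    from selenium.webdriver.common.by import By

    pictures = [img for img in driver.find_elements(By.TAG_NAME, "img") if img.is_displayed()]
    dog = first_matching(pictures, looks_like_dog)
    if dog is None:
        return False
    dog.click()  # follows the link when the <img> sits inside an <a href=...>
    return True


# Usage (hypothetical URL and detector):
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   driver.get("https://example.org/animals")
#   click_first_dog(driver, lambda png: my_dog_detector(decode_png(png)))
```

Because each element is screenshotted on its own, there is no need to translate detection coordinates back to the screen: the element that matched is the one that gets clicked.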
Now, from what I understood after looking at the code, cv2 first needs a path to a picture, which it loads with the imread function:
image = cv2.imread(imagepath)
So my question is: is there a way (or a trick) to pass a region of the screen to imread instead of a JPG file path? Then I could use the Chrome window coordinates as the boundaries of that region.
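OpenCV functions operate on NumPy arrays, and cv2.imread is just one way to produce such an array, so a screen region can be fed in directly without going through a file. A minimal sketch, assuming the third-party mss package for the capture plus opencv-python and numpy; the window coordinates in the usage comment are made up.

```python
# Sketch: capture a screen rectangle as an OpenCV-ready BGR array.
# Assumes the third-party `mss` package plus opencv-python and numpy.


def region_from_corners(left: int, top: int, right: int, bottom: int) -> dict:
    """Turn window corner coordinates into the region dict mss expects."""
    if right <= left or bottom <= top:
        raise ValueError("right/bottom must be greater than left/top")
    return {"left": left, "top": top, "width": right - left, "height": bottom - top}


def grab_region(region: dict):
    """Capture one screen region and return it as a BGR NumPy array."""
    import cv2  # imported lazily so region_from_corners() has no dependencies
    import mss
    import numpy as np

    with mss.mss() as sct:
        bgra = np.array(sct.grab(region))  # mss returns pixels in BGRA order
    return cv2.cvtColor(bgra, cv2.COLOR_BGRA2BGR)


# Usage (made-up Chrome window corners):
#   image = grab_region(region_from_corners(100, 150, 1380, 950))
#   ...then use `image` wherever the result of cv2.imread(imagepath) was used.
```

The BGRA to BGR conversion is there because most OpenCV detectors expect three-channel BGR input, which is also what cv2.imread returns by default.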
My first idea was a workaround: take a screenshot of my desktop, define the ROI as the box inside the browser borders, run object detection on it, compute the X, Y screen coordinates of the target, then switch back to Chrome and send a mouse click at X, Y. But that feels clumsy and painful. I’d prefer to do it “live”, i.e. directly on the browser window. Since the page is static once everything has loaded (no embedded video), I don’t need real-time detection.
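For the workaround itself, the only arithmetic involved is shifting the detection box from ROI coordinates back to screen coordinates. A minimal sketch, assuming boxes in (x, y, w, h) form as cv2.CascadeClassifier.detectMultiScale returns them, and the third-party pyautogui package for the click. The `scale` parameter is an assumption for displays where screenshot pixels and mouse coordinates differ (for example some HiDPI setups); leave it at 1.0 if they match.

```python
# Sketch: map a detection box inside a captured ROI back to a screen click.
# Assumes (x, y, w, h) boxes and the third-party `pyautogui` package.


def box_center_on_screen(box, roi_left, roi_top, scale=1.0):
    """Return the screen (x, y) at the centre of a box found in the ROI image.

    box:      (x, y, w, h) in pixels of the captured ROI image.
    roi_left: x of the ROI's top-left corner in mouse coordinates.
    roi_top:  y of the ROI's top-left corner in mouse coordinates.
    scale:    screenshot pixels per mouse unit (1.0 when they match).
    """
    x, y, w, h = box
    center_x = x + w / 2
    center_y = y + h / 2
    return round(roi_left + center_x / scale), round(roi_top + center_y / scale)


def click_box(box, roi_left, roi_top, scale=1.0):
    """Move the mouse to the centre of the detected box and click once."""
    import pyautogui  # imported lazily so the mapping above has no dependencies

    x, y = box_center_on_screen(box, roi_left, roi_top, scale)
    pyautogui.click(x, y)
```

For instance, a box (40, 60, 100, 50) found in an ROI whose top-left corner sits at (100, 150) on screen has its centre at (90, 85) inside the ROI, so the click lands at (190, 235). Since the page is static, one capture followed by one click is enough.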
Thank you in advance.