Interactive object detection in browser (NOT in real time)

Hello there,

I’ve just started learning about computer vision, so I’m also discovering the language most commonly associated with CV, which is Python. I’m a total Python beginner (I only learnt C and Pascal in my youth), so sorry in advance for the noobish question.
I’ve found countless tutorials about object detection in pictures, videos, and even real-time streaming (using web workers), but nothing about doing the job on static webpages.
What I’d like to do is: launch a Chrome/Firefox session, browse to a site, perform object detection on that page, and maybe interact with it. Typically, I’d go to an animal care website, detect a dog among the various animal pictures, then click on the first detected dog picture to go to its URL.
Now, from what I understand after looking at the code, cv2 first needs a path to a picture, passed to the imread method:
image = cv2.imread(imagepath)
The question is: is there a way or a trick to specify a part of the screen as the imread input, instead of the location of a jpg file? Then I could give the Chrome window coordinates as the boundaries of the input.
I first thought about a workaround: take a screenshot of my desktop, define the ROI as the box within the browser borders, perform object detection, calculate the X, Y coordinates of the target, then go back to Chrome and send a mouse click at X, Y. But that’s a bit awful, and painful. I’d prefer to do that “live”, i.e. directly on my browser screen. Since the page is static once everything is loaded (no embedded video), I don’t need real-time detection.
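The coordinate step of that workaround is mostly bookkeeping. Here is a minimal sketch, assuming the detector reports each box as (x, y, width, height) relative to the top-left corner of the ROI; the function name and the box layout are hypothetical and only serve as illustration:

```python
from typing import Tuple

# Assumed box layout: (x, y, width, height), measured inside the ROI.
Box = Tuple[int, int, int, int]


def roi_box_to_screen_point(roi_origin: Tuple[int, int], box: Box) -> Tuple[int, int]:
    """Map the centre of a box found inside the ROI to full-screen coordinates.

    roi_origin is the (left, top) position of the ROI on the screen,
    e.g. the top-left corner of the browser viewport.
    """
    roi_left, roi_top = roi_origin
    x, y, w, h = box
    # Shift the box centre by the ROI offset to get screen coordinates.
    return roi_left + x + w // 2, roi_top + y + h // 2


# Example: ROI starts at (100, 200) on screen, box found at (10, 20), size 40x60.
click_x, click_y = roi_box_to_screen_point((100, 200), (10, 20, 40, 60))
```

The resulting point is where the mouse click would be sent; the detection itself never needs to know where the browser window sits on the screen.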
Thank you in advance.

if you really want to do this in the browser (and you don’t control any of the websites you visit), you probably need to do this from JavaScript (running inside the browser), not Python, which would have to run e.g. on some server.

have a look at the OpenCV.js tutorials

oh, thank you very much. I got it

finding specific objects and clicking on them… are you trying to solve captchas?

if you use python, you can find some screenshot module and take a screenshot

(for Windows, I’d recommend d3dshot, available on PyPI)

you have no reason to deal with the browser directly.

you can simply send a mouse click. it’ll land in whatever program happens to be under the pointer. there’s much written about how to simulate a mouse click using python.
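A rough sketch of the screenshot-and-click approach, not a definitive implementation. It swaps in Pillow's ImageGrab for the capture (instead of d3dshot) and pyautogui for the click; both are my assumptions, as are the helper names. The array returned by the capture helper has the same layout as the result of cv2.imread (height x width x 3, BGR order), so it can be handed to a detector without going through a file.

```python
def region_to_bbox(left, top, width, height):
    """Convert (left, top, width, height) to the (left, top, right, bottom)
    tuple that Pillow's ImageGrab.grab expects."""
    return (left, top, left + width, top + height)


def grab_region_as_bgr(left, top, width, height):
    """Capture a screen region and return it as a BGR NumPy array."""
    # Imported here so the pure helper above works without these packages.
    import numpy as np
    from PIL import ImageGrab

    shot = ImageGrab.grab(bbox=region_to_bbox(left, top, width, height))
    rgb = np.asarray(shot.convert("RGB"))
    # Reverse the channel axis: Pillow gives RGB, OpenCV works in BGR.
    return rgb[:, :, ::-1].copy()


def click_at(x, y):
    """Send a left click at absolute screen coordinates (x, y)."""
    import pyautogui

    pyautogui.click(x=x, y=y)
```

Typical use would be `frame = grab_region_as_bgr(100, 50, 800, 600)`, then running the detector on `frame`, and finally calling `click_at` with the detected position shifted by the region's (left, top) offset.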

Actually, I’m currently looking for bargains on Vinted-like sites, especially used trench coats. As I’m quite lazy, I’m just trying to scrape them to build a list of offers. Of course I could use the sort menu, it’s easier… but it’s also fun learning Python/Java while shopping :)