I’m trying to figure out if there is a way to determine collisions in a commercial open world video game with OpenCV. I’m totally open to alternative ideas.
Here’s some example footage of running around and bumping into stuff in Pokémon Legends: Arceus (less than 2 minutes long):
I initially messed around with Canny edge detection, but I couldn’t quite figure out how to approach that idea, especially when (as you see in the video) you have situations like invisible barriers near the shore of a river.
One thought that came to mind was figuring out how to detect the direction the background is moving, and compare that to the player character. If the player is moving their feet and the background is stationary, they are stuck. Or for something like 0:09-0:12 in the video, where a boundary is hit at an angle and the player sort of “slides”, maybe the orientation of the player vs the movement direction could work?
Can anyone please point me in the direction of any techniques/algorithms/methods to look into? I’m really not sure what I should be using at all. Or if you have other ideas, feel free to share! I’m open to listening.
Edit: The background behind this is that in older main series Pokémon games, there used to be a “bump” sound whenever the player ran into an object or boundary. For some players, this was very important feedback, particularly for players who are visually impaired or blind. I was wondering if I could possibly build something that brings that sort of thing back.
just hook into the game process. that’s a debugging technique and requires operating system APIs.
using an AI to understand the game state from the video output requires serious knowledge of AI.
anything less than AI might not work anywhere near reliably.
yes, you can entirely forget Canny. almost nobody uses it for anything useful. it has a few specific, narrow uses, and 50% of those are “here’s a tutorial that looks useful but isn’t, but I won’t tell you because I’m doing this for the ad revenue”.
your background-motion idea might give you some information. try it. go for “DIS” optical flow, implemented in OpenCV.
if the game makes a “bump” sound, you can use cheap AI to recognize that sound from the game’s sound output.
Unfortunately, I’m dealing with the Nintendo Switch here. It’s not a PC game. The best I can do is pull video from it if I want to stay within not-illegal territory.
Thanks, I’ll keep that in mind! I have a basic understanding of AI from a game programming class I took in university that was run by an AI professor. I always thought AI was interesting. Maybe this might push me down that path.
I’ve been working on a project for a limited portion of a different game (one where you’re not running around the world), where the program has to figure out the game state to decide which image matching or OCR code it should run (and then use TTS to relay the information). So if digging deeper into AI could help me improve that too, then it’s definitely something I should look more into.
Haha, good to know!
OK, cool, I’ll check that out. Thanks!
Unfortunately, the whole point is that the bump sounds are gone in newer games. I’m interested in whether it’d be reasonably possible to recreate something like them. It could help a lot of people who relied on the old bump sounds to play independently.
I am concerned about solutions not being reliable enough, being too slow, or requiring a lot of processing power. Too much of a delay would make it useless. And the more devices it can work on, the better.
optical flow is somewhat compute-intensive, AI for image/video understanding is more so. both can be hardware-accelerated. I don’t anticipate any serious latencies. one or two frames of history/state should be enough to detect the event, maybe a second at most (to notice stuck character despite moving feet).
the proposed system should incorporate the state of gamepad buttons. it only makes sense to assume a “stuck” event if the player actually tries to move in a direction the character doesn’t (can’t) move. in the absence of button state info, you would need to understand the player character’s animations.
perhaps something less binary would be useful? just turn the optical flow into sounds. use magnitude at least, flow direction maybe. perhaps generate spatial sound so the player can tell where the movement is or is not. then the player can tell any motion of the scene, no matter what the character is supposed to be doing/affecting.
you can try optical flow using the sample script opencv/samples/python/dis_opt_flow.py. it is CPU-intensive, so perhaps run it on a low resolution video file. 1080p is a bit much for my old computer.
there are probably ways to modify the hardware of the console so you can scrape this game state from RAM. it’s just a different degree of invasion, with the result being a more accessible game.
“Nintendo endeavors to provide products and services that can be enjoyed by everyone. Our products offer a range of accessibility features, such as motion controls, a zoom feature, haptic and audio feedback, and other innovative gameplay options.”
so maybe it isn’t entirely futile to consider talking to Nintendo about ways to improve accessibility. the game not having a “bump” sound certainly seems like either an oversight or an intentional choice.
Thanks! After messing around with the example, I think I have an idea. If I shrink the image down, that’ll make processing it faster. If the whole image has a low average magnitude (magnitude.mean()) but I can tell that the character is moving, that should cover it about 95% of the time, I think. The only cases where the idea would fail are when the player character disappears off camera.
Figuring out if the player character is moving will be the tricky part since they are customizable. I will have to think on that one.
That said, I wouldn’t be the first to bring up these issues to Nintendo and the Pokémon family of companies. Pokémon games were accidentally accessible, never intentionally, and Nintendo-published games are lousy at accessibility across the board. My voice and ability to speak are not special by any means, but I can code.
I’d like to start by saying that no solution will be perfect.
I think you’d be looking for certain features that are strongly associated with the movement.
When I saw the video, I noticed these waypoints that have a number showing the player’s distance to them. When the player gets stuck, that distance clearly stops changing, so you could try reading those numbers.
A wild idea would be to split the image into 9 squares like a Minecraft crafting table, detect changes in each of the 8 outer regions, and use something like “if 3 out of the 8 regions think I’m not moving, then I’m not moving!”