Perl image recognition via regular expressions

Alias on 2007-07-24T19:56:07

If you are lucky (and probably Australian), you will have had the opportunity to see a hilarious lightning talk by Paul Fenwick (pjf) in which he demonstrates how he managed to use Win32::GuiTest to write a bot that plays Minesweeper for him (and, might I add, uses an undocumented easter egg to cheat at Minesweeper, avoiding the NP-complete problem of actually solving it) :)

But I gather that some of the implementation is a little laborious, and does a few different evil things to get the job done.

The thing is, it jars me a little that a Win32-specific API is needed in order to make it work. I much prefer my solutions cross-platform.

So I've been pondering the idea of how to write a CROSS-PLATFORM bot that will play minesweeper for me.

Clearly, there are two main ingredients needed here.

1. A way to look at a computer screen and recognize what you are seeing.

2. A way to move the mouse around, click things and maybe type.

It shouldn't REALLY be necessary to inject events into the Windows GUI widgets.

So firstly, we need a cross-platform way to look at a computer screen and recognize things. And as we all know, image recognition algorithms are one of those non-trivial things we keep reminding ourselves never to write.

But as I sometimes also need to keep reminding myself, Perl's regular expression engine is really really really fast and really really really GOOD at finding complex patterns in data (strings).

So why not teach the regular expression engine how to do image recognition! :)

First, we need a few things.

Thanks to the ongoing indefatigable work of Tony Cook, we have a great cross-platform image library, in the form of Imager.

And after a little bit of badgering from me, he was helpful enough to extend a branch of the Imager empire in the specific direction of Imager::Screenshot, which will let you take a screenshot on any platform as well.

With our picture of the desktop captured, we then just need a search image. Let's assume we can load these from a file.

So now all that's necessary is to express the screenshot as a single long string. I've taken the simple (but a little bloaty) approach for this first implementation of just converting each pixel to an HTML colour code, like #003399.
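To make the string conversion concrete, here is a minimal sketch of the idea. It is not the Imager::Search code: the helper name image_to_string is hypothetical, and it assumes the pixels have already been pulled out of the image as rows of [red, green, blue] triples rather than reading them from an Imager object.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Imager): turn rows of [r, g, b]
# triples (0-255 each) into one long string of 7-character "#RRGGBB"
# codes, row after row.
sub image_to_string {
    my ($rows) = @_;
    my $string = '';
    for my $row (@$rows) {
        for my $pixel (@$row) {
            $string .= sprintf '#%02X%02X%02X', @$pixel;
        }
    }
    return $string;
}

# A tiny 3x2 "image" standing in for a real screenshot.
my @image = (
    [ [255, 255, 255], [0, 38, 255], [255, 0, 0] ],
    [ [255, 216, 0],   [0, 51, 153], [255, 255, 255] ],
);

my $string = image_to_string(\@image);
print "$string\n";
```

Because every pixel takes exactly 7 characters, a character offset in the string divides cleanly back into a pixel index, which is what makes the later position arithmetic possible.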

I won't print out what this looks like, but you get the idea. One great big boring repetitive string.

Then we just need to convert our search image to a pattern that will be able to be found within this target image.

Something like this...

(?:#FFFFFF){13}.{259}(?:#FFFFFF){13}.{259}(?:#FFFFFF){9}#0026FF(?:#FFFFFF){3}.{259}(?:#FFFFFF){3}(?:#0026FF){7}(?:#FFFFFF){3}.{259}(?:#FFFFFF){3}#0026FF(?:#FF0000){2}(?:#FFFFFF){2}(?:#FFD800){2}(?:#FFFFFF){3}.{259}(?:#FFFFFF){3}#0026FF(?:#FF0000){3}(?:#FFD800){3}(?:#FFFFFF){3}.{259}(?:#FFFFFF){3}#0026FF(?:#FF0000){2}(?:#FFD800){2}#FFFFFF#FFD800(?:#FFFFFF){3}.{259}(?:#FFFFFF){3}#0026FF#FF0000#FFD800(?:#FF0000){2}#FFFFFF#FFD800(?:#FFFFFF){3}.{259}(?:#FFFFFF){3}#0026FF(?:#FF0000){3}(?:#FFFFFF){2}#FFD800(?:#FFFFFF){3}.{259}(?:#FFFFFF){3}#0026FF(?:#FFFFFF){5}#FFD800(?:#FFFFFF){3}.{259}(?:#FFFFFF){13}.{259}(?:#FFFFFF){13}.{259}(?:#FFFFFF){13}

Each row of the search image consists of 13 pixels in this case, and between each row we have a gap of 259 characters (37 pixels x 7 characters; my test target images are 50 pixels wide, so 50 - 13 = 37) which skips forward to the position that matches the first pixel of the next row.
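A pattern of this shape can be generated mechanically. The following is a rough sketch under my own assumptions, not the actual implementation: build_pattern is a hypothetical name, the search image is given as rows of "#RRGGBB" strings, and runs of the same colour are collapsed into (?:...){n} groups.

```perl
use strict;
use warnings;

# Hypothetical helper: build a regex source string from a search image
# given as rows of "#RRGGBB" codes, for a target image that is
# $target_width pixels wide.
sub build_pattern {
    my ($rows, $target_width) = @_;
    my $search_width = @{ $rows->[0] };

    # Characters to skip from the end of one search row to the start
    # of the next: the pixels outside the search image, 7 chars each.
    my $gap = ($target_width - $search_width) * 7;

    my @row_patterns;
    for my $row (@$rows) {
        my @pixels  = @$row;
        my $pattern = '';
        while (@pixels) {
            # Take one colour and count how many times it repeats.
            my $colour = shift @pixels;
            my $count  = 1;
            while (@pixels and $pixels[0] eq $colour) {
                shift @pixels;
                $count++;
            }
            $pattern .= $count > 1 ? "(?:$colour){$count}" : $colour;
        }
        push @row_patterns, $pattern;
    }
    return join ".{$gap}", @row_patterns;
}

# Two all-white rows of 13 pixels, searched for in a 50-pixel-wide target.
my @white_rows = ( [ ('#FFFFFF') x 13 ], [ ('#FFFFFF') x 13 ] );
my $pattern = build_pattern(\@white_rows, 50);
print "$pattern\n";
```

With a 13-pixel search image and a 50-pixel target, the computed gap is (50 - 13) x 7 = 259, the same figure as in the pattern above.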

And so now all we need to do is match the pattern against the target image string, collect the match positions, convert them BACK to x,y co-ordinates, and make sure each resulting match doesn't span across the end of a row (just in case).
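The matching step might look something like the sketch below. Again this is an illustration under assumptions rather than the real code: find_matches is a hypothetical name, the target is a hand-built 5x3 image string, and a zero-width lookahead is used so that overlapping matches are not skipped.

```perl
use strict;
use warnings;

# Hypothetical helper: return [x, y] for every place $pattern matches in
# $string, where $string holds 7 characters per pixel for an image that
# is $width pixels wide, and the search image is $search_width wide.
sub find_matches {
    my ($string, $pattern, $width, $search_width) = @_;
    my $regex = qr/$pattern/;
    my @found;

    # The lookahead keeps each match zero-width, so pos() is the start
    # of the match and overlapping hits are still found.
    while ($string =~ /(?=$regex)/g) {
        my $pixel = pos($string) / 7;
        my $x     = $pixel % $width;
        my $y     = int($pixel / $width);

        # Reject matches that would wrap past the right-hand edge.
        next if $x + $search_width > $width;
        push @found, [ $x, $y ];
    }
    return @found;
}

# A 5x3 target: white, with a 2x2 blue/red checker at x=2, y=1.
my ($W, $B, $R) = ('#FFFFFF', '#0026FF', '#FF0000');
my $target = join '',
    $W, $W, $W, $W, $W,
    $W, $W, $B, $R, $W,
    $W, $W, $R, $B, $W;

# Pattern for the 2x2 checker; the gap is (5 - 2) x 7 = 21 characters.
my $pattern = $B . $R . '.{21}' . $R . $B;

my @hits = find_matches($target, $pattern, 5, 2);
printf "found at x=%d, y=%d\n", @$_ for @hits;
```

Since every pattern starts with a "#" and those only occur at multiples of 7 in the target string, the match offset always divides evenly into a pixel index.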

Regex run-time for the search on my cheap-ass oldish laptop? 0.0377 seconds.

And there you have it.

Given an arbitrary search image, a few dozen lines of Perl (which will be available on CPAN as Imager::Search as soon as I write up the POD and some more tests) let the regex engine take a simple search file and find its location on the screen, regardless of operating system!

Want to write a mahjong bot? Just capture little search images of each tile, and Perl should easily be able to look at the mahjong board, see which tiles are exposed, match them up, and then click on them! :)

Now I just need to find a way to get an abstract API for pointing and clicking.


fun

markjugg on 2007-07-25T01:36:38

Fun!

VNC?

DAxelrod on 2007-07-26T00:23:33

CROSS-PLATFORM

1. A way to look at a computer screen and recognize what you are seeing.

2. A way to move the mouse around, click things and maybe type.

Sounds like VNC to me. Admittedly I don't know much about it at the protocol level.

Re:VNC?

DAxelrod on 2007-07-26T00:35:11

Hm.

Net::VNC exists but looks like it does display only; it doesn't look like there's a way to send input back to the server.

There's also something called LibVNCClient that includes a Perl-based macro recorder called VisualNaCro, which might be worth poking around with.

resolution

slanning on 2007-07-26T08:55:32

What happens when you change screen resolution?

Re:resolution

Alias on 2007-07-27T01:40:23

You generate a new search regex...

I'm doing it every time anyway, since I haven't added caching.