A couple of years ago, I first came up with the idea of doing basic image recognition using the Perl regular expression engine.
Using Imager for the underlying cross-platform image handling, I got a basic first version working by converting each pixel to an HTML colour code, building the image string and a search regex, and then converting the character match positions back into pixel terms.
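In rough outline, that first version boiled down to something like this (a simplified sketch using toy pixel arrays rather than real Imager objects; the helper names are invented for illustration). Each pixel becomes a fixed-width #RRGGBB token, the small image becomes a regex whose scanlines are joined by a "skip to the next row" gap, and the match offset divides back out into coordinates:

```perl
use strict;
use warnings;

# Flatten an image (an arrayref of rows, each row an arrayref of
# [ r, g, b ] triples) into one long string of #RRGGBB codes.
sub image_to_string {
    my ($image) = @_;
    my $string = '';
    for my $row (@$image) {
        $string .= sprintf '#%02X%02X%02X', @$_ for @$row;
    }
    return $string;
}

# Build a regex for the small image: literal scanlines joined by a
# "skip the rest of this row plus the start of the next" gap.
sub build_pattern {
    my ( $small, $big_width ) = @_;
    my $small_width = @{ $small->[0] };
    my $gap = ( $big_width - $small_width ) * 7;    # 7 chars per pixel
    my @lines;
    for my $row (@$small) {
        push @lines,
            quotemeta join '', map { sprintf '#%02X%02X%02X', @$_ } @$row;
    }
    my $pattern = join ".{$gap}", @lines;
    return qr/$pattern/s;
}

# Translate a character match offset back into pixel terms.
sub offset_to_xy {
    my ( $offset, $big_width ) = @_;
    my $index = int( $offset / 7 );                 # which pixel in the string
    return ( $index % $big_width, int( $index / $big_width ) );
}
```

Each pixel is exactly 7 characters and everything runs left-to-right, top-to-bottom, so turning a match offset back into (x, y) is just integer division and modulus.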
And it worked great, until I tried to scale it up to the type of image you might actually want to search the most: screenshots. At the time I needed to monitor OS X machines for any unexpected popups, because these were advertising displays, and that sort of thing is embarrassing.
Once I tried to apply the search technique to something the size of a 1024x768 screenshot, the overheads of transforming a million pixels in Perl started to bite hard.
While the regexp search might take 0.01 seconds, building the search image was taking something like 10 seconds. That was just barely acceptable in my passive monitoring case, but not good for much else (like, say, writing a solitaire bot).
While on the flight home from YAPC::EU (via Iceland) I finally spent some time reorganising the search code into a driver API, and trying what I think is a much better approach.
If the most expensive part of the job is converting the search image, why not use a NATIVE image format for which fast C encoding already exists, build it in memory, and then regexp it there? We shift more work into generating the search expression, and a lot more work into working out where the hell the match is in pixel terms, but hopefully for a large net win.
And it turns out that 24-bit Windows BMP files make for a reasonably decent search image format. Of course, there are complications: the scanlines run from bottom to top, the red/green/blue bytes are around the other way, and each scanline is aligned in dword terms, leaving between 0 and 3 useless padding bytes at the end of each line depending on its width. But with some funky post-match math, judicious use of quotemeta, and some post-processing of matches to remove false positives, you can generate a search expression that will quite comfortably search for an image inside a native Windows BMP file.
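To make that concrete, here's a self-contained sketch of the BMP side of things, with hand-rolled packing standing in for Imager's real C encoder (the helper names are invented for illustration). It builds a minimal 24-bit BMP in memory, covering all three quirks at once, and then shows the post-match math that turns a byte offset back into pixel coordinates:

```perl
use strict;
use warnings;

# Pack a bare-bones 24-bit BMP entirely in memory. $image is an arrayref
# of rows (top to bottom), each row an arrayref of [ r, g, b ] triples.
sub pack_bmp24 {
    my ($image) = @_;
    my $height    = @$image;
    my $width     = @{ $image->[0] };
    my $row_size  = ( 3 * $width + 3 ) & ~3;   # each scanline dword-aligned
    my $data_size = $row_size * $height;

    # 14-byte file header + 40-byte BITMAPINFOHEADER; pixel data at offset 54
    my $bmp = pack 'A2 V v v V', 'BM', 54 + $data_size, 0, 0, 54;
    $bmp .= pack 'V V V v v V V V V V V',
        40, $width, $height, 1, 24, 0, $data_size, 0, 0, 0, 0;

    # Scanlines run bottom to top; within a pixel the bytes run B, G, R
    for my $row ( reverse @$image ) {
        my $line = join '', map { pack 'CCC', @{$_}[ 2, 1, 0 ] } @$row;
        $line .= "\0" x ( $row_size - length $line );   # 0-3 padding bytes
        $bmp .= $line;
    }
    return $bmp;
}

# The funky post-match math: turn a byte offset within the BMP file
# back into pixel coordinates.
sub bmp_offset_to_xy {
    my ( $offset, $width, $height ) = @_;
    my $row_size = ( 3 * $width + 3 ) & ~3;
    my $p        = $offset - 54;                       # skip the headers
    my $y        = $height - 1 - int( $p / $row_size );   # rows are upside down
    my $x        = int( ( $p % $row_size ) / 3 );
    return ( $x, $y );
}
```

In practice the big image comes straight out of Imager's own encoder via write( data => \$scalar, type => 'bmp' ), but the byte layout is the same, so the offset math is the part that matters.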
Whereas the old time to capture and prepare a 1024x768 search image as #RRGGBB was around 10-12 seconds, the time to capture and prepare a 1024x768 native BMP file is in the vicinity of 150 milliseconds, which is almost two orders of magnitude faster.
Combining a 0.15 second capture cost with a 0.01 second regexp cost and another 0.01 to generate the search expression, the result is that using the new default "BMP24" driver, Imager::Search can monitor the desktop for images at around 5 frames a second!
This has the makings of a truly awesome Solitaire bot, or maybe even something like pinball...
This has got me thinking though, could I make it even faster again?
I'm now pondering the idea of a platform-specific Windows or X11 driver that would bypass Imager for the capture step altogether, and instead generate a search expression for whatever the raw byte string is that comes out of the screen capture system call...
The other major feature I need to add is support for transparency (1-bit transparency for now, anyway) in the search image.
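In regexp terms the plan is simple: a transparent pixel just becomes a "match any 3 bytes" wildcard instead of a literal. A sketch for one BMP24-style scanline (the helper name is invented for illustration, with undef standing in for a transparent pixel):

```perl
use strict;
use warnings;

# Build a pattern fragment for one scanline of the search image, where
# an undef pixel means transparent (match anything at that position).
sub scanline_pattern {
    my ($pixels) = @_;    # arrayref of [ r, g, b ] triples or undefs
    my $pattern = '';
    my $wild    = 0;      # run length of consecutive transparent pixels
    for my $px (@$pixels) {
        if ( defined $px ) {
            # flush any pending wildcard run as one ".{N}" jump
            $pattern .= sprintf '.{%d}', 3 * $wild if $wild;
            $wild = 0;
            # opaque pixels match their literal B, G, R bytes
            $pattern .= quotemeta pack 'CCC', @{$px}[ 2, 1, 0 ];
        }
        else {
            $wild++;
        }
    }
    $pattern .= sprintf '.{%d}', 3 * $wild if $wild;
    return $pattern;
}
```

Collapsing runs of transparent pixels into a single .{N} keeps the expression short, which should matter once the regexp engine is chewing through a megabyte of screenshot.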
But for now, I'm happy enough with the new fast Imager::Search version to stamp it as 1.00 and release it.
For a pet project* of mine, I needed to do some image processing/recognition from a video camera. First, I wrote the recognition algorithm with PDL, which turned out to be just a few, maybe five, lines of very dense code.
It turned out to be very difficult to use the resulting, transformed image in the way I wanted to, because passing it from the video4linux driver (JPEG output) to perl/PDL (binary PNM is the keyword there) to SDLPerl took *a lot* longer than the actual processing.
So I rewrote the whole thing in an ugly conglomerate of C and C++. Even though I tried to optimize it, the image processing part never was much faster than those five lines of PDL code. But the integration with SDL and friends was much more seamless. (There's actually documentation for SDL, woah!)
So maybe you'd be best off doing parts of what you want in C/XS by hand.
Oh, and in terms of throughput, I'm currently limited by the 640x480 and 25fps the camera provides. My CPU usage on a 1.8GHz Core2 is hovering around 15%, but that includes decompressing the jpg, transforming it to pnm, switching the red and blue bytes, running the recognition on it, passing it to SDL (with a pointer, which was the core of the efficiency problem in Perl), rescaling it to the screen resolution, and displaying it.
Cheers,
Steffen
* I seem to have an infinite number of those.
Re:Related stuff
Alias on 2008-09-02T11:39:00
Given that I'm using a 1024x768 image on 1.5GHz hardware, that suggests the mechanism I'm using could just about handle your camera feed, albeit at around 100% of CPU. Granted, I'm not displaying the results, but this is probably still only around an order of magnitude slower.
Since the regex itself costs less than 0.01 seconds, that just leaves the overhead of the capture call itself (which could itself contain a Windows-internal transformation) and the non-trivial cost of packing the bytestream into Imager::RawImage structures, and then sucking them back out again into BMP format.
I suspect that the stream->struct->stream cost is most likely the biggest removable cost, by a significant margin.
So once we've managed to remove that cost, maybe then we can see if there's a stray variable copy or something that is removable. Or perhaps by pulling the screenshot at 24-bit vs 32-bit vs whatever, we can somehow avoid a transform in the Windows layer and speed up that bit.
For now, I'm really happy that I can achieve these sorts of speeds in pure Perl (ignoring the work of Imager) because it means it's going to be a lot more flexible in the ways it can be applied. And many systems only really need one frame per second capturing, so we're talking 25% of a CPU, which isn't too intensive for many types of use cases.
If we could create an SV which pointed at memory not managed by perl, we could create one which pointed at Imager's internal image representation, and match against that. But I don't know of a way to do that.
Another option could be to use the getsamples() method to fetch each scanline of the image. This won't avoid the basic overhead of copying, but it will avoid the overhead of Imager's I/O abstraction.
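A sketch of what that might look like (using a tiny blank image as a stand-in for a real capture; in scalar context getsamples() returns the samples as a single packed string):

```perl
use strict;
use warnings;
use Imager;

# Fetch each scanline of the image as raw packed bytes via getsamples(),
# skipping Imager's file I/O abstraction entirely. Assumes a 3-channel
# image; a new Imager image starts out all zero bytes.
my $image = Imager->new( xsize => 4, ysize => 2, channels => 3 );

my $search_string = '';
for my $y ( 0 .. $image->getheight - 1 ) {
    # one packed string of R, G, B samples for this scanline
    $search_string .= $image->getsamples( y => $y, type => '8bit' );
}

# $search_string is now width * height * channels bytes of RGB data,
# with no BMP headers or scanline padding to work around.
```

That still copies every sample once, but the resulting string is a clean top-to-bottom RGB stream, so the match-offset math gets simpler too.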
A final option could be XS that used either i_gsamp() or the image's internal representation to perform the search. Since the Imager::Search::Driver::* modules only seem to use regexps to handle line skipping, there's no handling of partial matches, e.g. using 0 in the alpha channel to indicate that a pixel shouldn't be checked.
Of course, I suppose I could get off my tail and do any of these...