For the last few months, I've been fading away a bit in the Perl/CPAN communities.
Certainly I haven't committed much of anything of note to my svn.ali.as repository in a while, other than doing a couple of releases for work mostly done by other people.
Instead, I've been participating in the orgy of hacking going on around the Australian Government's new open data kick.
The main thrust of my work has been at the intersection of mapping and the structure of government, two subjects which I knew little about before the last few months.
With dozens of new mashup sites and many more experimental hacks appearing over the last 3 months, I've found that there is a clear capability gap in the govhacker community.
A number of good commercial tools exist for geo work, which the professionals use. A number of good free tools (Google Maps and friends) also exist for storing and displaying maps.
However, if you remove the desktop tools (which are rich but can't be merged into larger applications) and the Google Maps-like APIs (which are limited but can be merged into larger applications) you find a complete lack of anything high level which is both free to use AND can be merged in as a component of a larger system.
The result is that most govhackers are like Perl hackers without a CPAN: they are scrabbling around in the dirt doing tons of (usually bad) site-specific reimplementation of things that should really be standard tools.
Being someone that is mainly a toolsmith by nature, I've set out to fix what I see as the most pressing problem.
By mixing a ton of information from the census, electoral commission and various other databases, I (with my former business partner Jeffery Candiloro) have built http://geo2gov.com.au/.
This is a web service which takes a location description in a wide range of formats, and then identifies which federal, state and local governments (and which parts of those governments) are related to that location. It's built using Catalyst and the geo-magic is done using PostGIS with DBIx::Class sitting between the two.
To bridge the speed gap between traditional GIS (expensive but acceptably fast for a single user desktop application) and the interweb (cheap per-transaction costs, scalable infinitely) I've built a database schema which is VERY non-traditional, then indexed, clustered and analyzed the hell out of it. It's tuned very specifically for the read-only lookup task, and not intended for general storage and manipulation of the data.
As a result, I can identify the location of a lat/long in government terms at every level in less than half a wall-clock second (worst case), for any GPS point or street address in Australia, including the various weird offshore islands and various territories.
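For anyone wondering what "identify a lat/long in government terms" boils down to, here is a minimal sketch of the core idea: test the point against a set of boundary polygons at each level of government. The real service leaves this to PostGIS; the Python below swaps in a plain ray-casting test, and the layer names, region names and rectangular boundaries are all made up for illustration.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]   # (longitude, latitude)
Polygon = List[Point]         # ring of vertices; the closing edge is implied


def point_in_polygon(point: Point, polygon: Polygon) -> bool:
    """Ray casting: count the edges crossed by a ray running east from the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def lookup(point: Point,
           layers: Dict[str, Dict[str, Polygon]]) -> Dict[str, Optional[str]]:
    """Return the first matching region at each level, or None if nothing matches."""
    return {
        level: next((name for name, poly in regions.items()
                     if point_in_polygon(point, poly)), None)
        for level, regions in layers.items()
    }


# Hypothetical boundaries: simple rectangles, not real electorates or councils.
LAYERS = {
    "federal": {
        "Electorate A": [(150.0, -34.0), (152.0, -34.0), (152.0, -33.0), (150.0, -33.0)],
        "Electorate B": [(152.0, -34.0), (154.0, -34.0), (154.0, -33.0), (152.0, -33.0)],
    },
    "state": {
        "State X": [(140.0, -38.0), (154.0, -38.0), (154.0, -28.0), (140.0, -28.0)],
    },
    "local": {
        "Council Q": [(160.0, -34.0), (161.0, -34.0), (161.0, -33.0), (160.0, -33.0)],
    },
}

result = lookup((151.2, -33.9), LAYERS)
```

A linear scan like this is nowhere near fast enough for real boundary data, which is exactly why the spatial indexing and clustering matter so much.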
A richer call which also maps the 15 Census codes used to identify a location runs in around 1.5 wall-clock seconds.
Thanks to the wonderfully complete DateTime library, I shortly also plan to implement all of Australia's 8 timezones in mapping terms (including the silly ones like the single town in Western Australia that decided it wanted a timezone all of its own).
My previous work with the CPANDB and the CPAN Top 100 website has been an excellent warm-up for this effort, as the aggregation processes involved turn out to be rather similar.
However, as I get 100% coverage over different concepts, I find the scope is gradually creeping outwards and this thing wants to become "The database that ate the Universe".
This wasn't a problem with CPANDB, because there's really only so much data you need to care about. But countries have FAR more data than the CPAN does. The SVN checkout for the entire geo2gov repository (which contains both all the code and cached copies of all our input data) now measures in at around 3-4gig and I have another 10gig of data that might potentially be useful to add.
The current dev version of the "product" schema (the compiled form that powers the actual service) is in the vicinity of half a gig. It contains every legal jurisdiction, every house of parliament, all the electorate divisions, all the members of parliament and the nested key structure for the entire census (plus mapping layers for all of the above).
I've also merged in a complete database of polling places (for "Where to Vote" type uses) and I'm seriously debating including an entire copy of the census at the lowest published level (just because I can).
That said, I'm reaching some of the same limits I did on CPANDB. For one, the main import process is now starting to consume more than the maximum available system memory on my machine. So many of the same tricks I had to do with Process.pm are starting to become needed for geo2gov as well.
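One generic way to keep an import under a memory ceiling is to stream the records through in fixed-size batches instead of loading everything up front. To be clear, this is not the Process.pm approach, just a minimal Python sketch of the batching idea; the `load_batch` callback and the record shape are hypothetical.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, int]


def batches(records: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Yield lists of at most `size` records without materialising the whole input."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def run_import(records: Iterable[Record],
               load_batch: Callable[[List[Record]], None],
               size: int = 1000) -> int:
    """Feed records to `load_batch` one chunk at a time and return the total count."""
    total = 0
    for chunk in batches(records, size):
        load_batch(chunk)   # e.g. one bulk insert per chunk (hypothetical)
        total += len(chunk)
    return total


# Stand-in for a database loader: just remember each batch that arrives.
loaded: List[List[Record]] = []
count = run_import(({"id": i} for i in range(2500)), loaded.append, size=1000)
```

Peak memory then depends on the batch size rather than the size of the input, provided the loader does not hang on to the chunks the way this stand-in does.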
Work continues...
P.S. Now that I think about it, "The web service that ate Australia" would make a great title for a talk...
What's the source for the X_Geocode data? The other sources I've seen for similar data in Australia have been expensive. Nothing in the list at data.australia.gov.au popped out at me as the source.
Re:X_Geocode data?
Alias on 2009-11-10T04:08:47
The geocoder I'm using is the Google one.
Re:X_Geocode data?
spooner on 2009-11-10T05:21:22
"You may use the HTTP geocoder to geocode addresses outside of your Google Maps API application so that they may be cached and later displayed using one of the Google Maps APIs, but locations obtained using the HTTP geocoder may not be used by any other application, distributed by other means, or resold."
http://code.google.com/apis/maps/faq.html#geocoder_exists
What about the Cloudmade/OpenStreetMap geocoder?
Re:X_Geocode data?
Alias on 2009-11-10T07:17:35
For the moment I'm ignoring this problem as a legal technicality, as all current known users are themselves displaying these locations on a map as per the requirements.
I'm in reasonably regular contact with some people from the maps team (which is mostly run out of the Sydney office a few kilometres from my home). I'm quite sure issues can be resolved, and I won't be holding up development because of it.
Re:X_Geocode data?
tonyc on 2009-11-10T05:56:04
According to the documentation "geocoding results without displaying them on a map is prohibited".