I now have 198,586 cities in my database, complete with latitude and longitude for all of them. As a result, my pet project allows you to click on countries, see the regions for that country and then click on a region to get a paged city list (1,576 cities just for New South Wales) and see a map of any city you click on.
The site still doesn't do anything amazing and it's not particularly user-friendly. That's because I'm discovering that managing data quality is really hard when working with only free data. Not quite sure why American Samoa has a region named "00", but there you go. In fact, I have 54 regions in the database named "00". Lots of slogging through here to understand things.
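For anyone curious how a check like the "00" count might look, here is a minimal sketch using Python's sqlite3 module. The `region` table, its `country` and `name` columns, and the sample rows (including "Exampleland") are assumptions made up for illustration; the real schema may well differ.

```python
import sqlite3


def count_placeholder_regions(conn, placeholder="00"):
    """Return (country, count) rows for regions whose name equals the placeholder."""
    return conn.execute(
        """
        SELECT country, COUNT(*) AS n
        FROM region
        WHERE name = ?
        GROUP BY country
        ORDER BY country
        """,
        (placeholder,),
    ).fetchall()


# Hypothetical schema and made-up sample rows, not the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE region (country TEXT NOT NULL, name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO region (country, name) VALUES (?, ?)",
    [
        ("American Samoa", "00"),
        ("Australia", "New South Wales"),
        ("Exampleland", "00"),
    ],
)

suspects = count_placeholder_regions(conn)
for country, n in suspects:
    print(f"{country}: {n} region(s) named '00'")
```

Grouping by country at least shows where the placeholder names cluster, which is a starting point for deciding whether they are real regions or just missing data.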
(Who the hell spends time coding on their vacation?)
Yeah, it's quite easy to find data problems there.
Germany > Berlin: the cities there are really boroughs, and most of them are duplicated (with and without the "Berlin-" prefix).
Inconsistent umlaut handling: Germany > Baden-Württemberg has a "u" instead of "ü", but I see umlauts in the "cities" under Berlin.
Croatia > Zagrebacka: this should not be Zagrebacka, but rather Zadarska or so.
Croatia > Grad Zagreb: the city of Zagreb appears twice, once under the correct name and once under the old German name "Agram" (which nobody uses anymore, not even the Germans). Sveta Nedelja appears in two spellings.
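The prefix and umlaut duplicates above could probably be flagged automatically. A rough sketch in Python, assuming the city names for one region are available as a plain list; the function names and the sample names are illustrative, not taken from the actual database:

```python
import unicodedata
from collections import defaultdict


def normalize(name, region):
    """Build a comparison key: no diacritics, case-folded, region prefix removed."""
    # Decompose accented characters and drop the combining marks ("ü" -> "u").
    decomposed = unicodedata.normalize("NFKD", name)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    key = plain.casefold().strip()
    # Drop a leading "<region>-" such as "Berlin-".
    prefix = region.casefold() + "-"
    if key.startswith(prefix):
        key = key[len(prefix):]
    return key


def find_duplicates(names, region):
    """Group names sharing a normalized key; keep only groups with more than one name."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize(name, region)].append(name)
    return {key: found for key, found in groups.items() if len(found) > 1}


# Illustrative sample names only.
sample = ["Berlin-Köpenick", "Köpenick", "Kopenick", "Spandau"]
dupes = find_duplicates(sample, "Berlin")
print(dupes)
```

This only catches spelling variants of the same name. It folds "ü" to "u" rather than "ue", and it would not catch a historical alias like "Agram" for Zagreb, which needs a lookup table or a better data source.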
Maybe it would be better to use the data from OpenStreetMap?
Re:More data problems
Ovid on 2009-08-31T03:31:04
I'll look for the OpenStreetMap data and check the license. I've blown enough time looking at annoying data integrity issues that an alternative data source would be good.
Also, SQLite is fine for a simple database, but when you need serious reliability and are struggling with data integrity, it's painful to work with.
Re:More data problems
ank on 2009-08-31T16:36:52
Kudos for a great job!
Let me know if you think I can contribute somehow. I am originally from Argentina, so I can check or add places in Argentina/South America from sources in Spanish, etc.
-- ank
Re:More data problems
Ovid on 2009-09-01T16:00:56
Thanks for the offer of help. Eventually I'll get to the point where I'll need it from folks who want to help enter legal immigration data per country.
To be honest, though, I might drop city/region support altogether. As I've discovered, most of the thorough information out there is geographic in nature (not surprising) but the information I need is political in nature. They don't quite fit together. If you want to emigrate to the US, do you really need to know where Paris, Texas is? Not really. The only reason I tried to add it is that my fiancée wanted to use the site to find Portland, Oregon, and if it was easy, I thought I'd oblige her. :)