Beautiful Freebase Metadata Dreams Slain

Ovid on 2009-08-11T21:27:46

So I've been hacking on a pet project and thought that Freebase would be my answer. As far as I can tell, it's not. Not even close. Right now, Freebase is like a huge Wikipedia, but with a nice query language on top. I needed a list of all countries in the world along with basic stats like capital, population, GDP, official language, etc. Here's the script I hacked together:

use strict;
use warnings;

use WWW::Metaweb;

my $mh = WWW::Metaweb->connect(
    server      => 'www.freebase.com',
    read_uri    => '/api/service/mqlread',
    trans_uri   => '/api/trans',
    pretty_json => 1
);

my $countries = '[{"type":"/location/country","name":null}]';
my $result    = $mh->read( $countries, 'perl' ) or die $WWW::Metaweb::errstr;
my @countries = sort map { $_->{name} } @$result;

# MQL queries can be tried out interactively at
# http://www.freebase.com/app/queryeditor

for my $country (@countries) {
    my $country_info = sprintf <<'    END' => $country;
    [{
      "type": "/location/country",
      "name": "%s",
      "capital":null,
      "currency_used": [],
      "form_of_government": [],
      "gdp_nominal" : [{"timestamp":null,"currency":null,"amount":null}],
      "gdp_nominal_per_capita" : [{"timestamp":null,"currency":null,"amount":null}],
      "/location/statistical_region/population" : [{"number":null,"timestamp":null}],
      "official_language":[{"name":null}]
    }]
    END
    print "Reading the data for $country\n";
    my $result = $mh->read( $country_info, 'perl' )
      or die $WWW::Metaweb::errstr;
    use Data::Dumper;
    $Data::Dumper::Indent   = 1;
    $Data::Dumper::Sortkeys = 1;
    print Dumper($result);
}

Not only do I get just 100 countries returned -- including the Weimar Republic and West Germany (but not East Germany) -- but most of them have almost no data associated with them. The ones which do have data often have curious results which might be correct (see the official languages), but without context, who knows? Oh, and WWW::Metaweb needs a monkey patch to get around an incompatible API change in JSON::XS.

One suggestion on the Freebase message boards involved posting back the correct information. This sounds reasonable, but at the end of the day, it also sounds like a lot of work, particularly since I didn't want to base my project on Freebase. I just saw it as a useful source of information. Freebase looks awesome, but it's not quite there yet. Or I don't understand it. Who knows?

I'll have to figure out a better way of extracting this information (CIA World Factbook sounds good), but then figuring out the posting API for Freebase just sounds like more work that will distract me from my main project.

Back to the drawing board.


Tips

Skud on 2009-08-11T23:09:12

1) If you want more than 100 results, up the limit to a higher number; 100 is the default. Use "limit" : 500 or whatever.

2) You probably want to mark some of your clauses as optional; as it stands, if the system doesn't know the capital of a country, it will be entirely excluded.

3) The Weimar Republic *is* a country -- or was. That's perfectly valid in Freebase. The country type is used for past and present countries and for things that act like countries (e.g. things that have an ISO code). Admittedly this might not quite be what you expect at first glance.

One way to query for what you probably want is to query for things that are countries, which have an ISO code, and which are not also co-typed as administrative division. That should get you something approximating the list you expect -- if I'm guessing right about what you do expect :)
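
As a rough illustration of tips 1 and 2 plus the type filter described above, a single MQL query might look something like the sketch below. It is untested against the live service. The "limit" and "optional" directives are standard MQL. The "forbid:type" clause with "optional": "forbidden" is an assumption about how to express "not also typed as an administrative division", and the type id /location/administrative_division is likewise assumed. The ISO-code constraint is left out because the exact property name isn't given in this thread.

```json
[{
  "type": "/location/country",
  "name": null,
  "limit": 500,
  "forbid:type": {
    "id": "/location/administrative_division",
    "optional": "forbidden"
  },
  "capital": {"name": null, "optional": true},
  "currency_used": [],
  "form_of_government": [],
  "gdp_nominal": [{"timestamp": null, "currency": null, "amount": null, "optional": true}],
  "gdp_nominal_per_capita": [{"timestamp": null, "currency": null, "amount": null, "optional": true}],
  "/location/statistical_region/population": [{"number": null, "timestamp": null, "optional": true}],
  "official_language": [{"name": null, "optional": true}]
}]
```

With the nested clauses marked optional, a country that is missing, say, GDP figures should come back with an empty or null value for that property rather than being dropped from the result. Because the query no longer pins "name" to one country, it can replace the per-country loop with a single read.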

Also, re: updating data in the system, you don't need to use the write API. You can simply do it via http://freebase.com/ by clicking on "Edit" wherever necessary.

Re:Tips

Ovid on 2009-08-12T06:17:53

Thank you! I'll give it another go. Understanding the ins and outs of MQL is harder than I thought! :)

Re:Tips

Skud on 2009-08-12T06:27:48

Jump on IM or IRC (irc.freenode.net #freebase) if you'd like some realtime help. Remember that I'm on the US West Coast, of course, so late in your day and early in mine would probably be our most likely crossover point. I'm skud11111 on AIM/YIM and kirrily.robert@gmail.com on GTalk.

Re:Tips

Alias on 2009-08-12T12:24:02

And on the API thing, would the most prominent Freebase Perl person be interested in taking over and fixing the Freebase module?

Re:Tips

Ovid on 2009-08-12T12:38:27

Which module is that? Kirrily already has the Metaweb module and there's a WWW::Metaweb module also. Regrettably, both fail to install due to incompatible API changes to JSON modules. I've a locally hacked copy of WWW::Metaweb (and I have filed a bug report), so it's easy to fix, but newcomers might be confused.

I'm not certain which of these modules would be of greater benefit, though.

Re: languages

Skud on 2009-08-11T23:17:10

Oh, btw, re: languages... the languages listed are taken from Wikipedia at http://en.wikipedia.org/wiki/index.html?curid=576 -- looking at the topic history ( http://www.freebase.com/history/view/en/austria ) or explore view ( http://www.freebase.com/tools/explore/en/austria ) should help you understand where certain information comes from. For the most part, our country data comes from Wikipedia infoboxes.