Google's new API

Matts on 2002-04-15T20:13:21

So it seems google's new API has everyone going all gaga over it. Well I for one am sick of hearing about it.

You see before they decided to release this, they had a nice simple XML API. Instead of sending your query to http://google.com/search?q=foo, you sent it to http://google.com/xml?q=foo. Simple huh? The results you got back were always well formed XML. The syntax though was horrible, with single-letter tag names. I'm guessing the guy who designed it hated XML and thought it was overly verbose and wasted bandwidth.

Yet now we have an "API". Apparently having an API makes it much easier for everyone. I'm forced to agree with this because Google's original XML query output format was horribly terse to the point of being unusable. XML is supposed to be self documenting - if the tags had been named things like <results><result>..., it would have been dead easy to use. And guess what? You would have seen Java, Perl and other language APIs popping up left right and center to make use of these simple HTTP queries. Because this isn't rocket science. And it doesn't require large fat modules to build something around this.

But SOAP has "mindshare". And it's a "standard". So I'm obviously wrong, right? I do have a talk on this for OSCon, so I'd be very happy to solicit pre-conference feedback.


XML Documents vs. SOAP APIs

ziggy on 2002-04-15T20:33:02

There's one significant difference between Google's XML results and Google's SOAP API: the XML results gave you a document and left you on your own; the SOAP API gives you pre-digested data structures.

Giving everyone the same (poorly-tagged) XML document left them on a very low plateau: everyone who wanted to use the XML Search needed to write a parser for this document (or re-use someone else's). It's a nice solution, but only if you're not very lazy or just like XML. [*]

The SOAP solution on the other hand plugs into tools and APIs that translate RPC messages into your native object model: Java or .NET. (what do you mean there's more than just that? ;-) So the SOAP APIs do the same thing more verbosely with a different (and uglier) tagging convention, but they hit a higher plateau and facilitate laziness: you're going to use a SOAP library anyway, and you don't have to write/download/search for a specific Google-XML parsing toolkit.

This may not be a significant amount of effort to you or to me, but it's likely to be a significant amount of effort for someone. And that's likely to keep someone for paying for a 1M search/day license at some point down the road.

*: Yes, I'm necessarily discounting the hackers who are already screen-scraping Google's HTML output.

Re:XML Documents vs. SOAP APIs

Matts on 2002-04-15T21:37:49

So this is mostly because other languages XML parsers suck? If you stuck something containing <results><result... into XML::Simple, you'd get a simple array-ref. So this becomes as simple as XMLin("http://google.com/xml?q=foo") (assuming you've turned on XML::Parser's LWP support).

Re:XML Documents vs. SOAP APIs

ziggy on 2002-04-15T23:23:21

So this is mostly because other languages XML parsers suck?
I don't know if it's an issue of suckiness or not. XML::Simple or a tree-based parser still shows an XML-centric data structure. SOAP shows an application-centric data structure for any application using SOAP. While you have to ignore a lot of bogosity to get to the <result> element in a SOAP envelope, once your SOAP library has done that (on your behalf), you're left with something that (likely) maps very well to your problem space, rather than the XML technology space.

Then there's the issue of writing SAX event handlers, which is completely non-obvious when you've never had to write event-driven programs...

I'm sitting on the knife's edge when it comes to SOAP. It's fundementally an ugly yet trivial technique, but it's getting a lot of support. It also does have some non-obvious advantages like this. If you had told me three years ago that all of this wonderment about XML would reduce down to SOAP vs. No-SOAP, I'd have laughed uproariously. Now I grudgingly admit that there's a benefit to removing the abstract (http://www.google.com/xml?q=foo) and replacinging it with something concrete (the moral equivalent using SOAP).

Re:XML Documents vs. SOAP APIs

Matts on 2002-04-16T06:34:46

What has me wondering about SOAP, is why we didn't end up with CORBA over HTTP (rather than IIOP). I guess ultimately it was the pig-headedness of the OMG that slowed CORBA down - if they'd had some foresight into firewall smashing, and leveraging the web, they may have won out.

The sad fact is that SOAP is simply slow in Perl, so for performance SOAP applications you need to use another language. And that's when it stops being fun.

SOAP sucks

ask on 2002-04-16T00:30:09

This is a perfect application for doing it "the old way".

The old way we could have had caching and all sorts of neat things too for free. And they could have used standard http auth for authentication instead of having to reinvent that too.

Re:SOAP sucks

ziggy on 2002-04-16T00:40:48

But HTTP authentication is soooo second millenium. All the cool kids are doing neat-o kewl SOAP stuff now! And all of the k-rad SOAP APIs support the new widgets, not the krufty old HTTP auth.

:-)

Not XML, HTTP

darobin on 2002-04-16T00:52:13

I don't think that the guy that made the schema they were using hated XML. In fact, I'm sure it's the same person that made the SOAP API (at least, I wouldn't be surprised.

Thinking that XML is verbose is no reason to use unreadable element names. The reason the schema was designed that way was for the same reason form parameters are unreadable in most search engines: if you get several million requests a day, naming a form parameter q instead of query gains you some bandwidth. In fact, it gains you a lot of bandwidth.

The guy that made the XML version thought the same thing. He just didn't realize that HTTP has gz support and that it will amount to exactly the same size (approximatively) irrespective of the length of element names (see SVG size benchmarks for this -- it's truly amazing).

He hated HTTP, and thus he went towards one of its most perverted uses -- SOAP -- in pure delight.

Ok, I'm romanticizing there, and willingly ignoring a few factors. But I'm only half joking.

As for SOAP being definitely easier to handle for the zillion simple Java/.NET coders out there, I definitely have to agree with Adam. Any three of us or of many other hackers could build a GoogleResults object out of the XML version using SAX in less time than it takes to have this conversation here. But there's one thing you have to take into account: the world of Java and .NET hackers in general is no different from that of Perl hackers. There are some that know enough to create real applications or modules, but the vast majority are scripters.

What I mean by scripters (and I don't mean this in a bad way as scripts are useful) is people that don't create things that require a certain amount of design (where the certain amount may vary). They just import libraries, call a dozen methods, and are done. Whether it's OO or sh it's the exact same thing. Those people don't want to write that fifty line SAX handler you're thinking about. They want to get at the data and either call "data.getResults" or "echo $DATA | grep result" on it.

As tired as you as I am of hearing about the WebServices "revolution", that's what people want, that's what makes business tick, and that's what will take over if only because you need less skilled workers to achieve it. It's a common pattern in informatics that when someone says "revolution" they really mean "simplification" (for the users they have in mind).

There's nothing wrong with simplifying of course. It's just that someone's simplification is someone else's loss of flexibility or power.

Re:Not XML, HTTP

koschei on 2002-04-16T07:20:15

I think some Perl users wouldn't really care if they had SOAP or XML. All that matters is that they can:

use WWW::Search::Google::type;

my $results = WWW::Search::Google::type->new($key, $query);

etc.

where type is either SOAP or XML or HTML or whatever.

Heck, even with the type being transparent. You just specify that you want to use Google, and it picks the protocol it can use dependent upon what modules the user has installed.

So long as there's a nice package that someone has written and they all present a nice unified interface with all the tricky stuff hidden, most people probably wouldn't care about SOAP/XML/HTTP/HTML/whatever.

Re:Not XML, HTTP

Matts on 2002-04-16T09:13:09

I guess the difference is that Java and .NET people don't have CPAN. So they'd either expect Google to release a package for doing the queries - one each for Java and .NET, or they'll simply prefer the SOAP way. I think since Google would probably prefer not to develop APIs for different languages then SOAP is more appealing to them.

One thing that intruiges me though is the possibility to do server-side XSLT so that you can provide both a SOAP API and a plain XML API, from the same server using the same code, just with some XSLT to transform the input from SOAP to a querystring, and to transform the output from plain XML to a SOAP response. Hmm..

Re:Not XML, HTTP

koschei on 2002-04-16T09:55:33

How processing time efficient would that be compared to having two separate ones?

Just curious since I am sure Google like to keep things efficient processing-wise.

Re:Not XML, HTTP

Matts on 2002-04-16T11:03:25

I don't know. Ever heard of Moore's law ;-)

I doubt it would be terribly efficient. But it might be fun to find out.