And now for something completely Perlish....

ziggy on 2001-11-27T02:33:38

Screen scraping. The bane of web programming. Thankfully, its hideousness has spurred a bevy of alternatives for transmitting information over the web: XML-RPC, SOAP, RDDL, RDF, RSS, and various task-specific XML document types and other bizarre file formats.

I'm working on an article at the moment that deals with grabbing information out of an HTML document using XSLT, of all things. It's convinced me that what's been lacking is a good way to concisely say "I want this out of an HTML document", and to do it reliably, interactively, and robustly. The alternatives in Perl are a little more painful.

Watch this space for more details.


TIMTOWTDI

Matts on 2001-11-27T13:36:12

Don't forget that modules like XML::XPath and XML::LibXML can read HTML directly (it's easier with XML::LibXML than with XML::XPath). That means you don't need XSLT: you can just use XPath, which is really damn cool, and a hell of a lot better than traversing a tree.

e.g. get a title from an HTML document:

use XML::LibXML;
# Parse the HTML file into a DOM tree, then query it with XPath
my $doc = XML::LibXML->new->parse_html_file('foo.html');
print $doc->findvalue('/html/head/title');

Simple, eh?
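
The same idea stretches to pulling out several values at once. Here's a rough sketch, assuming a made-up sample page inlined as a string (so it uses parse_html_string rather than parse_html_file) and hypothetical variable names; treat it as an illustration rather than the one true way:

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample page, inlined so the sketch is self-contained
my $html = <<'HTML';
<html>
  <head><title>Example page</title></head>
  <body>
    <p>See <a href="http://example.com/one">one</a> and
       <a href="http://example.com/two">two</a>.</p>
  </body>
</html>
HTML

# Parse the HTML string into a DOM tree
my $doc = XML::LibXML->new->parse_html_string($html);

# A single value: the string value of the first matching node
my $title = $doc->findvalue('/html/head/title');

# Many values: findnodes returns a list of nodes in list context
my @links = map { $_->getAttribute('href') } $doc->findnodes('//a[@href]');

print "Title: $title\n";
print "Link:  $_\n" for @links;
```

No tree walking in sight: each thing you want is one XPath expression.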

Re:TIMTOWTDI

ziggy on 2001-11-27T14:01:55

That's step two. :-)

The idea here is to support quick-turnaround prototyping with XSLT (presuming you know XSLT). Once that's done, copying the XPath expressions back to Perl is trivial (and leads back to single-language programming).
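
Very roughly, that two-step workflow might look like the sketch below. The sample page and stylesheet are invented for illustration, and XML::LibXSLT is only an assumed choice of XSLT engine; any processor that can be fed an HTML-derived tree would do for the prototyping step.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXSLT;

# Hypothetical sample page, inlined so the sketch is self-contained
my $html = <<'HTML';
<html>
  <head><title>Example page</title></head>
  <body><p>Hello.</p></body>
</html>
HTML

my $parser = XML::LibXML->new;
my $doc    = $parser->parse_html_string($html);

# Step one: prototype the extraction as a tiny XSLT stylesheet
my $xsl = <<'XSL';
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:value-of select="/html/head/title"/>
  </xsl:template>
</xsl:stylesheet>
XSL

my $stylesheet = XML::LibXSLT->new->parse_stylesheet($parser->parse_string($xsl));
my $via_xslt   = $stylesheet->output_string($stylesheet->transform($doc));

# Step two: copy the same XPath expression straight into Perl
my $via_perl = $doc->findvalue('/html/head/title');

print "XSLT: $via_xslt\n";
print "Perl: $via_perl\n";
```

The select expression in the stylesheet and the argument to findvalue are the same string, which is the whole point: once the XPath is right, the XSLT scaffolding can be dropped.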