Scraping sibling nodes with Web::Scraper

tokuhirom on 2007-11-23T02:05:19

Web::Scraper doesn't handle some cases well. For example, markup like the following:

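    <div class="author">miyagawa</div>
    <div class="module">Web::Scraper</div>
    <div class="author">hanekomu</div>
    <div class="module">Dist-Joseki</div>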


This is not a tree structure: each author and its module are siblings, not parent and child. Hmm... Web::Scraper depends on the tree structure, doesn't it?

But XPath is a Swiss Army chainsaw. I want to write something like this:

    scraper {
        process '//div[@class="author"]', 'modules[]', scraper {
            process '/.', 'author', 'TEXT';
            process '/following-sibling::div[1][@class="module"]', 'title', 'TEXT';
        };
    };

But this code doesn't work. scraper does not support this usage.
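
To show what I mean by pairing siblings, here is a minimal sketch that does it by hand with HTML::TreeBuilder::XPath, the module Web::Scraper builds on. It is not a Web::Scraper feature, only an illustration of the XPath idea. The sample HTML and the names $html and @modules are my assumptions for the example.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Assumed sample markup: authors and modules are flat siblings.
my $html = <<'HTML';
<div class="author">miyagawa</div>
<div class="module">Web::Scraper</div>
<div class="author">hanekomu</div>
<div class="module">Dist-Joseki</div>
HTML

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($html);
$tree->eof;

my @modules;
for my $author ($tree->findnodes('//div[@class="author"]')) {
    # Relative XPath, evaluated with the author <div> as the context node:
    # take the nearest following <div> sibling, but only if it is a module.
    my ($module) = $author->findnodes('following-sibling::div[1][@class="module"]');
    next unless $module;    # skip authors with no module right after them
    push @modules, {
        author => $author->as_text,
        title  => $module->as_text,
    };
}
$tree->delete;    # free the parsed tree

printf "%s: %s\n", $_->{author}, $_->{title} for @modules;
```

The point is that the second expression has no leading slash, so it is resolved relative to each author node instead of the document root.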

If Web::Scraper supported this feature, scraping 'search.cpan.org', 'blog.livedoor.com', and many other web sites would be much easier.

Here is a quick and dirty patch for this problem: http://limilic.com/entry/c3qpikckc7f12jq3