The Ruby library scrAPI looks promising. It lets you write scraper code using CSS selectors, like:
scraper = Scraper.define do
  process 'span.title > a:first-child', :title => :text, :url => '@href'
  process 'ul.list-circle > li:first-child > a', :category => :text
  result :title, :url, :category
end

html = open(url).read
scraper.scrape(html)
Re:We could write a CSS - XPath translator
miyagawa on 2006-09-23T12:59:43
Yeah, that sounds about right. But does XPath support alternatives and siblings, like "h1 + h2" (an h2 that immediately follows an h1)?
Re:We could write a CSS - XPath translator
Aristotle on 2006-09-23T15:08:56
Yes.
h1/following-sibling::*[1]/self::h2
Re:We could write a CSS - XPath translator
miyagawa on 2006-09-23T15:18:50
Neato. If XPath can do everything that CSS2 selectors can, translating (or compiling) the CSS selector expression to XPath is da way to go. The benefit of CSS selectors is that they're much easier to write than XPath.
Googling "CSS selector to XPath" gives me very few results:
http://groups.google.com/group/behaviour/browse_thread/thread/246782199cea5ce9/a2530a4abe5b12fd?lnk=gst&rnum=1#a2530a4abe5b12fd
http://www.joehewitt.com/blog/2006-03-20.php
Re:We could write a CSS - XPath translator
Aristotle on 2006-09-23T15:42:57
It should not be very hard. There are not many selectors in CSS2 and they just need to be translated once. Maybe I should write up the equivalents.
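To give a rough idea of how small such a translator could be, here is a minimal Ruby sketch. It only covers a handful of CSS2 selectors (element names, *, #id, .class, :first-child, and the descendant, child and adjacent-sibling combinators). The names css_to_xpath, compound_to_xpath and COMBINATOR_STEPS are made up for this example and do not come from scrAPI or any existing library; attribute selectors, selector groups and quoting of odd characters are left out.

```ruby
# Hypothetical sketch: translate a small subset of CSS2 selectors to XPath.
# Supported: element names, *, #id, .class, :first-child, and the
# combinators " " (descendant), ">" (child), "+" (adjacent sibling).

# XPath prefix emitted for each CSS combinator.
COMBINATOR_STEPS = {
  ' ' => '//',
  '>' => '/',
  '+' => '/following-sibling::*[1]/self::'
}.freeze

# Turn one compound selector such as "li.item:first-child" into an
# element name followed by its XPath predicates.
def compound_to_xpath(compound)
  m = compound.match(/\A([a-zA-Z][\w\-]*|\*)?((?:[#.:][\w\-]+)*)\z/)
  raise ArgumentError, "unsupported selector: #{compound.inspect}" if m.nil? || compound.empty?

  predicates = m[2].scan(/([#.:])([\w\-]+)/).map do |kind, value|
    case kind
    when '#' then "[@id='#{value}']"
    when '.' then "[contains(concat(' ', normalize-space(@class), ' '), ' #{value} ')]"
    else
      raise ArgumentError, "unsupported pseudo-class: :#{value}" unless value == 'first-child'
      '[not(preceding-sibling::*)]'
    end
  end
  "#{m[1] || '*'}#{predicates.join}"
end

def css_to_xpath(selector)
  # "ul > li a" becomes ["ul", ">", "li", " ", "a"]
  tokens = selector.strip.gsub(/\s*([>+])\s*/, '\1').split(/([>+]|\s+)/)
  combinator = ' ' # the first compound may match anywhere in the document
  xpath = []
  tokens.each do |token|
    if token.match?(/\A(?:[>+]|\s+)\z/)
      combinator = token.strip.empty? ? ' ' : token
    else
      xpath << COMBINATOR_STEPS.fetch(combinator) + compound_to_xpath(token)
    end
  end
  xpath.join
end

puts css_to_xpath('h1 + h2')
puts css_to_xpath('ul.list-circle > li:first-child > a')
```

With this sketch, 'h1 + h2' comes out as the following-sibling expression given earlier in the thread, prefixed with // so that it matches anywhere in the document.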
Re:We could write a CSS - XPath translator
Aristotle on 2006-09-24T01:07:35
Here you go: How to map CSS selectors to XPath queries.
Re:We could write a CSS - XPath translator
miyagawa on 2006-09-24T10:24:12
That's a great one. Thank you!
What about CSS 3 selectors (pseudo-classes)? It looks like html/selector.rb implements some of those, e.g. :root, :empty, :only-child, etc.
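For what it's worth, the structural pseudo-classes mentioned here seem to map onto plain XPath predicates too. The following is only a sketch of that idea in Ruby: the table PSEUDO_CLASS_PREDICATES and the helper pseudo_class_to_xpath are hypothetical names, and none of this is taken from html/selector.rb, which may well do it differently.

```ruby
# Hypothetical lookup table: CSS3 structural pseudo-class => XPath predicate.
# This is an illustration, not the logic used by html/selector.rb.
PSEUDO_CLASS_PREDICATES = {
  'root'        => 'not(parent::*)',
  'empty'       => 'not(*) and not(text())',
  'first-child' => 'not(preceding-sibling::*)',
  'last-child'  => 'not(following-sibling::*)',
  'only-child'  => 'not(preceding-sibling::*) and not(following-sibling::*)'
}.freeze

# Build an XPath query for "element:pseudo_class", matching anywhere
# in the document. Unknown pseudo-classes raise ArgumentError.
def pseudo_class_to_xpath(element, pseudo_class)
  predicate = PSEUDO_CLASS_PREDICATES.fetch(pseudo_class) do
    raise ArgumentError, "unsupported pseudo-class: :#{pseudo_class}"
  end
  "//#{element}[#{predicate}]"
end

puts pseudo_class_to_xpath('li', 'only-child')
puts pseudo_class_to_xpath('p', 'empty')
```

Pseudo-classes that take arguments, such as :nth-child(), would need real parsing and are not attempted here.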
Re:Porting html/selector.rb to Perl
bart on 2006-09-24T08:50:44
Can anybody recommend a quick tutorial to Ruby? I find this piece of source code extremely hard to read. That's because I'm not getting some of the basics in Ruby, of course.