So I went down to the SF.pm meeting and gave two lightning talks about Web::Scraper and takesako-san's neat IMG tag hackery. The talks went well, and the other talks were interesting too. Photos are uploaded to Flickr, tagged sf.pm.
i am just going to do some scraping work and W::S works great so far. the doc is lacking though, so the examples you posted in past journal entries helped! i have a few questions though:
process "h3.ens>a",
Parsing of undecoded UTF-8 will give garbage when decoding entities
what does the result keyword in the DSL do? i took it out of the DSL and it still works fine.
Re:good stuff!
miyagawa on 2007-12-17T01:40:27
1. If you want wildcard matching, you can change the selector expression to something like ".ens>a".
2. Web::Scraper does whatever it can to decode UTF-8 bytes back to Unicode characters, as long as you pass the URI object and the HTML page has a correct Content-Type header. Otherwise you need to fetch the page into a variable and call Encode::decode to get the Unicode characters back.
3. The result keyword specifies which stash variable you want to get as the result. You can omit it if you want the entire hash.
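Points 2 and 3 above can be sketched roughly as follows. This is only an illustration, not code from the thread: the HTML snippet, the `titles` stash key and the variable names are made up, and the raw octets stand in for a page that was fetched into a variable rather than passed as a URI object.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Web::Scraper;

# Raw UTF-8 octets standing in for a fetched page (hypothetical markup).
# "\xC3\xA9" is the two-octet UTF-8 encoding of e-acute.
my $bytes = <<"HTML";
<html><body>
<h3 class="ens"><a href="/one">caf\xC3\xA9</a></h3>
<h3 class="ens"><a href="/two">th\xC3\xA9</a></h3>
</body></html>
HTML

# Point 2: decode the octets into Perl characters before scraping.
my $html = decode('UTF-8', $bytes);

# No "result" keyword: scrape returns the entire stash as a hash reference.
my $whole = scraper {
    process "h3.ens>a", 'titles[]' => 'TEXT';
};
my $stash = $whole->scrape($html);

# Point 3: "result" as the last statement returns only the named stash variable.
my $only = scraper {
    process "h3.ens>a", 'titles[]' => 'TEXT';
    result 'titles';
};
my $titles = $only->scrape($html);

binmode STDOUT, ':encoding(UTF-8)';
print join(', ', @{ $stash->{titles} }), "\n";
print join(', ', @$titles), "\n";
```

Both print statements should show the same two link texts. The only difference is the shape of what scrape hands back: a hash reference in the first case, an array reference in the second.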
Qiang on 2007-12-18T04:24:06
.ens>a
does that match any class name containing the string 'ens'? what is the syntax for exact matching on a class name then?
miyagawa on 2007-12-18T04:28:03
No, ".ens>a" does an exact match; in other words, it matches the exact class name. If you want to match partial class names, you might need to do a[@class=~"ens"] or something like that. Read the CSS Selector spec for details.
miyagawa on 2007-12-18T04:32:35
Should be a[class~="ens"], that is.
Aristotle on 2007-12-18T10:01:39
No, actually, “.ens > a” matches an “a” element inside an element of any name with class “ens”, whereas “a[class~="ens"]” wants to see the class on the “a” element itself. The partial-match version would actually be “*[class~="ens"] > a”.
miyagawa on 2007-12-18T18:08:49
Eh, I didn't read the original question very carefully. The point he didn't get was that class="foo bar" is foo + bar and not "foo bar". Anyway.
Qiang on 2007-12-18T04:50:27
er. my bad. i thought class="listing first" is one class name. it is 'listing' and 'first'. great module, thanks!
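The class-name point the thread ends on can be checked with a small sketch. The markup and the `hits` stash key below are invented for illustration. It assumes Web::Scraper treats the class attribute as a whitespace-separated list of names, which is what the CSS selector spec describes.

```perl
use strict;
use warnings;
use Web::Scraper;

# Hypothetical markup: the first div carries two class names, "listing" and "first".
# The last div has a different class name that merely contains "listing".
my $html = <<'HTML';
<html><body>
<div class="listing first"><a href="/a">A</a></div>
<div class="listing"><a href="/b">B</a></div>
<div class="listings"><a href="/c">C</a></div>
</body></html>
HTML

# ".listing>a" selects "a" children of any element whose class list
# includes the whole name "listing".
my $by_class = scraper {
    process ".listing>a", 'hits[]' => 'TEXT';
    result 'hits';
};
my $hits = $by_class->scrape($html);

print join(', ', @$hits), "\n";
```

The first two links should be picked up and the third skipped, since "listings" is a different class name. In the selector spec, [class~="listing"] is the same whole-name test, while a substring match is written [class*="listing"].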