A developer release of Web::Scraper has been pushed to CPAN, with "filters" support. Let me explain for a bit how this filters stuff is useful.
Since an early version, Web::Scraper has had a callback mechanism, which is pretty neat: you can extract "data" out of HTML, not just strings.
For instance, if you have HTML like this:
<span class="entry-date">2007-10-04T01:09:44-0800</span>
you can get the DateTime object that the string represents, like:
process ".entry-date", "date" => sub {
    DateTime::Format::W3CDTF->new->parse_datetime(shift->as_text);
};
and with 'filters' you can make this reusable and stackable, like:
package Web::Scraper::Filter::W3CDTFDate;
use base qw( Web::Scraper::Filter );
use DateTime::Format::W3CDTF;
sub filter {
    DateTime::Format::W3CDTF->new->parse_datetime($_[1]);
}
1;
and then:
process ".entry-date", date => [ 'TEXT', 'W3CDTFDate' ];
If the .entry-date text contains erroneous spaces, you can do:
process ".entry-date", date => [ 'TEXT', sub { s/^ *| *$//g }, 'W3CDTFDate' ];
This shows how powerful the Web::Scraper filter mechanism can be. It's stackable, extensible, reusable (by making it a module) and also scriptable with inline callbacks.
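To make the stacking idea concrete, here is a minimal sketch in core Perl of how a list of filters could be applied in order. It is not Web::Scraper's actual implementation: the My::Filter:: namespace, the Upcase filter and the run_filters function are all hypothetical names, and inline callbacks are assumed to edit $_ in place only.

```perl
use strict;
use warnings;

# Hypothetical filter class, standing in for a Web::Scraper::Filter subclass.
package My::Filter::Upcase;
sub new    { bless {}, shift }
sub filter { uc $_[1] }    # $_[0] is the object, $_[1] is the value

package main;

# Apply the filters left to right (hypothetical helper).
sub run_filters {
    my ($value, @filters) = @_;
    for my $filter (@filters) {
        if (ref $filter eq 'CODE') {
            # Inline callback: assumed to edit $_ in place
            local $_ = $value;
            $filter->();
            $value = $_;
        }
        else {
            # Named filter: resolve the short name to a class
            my $class = "My::Filter::$filter";
            $value = $class->new->filter($value);
        }
    }
    return $value;
}

my $result = run_filters('  hello  ', sub { s/^ +| +$//g }, 'Upcase');
print "$result\n";    # prints "HELLO"
```

Because every step takes one value and returns one value, named filters and inline callbacks can be mixed freely in the same list.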
use Text::Filter::Common;
my $filter = Text::Filter::Common->new($name, $config);
my $output = $filter->filter($input, $option);
So Text::Filter::Common is a factory module, and each text filter is a subclass of Text::Filter::Common::Base or something and implements a filter function that probably takes $self->config to configure the filter object.
Re:Text::Pipe?
miyagawa on 2007-10-04T09:02:55
I don't care much about names, but I disagree with letting Text::Pipe itself hold several stackable filters: because all filters have the same single filter interface, you don't need to.
Creating a stacked pipe is easy by creating a new Pipe stacker object, like:
use Text::Pipe::Stackable;
use Text::Pipe;
my $pipe1 = Text::Pipe->new('foo');
my $pipe2 = Text::Pipe->new('bar');
my $pipe3 = Text::Pipe->new('baz');
my $stacked_pipe = Text::Pipe::Stackable->new($pipe1, $pipe2, $pipe3);
my $output = $stacked_pipe->filter($input);
And I don't care much about the class structure either, but it needs to be easy, with little enough code, for more developers to be able to write a new adapter for a new text filtering engine.
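As a rough illustration of how little code an adapter could need, here is a sketch of a factory plus one adapter sharing a single filter method. Nothing here is a real module: My::Pipe, My::Pipe::Base and My::Pipe::Repeat are hypothetical names, and the config hash layout is an assumption.

```perl
use strict;
use warnings;

# Hypothetical base class: stores the config; subclasses implement filter().
package My::Pipe::Base;
sub new {
    my ($class, $config) = @_;
    return bless { config => $config || {} }, $class;
}
sub config { $_[0]{config} }

# A new adapter only inherits from the base and writes filter().
package My::Pipe::Repeat;
our @ISA = ('My::Pipe::Base');
sub filter {
    my ($self, $input) = @_;
    return $input x ($self->config->{times} || 1);
}

# Hypothetical factory: maps a short name to the matching class.
package My::Pipe;
sub new {
    my ($class, $name, $config) = @_;
    my $impl = "My::Pipe::$name";
    die "Unknown pipe: $name\n" unless $impl->can('filter');
    return $impl->new($config);
}

package main;

my $pipe   = My::Pipe->new('Repeat', { times => 3 });
my $output = $pipe->filter('ab');
print "$output\n";    # prints "ababab"
```

The factory never needs to know what an adapter does, only that it can filter, so adding an engine means adding one small package.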
But well, it seems like a bike-shed discussion to me. The detailed API could be improved anytime once the development starts. The important thing is to know if it's a good thing or completely useless.
I'm also interested in writing a pipe for arbitrary data structures, like reduce() or trim() that works on an array ref. Go look at the Test::Base::Filter module that Ingy created a while ago. It has several filter functions that operate on both strings and arrays.
Re:Text::Pipe?
hanekomu on 2007-10-04T09:13:35
Agreed re bike-shed discussion; one more point though:
my $stacked_pipe = Text::Pipe::Stackable->new($pipe1, $pipe2, $pipe3);
Yes, that's a better design pattern. In that case, Text::Pipe::Stackable->new() should be able to take both individual segments and Text::Pipe::Stackable objects (for a kind of recursive construction).
That is, stacked pipes should - to the user - be indistinguishable from individual pipe segments. It's just some black hole that has an input and an output.
Or, in the case of multiplexers, several outputs. Or with reducers, several inputs. Whatever. :)
Re:Text::Pipe?
miyagawa on 2007-10-04T09:17:00
Yeah, it's called the Composite design pattern, and it's also a Decorator.
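The recursive construction discussed in this thread could look roughly like the sketch below, where a stack exposes the same filter method as a single segment and so can be nested inside another stack. My::Segment and My::Stack are hypothetical names, not Text::Pipe or any existing API.

```perl
use strict;
use warnings;

# Hypothetical single segment: wraps a code ref behind filter().
package My::Segment;
sub new    { my ($class, $code) = @_; return bless { code => $code }, $class }
sub filter { my ($self, $input) = @_; return $self->{code}->($input) }

# Hypothetical stack: holds anything that can filter(), including stacks.
package My::Stack;
use Scalar::Util 'blessed';
sub new {
    my ($class, @parts) = @_;
    for my $part (@parts) {
        die "Not a pipe\n" unless blessed($part) && $part->can('filter');
    }
    return bless { parts => \@parts }, $class;
}
sub filter {
    my ($self, $input) = @_;
    # Feed each part's output into the next part
    $input = $_->filter($input) for @{ $self->{parts} };
    return $input;
}

package main;

my $trim  = My::Segment->new(sub { my $s = shift; $s =~ s/^\s+|\s+$//g; $s });
my $upper = My::Segment->new(sub { uc shift });
my $wrap  = My::Segment->new(sub { "[" . shift() . "]" });

my $inner = My::Stack->new($trim, $upper);
my $outer = My::Stack->new($inner, $wrap);    # a stack used as a segment

print $outer->filter("  hello "), "\n";    # prints "[HELLO]"
```

The caller of $outer cannot tell whether it holds one segment or a whole tree of them, which is the point of the pattern.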