New Features in HTML::TokeParser::Simple

Ovid on 2005-10-08T20:00:55

I just uploaded HTML::TokeParser::Simple 3.14, and it should be on a CPAN mirror soon. This is a moderately important upgrade, so the version number should have been bumped by more than .01, but who can turn down PI?

Change log:

Added POD tests
Converted to Module::Build
All classes now state which methods they override
Carp is now only loaded on demand
peek() now allows you to peek at the next tokens

I particularly like the peek() method. This allows you to "peek" at where you are in the document without affecting the state of the parser. This is very helpful for debugging. I got the idea after suggesting it on Perl Monks.
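A minimal sketch of how peek() might be used while debugging. It assumes, going by the module's documentation, that peek() returns the HTML text of the next token and that peek($count) returns the text of the next $count tokens. The sample HTML and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Sample document, invented for this example.
my $html   = '<p>Hello, <b>world</b>!</p>';
my $parser = HTML::TokeParser::Simple->new( string => $html );

# Consume the opening <p> tag.
my $first = $parser->get_token;

# Look ahead at the HTML of the next two tokens.
# The parser's position should not change.
my $upcoming = $parser->peek(2);
print "Coming up: $upcoming\n";

# The next real read still starts right after <p>.
my $next = $parser->get_token;
print "Next token: ", $next->as_is, "\n";
```

Because peek() hands back plain text rather than token objects, it suits print-style debugging more than program logic.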

The feature I tried, and failed, to add was optional string overloading. I wanted you to be able to do this:

my $parser = HTML::TokeParser::Simple->new(
  string   => $html,
  overload => 1,
);

while (my $token = $parser->get_token) {
  print $token;
}

That seems rather straightforward. I was enabling overload through an eval, but it failed miserably because bless $token, $class was triggering the stringification. I tried to munge the stringification method to handle this properly, but with every attempt the code got uglier. Part of the problem is how the as_is method is overloaded. I also needed to make sure overloading was removed whenever a new parser was instantiated, and that made it worse. I think I know what went wrong, and I think I can fix it, but for now I've left the feature out because no one has ever asked for it.
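For readers unfamiliar with the mechanism, here is a rough sketch of optional string overloading using only the core overload pragma. It is not the module's code and does not reproduce the bug described above. My::Token and enable_overload() are hypothetical names, and the string eval is just one possible way to switch overloading on at run time.

```perl
use strict;
use warnings;
use overload ();    # load the pragma without overloading anything in main

package My::Token;  # hypothetical stand-in for a token class

sub new {
    my ( $class, $html ) = @_;
    return bless { html => $html }, $class;
}

# Return the token's original HTML text.
sub as_is { $_[0]{html} }

package main;

# Hypothetical helper: install stringification for My::Token on request.
sub enable_overload {
    my $ok = eval q{
        package My::Token;
        use overload
            '""'     => sub { $_[0]->as_is },
            fallback => 1;
        1;
    };
    die "could not enable overloading: $@" unless $ok;
    return;
}

# Enable overloading before any tokens are created.
enable_overload();

my $token = My::Token->new('<b>');
my $text  = "$token";                    # stringifies through as_is
my $raw   = overload::StrVal($token);    # plain reference string, no overloading
print "$text\n";
```

One general caution with this approach: the stringification handler must never stringify the object itself, or it will recurse. overload::StrVal() is the documented way to get the plain reference string when that is needed.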