I've spent much of the morning doing something I've meant to do for a long time: completely rewriting the internals of HTML::TokeParser::Simple. One problem that's long bugged me is that returned tokens were all blessed into one subclass, even though they are clearly different types. The latest version finally rectifies that. Now extending this module to handle special needs should be a piece of cake.
One sign that the module is much cleaner is the lack of "if" statements. Most of them are in the POD, but I did notice a couple in my HTML::TokeParser::Simple::Token::Tag class after I uploaded it. As soon as I saw that, I realized that this class should actually be two classes -- one for end tags and one for start tags. It's interesting how the mere existence of a keyword points out a design problem. Start tags are what most people are really interested in, but overriding this class means overriding behavior of end tags. Silly me. I should fix that, too.
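As a rough illustration of the split described above, here is a minimal sketch in which start and end tags get their own classes, so no "if" is needed to tell them apart. The My::Token package names are hypothetical and chosen only for this example; they are not the module's actual class layout.

```perl
use strict;
use warnings;

# Hypothetical class names for illustration only; this is not the
# real class hierarchy of HTML::TokeParser::Simple.
package My::Token;
sub new          { my ($class, %args) = @_; return bless {%args}, $class }
sub get_tag      { return $_[0]{tag} }
sub is_start_tag { return 0 }    # default: a plain token is not a tag
sub is_end_tag   { return 0 }

package My::Token::Tag::Start;
our @ISA = ('My::Token');
sub is_start_tag { return 1 }    # the class itself answers the question

package My::Token::Tag::End;
our @ISA = ('My::Token');
sub is_end_tag { return 1 }

package main;

my @tokens = (
    My::Token::Tag::Start->new(tag => 'p'),
    My::Token::Tag::End->new(tag => 'p'),
);

for my $token (@tokens) {
    printf "%s: start=%d end=%d\n",
        ref $token, $token->is_start_tag, $token->is_end_tag;
}
```

With this shape, someone who only cares about start tags can subclass the start-tag class without touching end-tag behavior.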
Re:HTML
Ovid on 2004-09-19T23:41:23
HTML::* needs all the smart help it can get.
And I try to help, too :)
And feeling rather silly about my failure to break start and end tags into their own classes, I went ahead and did it now and just uploaded it. I've made major changes, so I'm sure there are huge bugs, but I'm pleased at how easy the changes are now. That makes 3 releases of this module in two days. I should really be less impetuous.
Re:The 3rd generation
Ovid on 2004-09-21T13:55:47
The big version change was because of the new interface. While still backwards compatible, the new style constructors, the "get_foo" instead of "return_foo" names and a few other odds and ends are why I went with 3.0. From my standpoint, if I kept the interface the same and massively reworked the internals, there's really no justification for a version bump. Would anyone want MS Office 2005 if it had no new features and ran a touch slower? :)
Digging into the problem, I tried a manual install, step by step, and I found that -Mblib adds the lib directories under blib to @INC. But the file HTML/TokeParser/Simple.pm wasn't under blib/lib; instead, it was under lib, a sibling directory. That directory is not added to @INC.
Digging into your test scripts, I find that you do:
    chdir 't' if -d 't';
    unshift @INC => '../blib/lib';
actually duplicating the effort that blib does by itself. That was the reason for the tests to fail: I added lib to @INC myself, by changing this into
    chdir 't' if -d 't';
    unshift @INC => '../blib/lib', '../lib';
in all the scripts, and that made all the tests pass.
I actually don't believe that lib directory should even be there. I think it, and all its contents, should be under blib.
But the most striking conclusion to me was the idea that your tests succeeded, simply because you were testing the previous, older install of HTML::TokeParser::Simple, not the new one, the one you were supposed to test...
p.s. Indeed, one of my PCs didn't have an older install.
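One way to check which copy of a module a test script really loaded is to look it up in %INC after the require. The sketch below is self-contained and uses a made-up module, My::Demo, written into two temporary directories; one stands in for an older installed copy and the other for the development copy. All of those names are assumptions for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Write a tiny hypothetical module (My::Demo) under $root with a given version.
sub write_demo_module {
    my ($root, $version) = @_;
    make_path("$root/My");
    open my $fh, '>', "$root/My/Demo.pm" or die "Cannot write: $!";
    print {$fh} "package My::Demo;\nour \$VERSION = '$version';\n1;\n";
    close $fh or die "Cannot close: $!";
    return;
}

my $installed = tempdir(CLEANUP => 1);    # stands in for an older install
my $dev       = tempdir(CLEANUP => 1);    # stands in for ../lib or ../blib/lib
write_demo_module($installed, '1.00');
write_demo_module($dev,       '2.00');

# The older copy is somewhere on @INC; the development copy is put in
# front of it, as an "unshift @INC" line in a test script would do.
push    @INC, $installed;
unshift @INC, $dev;

require My::Demo;

# %INC maps each loaded file to the path it was actually read from.
my $loaded_from = $INC{'My/Demo.pm'};
my $version     = My::Demo->VERSION;
print "My::Demo $version loaded from $loaded_from\n";
```

If the development directory were missing from @INC, the same require would quietly pick up the 1.00 copy instead, which is the situation described above.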
Re:Something is wrong...
Ovid on 2004-09-25T23:46:35
I'm a bit confused as to why adding '../blib' to @INC would cause things to fail. After running perl Makefile.PL; make, the blib directory is built automatically. Did you skip that step and try to run the tests directly? That would cause things to fail since I added the wrong lib.
Adding '../blib' to @INC is a typo on my part as I generally intend to add '../lib' to @INC to allow me to modify the file directly and have the changes instantly picked up. Further, I can run the tests without even running make. Still, it's a nice catch on your part and I'll have a new version uploaded soon.
Re:Something is wrong...
bart on 2004-09-26T00:56:49
I don't know any more... I've tried to build it several times over, deleting the blib directory every time, and I don't get the same results all the time. Sometimes the whole of the lib directory is copied to under blib, but sometimes it isn't, and blib/lib/HTML/TokeParser ends up containing only one file: ".exists".
So, what's up... No idea. I think that perhaps the whole make circus occasionally goes haywire. I'll try again later, I've now given up for the day.
Re:Something is wrong...
bart on 2004-09-30T19:16:22
"... I'll have a new version uploaded soon."
Couldn't you find any excuse to bump it to version 3.14? That sounds like a nice, geekish version number to aim for... :)
Anyway, I have had the time to update a largish script of mine from HTML::TokeParser to HTML::TokeParser::Simple 3.13. I quite like it. If there's anything I miss, it's the option to extend
    $token->is_start_tag           # is it a start tag
    $token->is_start_tag($tag)     # is it a start tag of type $tag (string)
    $token->is_start_tag($qr_tags) # is it a start tag matching the regex $qr_tags
to
    $token->is_start_tag(@tags)    # is it a start tag matching any of @tags, provided @tags isn't empty
and similar for is_end_tag and is_tag. It'd make testing whether a tag is in a set of tags easier. Now I am using
    if($token->is_start_tag and $special{$token->get_tag}) {...
which seems to be a bit of double work, to me. The alternative is to generate a regexp out of the word list, which isn't too user friendly either.
Re:Something is wrong...
Ovid on 2004-09-30T19:48:11
That's an interesting idea. I wonder if I should create a new method to deal with this? I've already heavily overloaded this method and overloading methods is not Perl's strong suit :(
How about &is_tag_in_list and corresponding start and end methods? The method name could be confusing, though:
    if ($token->is_start_tag_in_list) {...}
That suggests that the token is a start tag when, in fact, it may not be. I guess the overloaded method would be better after all :/
The above, incidentally, was a stream of consciousness that allowed me to figure out the interface. I didn't plan to write any of that, I just typed as I was thinking. I guess that's an example of how my mind (doesn't) work :)
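For what it's worth, the matching logic behind a list-accepting is_start_tag could be sketched roughly like this. The tag_matches function is a hypothetical stand-alone helper, not a method of HTML::TokeParser::Simple, and the case-insensitive string comparison is an assumption of the sketch.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of HTML::TokeParser::Simple.
# With no wanted values, any tag name matches (the "is it a tag at all" case).
# Otherwise each wanted value is either a tag name (string) or a qr// regex,
# and the tag matches if it matches any one of them.
sub tag_matches {
    my ($tag, @wanted) = @_;
    return 1 unless @wanted;
    for my $want (@wanted) {
        if (ref $want eq 'Regexp') {
            return 1 if $tag =~ $want;
        }
        else {
            # Assumption: tag names compare case-insensitively.
            return 1 if lc $tag eq lc $want;
        }
    }
    return 0;
}

my @table_tags = qw(tr td th);
print tag_matches('td', @table_tags)    ? "td is in the list\n"    : "td is not in the list\n";
print tag_matches('p',  @table_tags)    ? "p is in the list\n"     : "p is not in the list\n";
print tag_matches('h2', qr/^h[1-6]$/)   ? "h2 matches the regex\n" : "h2 does not match\n";
```

Note the caveat raised in the thread: an empty list falls through to "matches anything", so a caller who builds the list dynamically has to make sure it isn't empty.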