PPI::Tiny 0.01 Released!

chromatic on 2007-01-31T04:08:49

I admire PPI. It addresses a nearly intractable problem and solves it for almost every case admirably.

However, it's a really big distribution. It contains lots of modules, uses lots of memory, and can be slow. It does a lot.

Sometimes you don't need all of that overhead if you just want to parse a Perl document.

I've written a pure-Perl replacement that implements the barest minimum useful features in the hopes that it will be much more useful to people who want to parse Perl but can't afford the kilobytes of disk space it takes to do the job correctly. Until the CPAN mirrors sync, download PPI::Tiny from my own site.

Be sure to read the documentation carefully; I think I've listed every spot where it doesn't behave exactly like its heavyweight older brother accurately, but I might have missed a few. (In reality, there are only three or four, I believe. I might add those later.)

Enjoy!

Classic.

Matts on 2007-01-31T08:21:27

I think you get to be an honorary Brit now.

That thing...

Adrian on 2007-01-31T09:14:02

... that's like bronzy or steely... what's the word....

Re:That thing...

educated_foo on 2007-01-31T12:59:23
Humorless, perspectiveless pedant?

Re:That thing...

drhyde on 2007-02-07T09:49:46
You misspelt "childish".

now I get it...

domm on 2007-01-31T10:01:46

After reading this discussion I finally got the point...

:-)

What use cases does this implement?

Alias on 2007-01-31T10:04:17

All the other Tiny modules outline the cases that they CAN be used for.

For example, YAML::Tiny is intended to handle META.yml and typical simple config files.

Secondly, you fail the criteria for ::Tiny suffixes in that you aren't backwards compatible. You fail again because you have dependencies.

Thirdly, why don't you take patches to add features?

Fourthly, it's not about the kilobytes of disk, it's about the multimegabytes of RAM. People can use, and have used, simple regex miniparsers instead of the full PPI module, and correctly so.

Finally, I do get your point. I haven't actually tried XML::Tiny myself, only helped make sure he meets the criteria I set for ::Tiny modules. But I see patches coming in for XML::Tiny to add missing features, so who knows how complete it will get.

Re:What use cases does this implement?

chromatic on 2007-01-31T10:16:03

... the criteria I set for ::Tiny modules.

I consider the other half of the distribution name more important, actually.

Re:What use cases does this implement?

Alias on 2007-01-31T12:09:45
Fortunately, perfectly suitable modules already exist for people that care about the other half.

How unexpected.

jk2addict on 2007-01-31T13:52:23

After both threads, I feel like I'm in 3rd grade again.

If you don't like *::Tiny, don't use them. If you think they suck, submit a patch. If *::Tiny does what you need without installing 42 other modules (most of which, in the XML namespace, have various levels of compile headaches and library versioning issues), then awesome.

Really, what's the big deal again? It feels like quibbling about calling a non-pedigree cat a feline just because it's not a purebred.

Re:How unexpected.

chromatic on 2007-01-31T19:12:29

Really, what's the big deal again?

CPAN names are rare. They're first-come, first-served. Thus it seems rude to me to claim a namespace for a distribution that doesn't actually do what the name claims.

There's tremendous potential for confusion among the vast majority of CPAN users who've never read use Perl. That seems, to me, highly unfair.

I'm also much less of a fan of bandying about strong assertions such as "Oh yeah, well no real user ever really uses this feature or that feature" without some sort of verifiable statistic.

If it takes uploading DBI::Tiny (a templating system), CGI::Tiny (a configuration file format parser), Net::Tiny (a program to build and install distributions), and (just for you) Handel::Tiny (an interpreter for ColorForth) to demonstrate to everyone's satisfaction that names matter, that the proliferation of slightly incompatible variants on similar themes may be confusing, and that there's a disturbing trend toward forking rather than fixing, then there's something wrong.

Re:How unexpected.

jk2addict on 2007-01-31T19:34:30

I don't necessarily disagree with your overall naming issues. But if the DBI and XML namespaces are so precious, they should be locked down and registered to the appropriate set of users. Since we don't register namespaces on CPAN any more, though, it seems it's game on.

If it takes uploading DBI::Tiny (a templating system), CGI::Tiny (a configuration file format parser), Net::Tiny (a program to build and install distributions), and (just for you) Handel::Tiny (an interpreter for ColorForth) to demonstrate to everyone's satisfaction that names matter, that the proliferation of slightly incompatible variants on similar themes may be confusing

And there is the part I have a real issue with. Why on earth would you upload a CGI::Tiny that doesn't have anything to do with CGI, or a DBI::Tiny that has nothing to do with DBI? That in itself would be silly. XML::Tiny does have something to do with XML. It DOES try to work with XML, regardless of the quality of that action. XML::Tiny is not a network server, or a DBI layer, or a POP email checker in a costume. It is dealing with XML snippets and files. It is on topic.

Uploading a DBI::Tiny that is a templating system and not something that does DBI-like database work would be a bad, wrong, horrible thing; not because of the name, but because it is not database-related. Since XML::Tiny is about working with XML and XML snippets, spec compliance aside, IMHO the name is fine and on topic.

Again, if we don't want people to do it, let's go back to registering namespaces on CPAN. If someone wants to make a DBI::Tiny, and it actually does a subset of the things DBI does, then I'm fine with it. If someone makes a DBI::Tiny that is really a mini image resizer, then we have problems. I don't think we're there yet.

Re:How unexpected.

chromatic on 2007-01-31T22:34:34

XML::Tiny does have something to do with XML.

Perhaps it does in the sense that what it parses bears a superficial resemblance to XML, in the same way that what PPI::Tiny parses bears a very superficial resemblance to what PPI parses.

XML has a widely-distributed, well-understood, and well-implemented specification. It's easy to test a parser for compliance with the specification. If it fails, it's not an XML parser.

That was the point of specifying XML.

XML::Tiny deliberately does not implement the specification, yet it's in the XML:: namespace deliberately anyway.

I'm not worried about Matt or Robin or Michele or Sean finding the module and having trouble with it. They're very smart and experienced and can probably file a dozen tickets in RT just by reading the code once.

I worry a lot about non-experts who like the name, like the lack of dependencies and the ease of installation, and who'll unfortunately spend a lot of time debugging all of the code in their applications because their particular set of minimal XML is far different from David's set of minimal XML.

I agree with the goals--especially that you shouldn't have to be an XML expert to install and use the module--but I fear that you do have to be an XML expert to understand where the module just doesn't work. That seems to work against its goals.

Re:How unexpected.

Alias on 2007-02-01T03:03:46
> PPI::Tiny parses bears a very superficial resemblance to what PPI parses.

Hardly. It bears more of a resemblance to String.pm

NONE of the Tiny modules are in compliance with the standards they implement, and all of them make quite clear that they are for a very specific use case, and you should move to a real module as soon as it doesn't meet your needs.

And the argument of implementing something DIFFERENT, as opposed to implementing something incompletely, is a complete straw man argument.

Re:How unexpected.

vek on 2007-02-01T07:15:32
NONE of the Tiny modules are in compliance with the standards they implement, and all of them make quite clear that they are for a very specific use case, and you should move to a real module as soon as it doesn't meet your needs.

In hindsight the bloke probably shouldn't have claimed that he "decided to do XML right" in his original post though ;-)

Re:How unexpected.

Alias on 2007-02-01T07:39:37
I agree.

I've never had this sort of flamewar on any of the other Tiny modules, and they all have similar properties to this one.

I think the main problem here was the somewhat antagonistic attitude of the post.

Re:How unexpected.

vek on 2007-02-01T16:56:00
I think the main problem here was the somewhat antagonistic attitude of the post.

Exactly.

Re:How unexpected.

chromatic on 2007-02-01T07:23:58

It bears more of a resemblance to String.pm.

Without a specification, I can't see how anyone can decide what is and isn't PPI. I chose the example pretty carefully.

YAML::Tiny doesn't parse YAML. That is, I can give it a valid YAML document--perfectly valid, according to the YAML specification--and the module will fail to parse it appropriately. The documentation makes it very clear that you aren't interested in parsing YAML. That's fine, but it makes me wonder why YAML is in the name of the distribution.

XML::Tiny doesn't parse XML. I can give it a valid XML document--perfectly valid, according to the XML specifications--and the module will fail to parse it appropriately. Again, that's fine, but again I wonder why call it "XML"?

I make no judgment on the utility of these distributions or the motives of the people creating them. I'm sure they're useful within their proper contexts. Yet their names I fear will be confusing when reading code that uses them or when performing a CPAN search. I think they promise things that aren't true.

This is a problem, I fear, just waiting to bite non-experts--the kind of people who have trouble installing distributions with XS components or several dependencies.

That's one of the target audiences of the ::Tiny modules, isn't it?

Re:How unexpected.

Alias on 2007-02-01T07:48:02
Yes, PPI is a suitable grey case, because it's an incomplete but suitably useful implementation of a language which itself does not have a specification. In fact, as I've pointed out in talks, at the deeper levels of understanding "There is no Perl".

> YAML::Tiny doesn't parse YAML. That is, I can give it a valid YAML document--perfectly valid, according to the YAML specification--and the module will fail to parse it appropriately

That's not entirely correct.

If you give it a set of YAML documents, it will parse some of them appropriately and some it will refuse to parse at all.

YAML::Tiny comes with compatibility tests. In fact, it ONLY comes with compatibility tests.

For every single test document, if YAML.pm and/or YAML::Syck are installed, it will load the same test document using YAML::Tiny, YAML.pm and YAML::Syck and make sure that what YAML::Tiny parses the document into matches what the other two would.

In fact, in writing YAML::Tiny I've found incompatibilities in BOTH of the others, where they don't parse according to the specification either.

YAML::Tiny parses SOME YAML documents, just not ALL of them.

I think the big difference in the XML::Tiny case is, as you say, in the quality of the documentation.

I've always been careful to say which subset the module is aimed at, to point out over and over again that the module is ONLY useful for a subset of cases, and to say that people should be prepared to move up to "real" modules (and I generally recommend which) when they outgrow the limited cases in which the tiny modules are useful.

I don't really like the docs for XML::Tiny in that they talk about what the module DOESN'T do, rather than what it DOES do. I hope that will be fixed.

Re:How unexpected.

Alias on 2007-02-01T07:50:44
What I should also mention here is that I think the argument really comes down to a question of "If you don't implement the full specification of $foo, can you call something a $foo parser".

Given that YAML::Syck does not parse the entire YAML specification, do we call YAML::Syck "not a YAML parser"?

I think there's a place for non-spec parsers, with suitable caveats, when you don't have the resources to run a full parser.

Re:How unexpected.

Aristotle on 2007-02-02T01:45:31

It is fine to reject valid documents if they use format features you do not implement. It isn’t fine to just bumble on, returning junk to the API client. YAML::Tiny does the former. XML::Tiny does the latter. In fact, it will fail to reject a lot of malformed XML documents as well. And on top of it, it mangles the text content so that you cannot recover the original text unambiguously.

That is not an XML parser, it isn’t even a parser for a subset of XML. It is just a parser for stuff that happens to resemble XML.

Done properly, you can write a useful parser for a subset of XML. But you need at least a moderate familiarity with the intricacies of XML to know which parts warrant what kind of treatment, and the author’s attitude is apparent from the docs; after I read that screed, I was not surprised to find that instead of assuming there might be more to XML than met his eye he just plunged in and did the first thing he could think of.
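The "cannot recover the original text unambiguously" point is easier to see with a concrete case. Below is a minimal Perl sketch. It is not XML::Tiny's code. `lenient_decode`, `strict_decode` and `%predefined` are hypothetical names used only for illustration. The sketch assumes a parser that expands only XML's five predefined entities and passes every other entity reference through unchanged.

```perl
use strict;
use warnings;

# Hypothetical table: the five entities XML predefines. Anything else
# (e.g. &nbsp;, which exists only if a DTD declares it) is "unknown".
my %predefined = (lt => '<', gt => '>', amp => '&', quot => '"', apos => "'");

# Lenient sketch: expand known entities in one left-to-right pass and
# leave unknown ones untouched. Numeric references such as &#160; are
# not matched by \w+ and are left alone too.
sub lenient_decode {
    my ($text) = @_;
    $text =~ s/&(\w+);/exists $predefined{$1} ? $predefined{$1} : "&$1;"/ge;
    return $text;
}

# Strict sketch: refuse input containing an entity it cannot expand.
sub strict_decode {
    my ($text) = @_;
    $text =~ s{&(\w+);}{
        exists $predefined{$1} ? $predefined{$1} : die "unsupported entity &$1;\n"
    }ge;
    return $text;
}

# The escaped literal text "&nbsp;" and an undeclared &nbsp; reference...
my $escaped_literal = 'Write &amp;nbsp; for a space';
my $undeclared_ref  = 'Write &nbsp; for a space';

# ...come out identical from the lenient decoder, so the caller cannot
# tell which one the document actually contained.
print lenient_decode($escaped_literal), "\n";    # Write &nbsp; for a space
print lenient_decode($undeclared_ref),  "\n";    # Write &nbsp; for a space

# The strict decoder accepts the first input and rejects the second.
print strict_decode($escaped_literal), "\n";     # Write &nbsp; for a space
my $ok = eval { strict_decode($undeclared_ref); 1 };
print $ok ? "accepted\n" : "rejected: $@";       # rejected: unsupported entity &nbsp;
```

The strict variant illustrates the distinction drawn above. It rejects a document that uses a feature it does not implement, rather than returning text whose meaning the caller has to guess.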

Re:How unexpected.

drhyde on 2007-02-07T10:29:32
Whether ignoring things like processing instructions and entity declarations instead of dealing with them properly means that the results are "junk" is a matter of opinion. In my experience - experience in which I only use XML because it's what some other guy has provided - ignoring them has always been just fine.
And on top of it, it mangles the text content so that you cannot recover the original text unambiguously.
Aye, and the next release will fix that. Although, to be honest, I don't consider it to be a significant problem, as the only times you're liable to find character sequences like &nbsp; in real data (as opposed to deliberately pathological test cases) are in HTML and XML documents discussing the pitfalls of such sequences and how to encode them.

Re:How unexpected.

Aristotle on 2007-02-07T11:26:04

Things like “&amp;” are very common in XML on the web: that’s what ends up in the RSS feed when you type an ampersand into a weblog.

Re:How unexpected.

drhyde on 2007-02-07T15:19:27
When you get & through, you *know* that it was originally &amp;, because you know that I would have translated & :-)
Anyway, the shiny new no_entity_parsing and strict_entity_parsing options (on CPAN in the next few days) should fix that.

Re:How unexpected.

drhyde on 2007-02-07T10:17:54

I don't really like the docs for XML::Tiny in that they talk about what the module DOESN'T do, rather than what it DOES do. I hope that will be fixed.
That's certainly a change I'm prepared to make. I'll also keep a section on what it doesn't do.

Re:How unexpected.

Maddingue on 2007-02-02T13:20:59

As one of the main issues for chromatic seems to be the names, maybe the ::Tiny modules should just be renamed with something like Pseudo in their name, for example Parse::PseudoXML, Parse::PseudoYAML, etc.

Even if you lose the cuteness of the "tiny" suffix, you gain more sensible and explicit names.

Re:How unexpected.

chromatic on 2007-02-02T19:19:32

I like that much better. My best other choice was ::Minimal.

Re:How unexpected.

drhyde on 2007-02-07T09:53:56
I'd love to see you do this.

Re:How unexpected.

chromatic on 2007-02-07T20:40:11

Actually uploading them would be a huge mistake. Perhaps under Acme::, but not in the near future.

Test::Spelling::Tiny

ChrisDolan on 2007-02-01T17:09:25

You misspelled your own name in the copyright statement at the end of lib/PPI/Tiny.pm. Maybe you need a t/developer/spelling.t?

Or maybe Test::Spelling::Tiny? :-)

package Test::Spelling::Tiny;
use Test::More;
use base 'Exporter';
our @EXPORT = qw(spelt_chromatic_ok);

sub spelt_chromatic_ok {
    my ($str, $desc) = @_;    # string to check, test description
    like $str, qr/chromatic/, $desc;
}

1;

Re:Test::Spelling::Tiny

chromatic on 2007-02-07T20:58:06

Bug reports to RT please!

(Yes, that's a joke and yes, I made a typo when using Module::Starter.)