Parse::CPAN::Meta 0.02 - YAML support in the Perl core

brian_d_foy on 2008-01-10T06:41:00

In order to complete phase 1 of the CPAN toolchain auto-upgrade work (which consists of changes to about 5-6 toolchain modules to support configure_requires:) we need the ability to parse META.yml to be available in the Perl core (so that we can guarantee auto-upgrade is working on a default install).

YAML::Tiny is the obvious choice for this, since the other options are either very large or require third-party C libraries.

But there's been a long-standing concern from the YAML faction (camp? clique? posse?) that YAML::Tiny is not a "real" YAML parser, and that if something is in the core with a "YAML" label on it, people will preferentially use it instead of a real YAML parser.

The compromise position that everyone seems happy with is to put YAML::Tiny (or enough of it to support parsing META.yml) into the core, but not to CALL it YAML, and to hide the package in a namespace that won't tempt people into using it outside of its intended scope.

So I've created Parse::CPAN::Meta, which was built in the following manner.

1. Copy YAML::Tiny to Parse::CPAN::Meta and rename the namespaces.

2. Remove the YAML-writing half of the code, and just keep the parser.

3. Remove object-orientation, and just keep the Load and LoadFile functions.

4. Remove the global $errstr variable (to avoid potential threading issues) and return all errors via exceptions.

5. Remove YAML-comparison testing, YAML::Syck-comparison testing and round-trip testing from the test suite, but otherwise keep all the same tests, sample files and regression tests.

6. Rewrite the documentation to reflect the new purpose of the code.
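
As a rough sketch of what steps 3 and 4 mean for a caller: the module is used through plain Load and LoadFile functions, and failures surface as exceptions rather than through a global $errstr. The META.yml content, the distribution name, the version numbers and the file path below are all made up for illustration, and the exact return shape of Load may differ between releases, so the code only takes the first document.

```perl
use strict;
use warnings;
use Parse::CPAN::Meta;

# A hypothetical META.yml fragment; the distribution name and the
# version numbers are invented for this example.
my $yaml = <<'END_YAML';
---
name: Example-Dist
version: 0.01
configure_requires:
  ExtUtils::MakeMaker: 6.42
END_YAML

# Load takes a YAML string. Taking only the first returned element
# works whether a release returns one document or a list of them.
my ($meta) = Parse::CPAN::Meta::Load($yaml);

# Print whatever the distribution needs before its configure step.
my $configure_requires = $meta->{configure_requires} || {};
for my $module (sort keys %$configure_requires) {
    print "$module $configure_requires->{$module}\n";
}

# Errors arrive as exceptions, so the caller wraps the call in eval
# and inspects $@ instead of checking a global error variable.
# The path below is assumed not to exist.
my $loaded = eval { Parse::CPAN::Meta::LoadFile('no-such-dir/META.yml'); 1 };
print "LoadFile failed: $@" unless $loaded;
```

A toolchain module would do much the same thing with the real META.yml of a distribution it is about to configure.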

Somewhat ironically, this means that Parse::CPAN::Meta is significantly "tinier" and probably also faster than YAML::Tiny (but at the loss of the ability to write YAML, which I consider essential for a minimalist YAML implementation).

My intention is that this module becomes dual-lifed from birth, so that it can synchronize with any improvements or bug fixes in YAML::Tiny.

Any feedback is welcome.


Oh, here's one:

skazat on 2008-01-10T10:19:11

Less a critique than a question:

Isn't what you've done known as "Cut and paste" coding?

Wouldn't it be easier to tweak YAML::Tiny a small bit (and give back the changes) to Do What You Need, keep it separate and just use it like you'd use any other module?

I ask this because the above is what I've been told to do, by much, much (much) smarter people than myself. I'd probably include the OP in this group of people. One of the problems is the difficulty in having such a module live a "Dual Life": you're basically assuming that changes in one module will also be made in another, somewhat similar module. In my experience, the modules will fall out of parity and will basically get all wonky in relation to each other.

This must have been a consideration - if not the first consideration you would have thought of - why the breaking of the rule?

I find this interesting, since it is going into core - you probably have a good reason - what is it?

Re:Oh, here’s one:

Aristotle on 2008-01-10T15:39:52

The reason is pretty much stated in his post: the name YAML::Tiny is controversial. A solution that relies on that module is therefore a non-starter.

Any time experienced people go against good practice, the reason is likely to be politics. Sometimes, technically unsound choices are the only alternative to complete standstill.

Re:Oh, here’s one:

skazat on 2008-01-10T20:54:40

That's too bad. Most good new ideas are controversial.

A benefit of using YAML::Tiny /would/ be that it would be in core. I know it's a difficult road to get modules in core anyways and that there's a push at least in Perl 6 to keep the number of core modules as small as possible, but in this project a module will have to be added anyways - why not a general purpose one?

Re:Oh, here’s one:

chromatic on 2008-01-10T22:58:21

It doesn't process YAML, for one, so the name's wrong.

Re:Oh, here’s one:

Alias on 2008-01-10T23:27:16

But then neither does YAML or YAML::Syck :)

Re:Oh, here’s one:

ChrisDolan on 2008-01-11T02:02:47

> But then neither does YAML or YAML::Syck :)

This is fallacious.

  * YAML.pm is an incomplete YAML implementation (true)
  * YAML::Tiny is an incomplete YAML implementation (true)
  * YAML is a good enough implementation (largely accepted as true)
  * therefore, YAML::Tiny is a good enough implementation (not largely accepted as true)

Re:Oh, here’s one:

Alias on 2008-01-11T08:24:30

"Good enough" ?

YAML::Tiny isn't a YAML parser not because it doesn't meet an arbitrary line in the sand somewhere. It isn't a YAML parser because it is incomplete and not spec-compliant.

I can live with that.

But to excuse the other (incomplete and non-spec compliant) parsers as well because they INTEND to meet the spec and are somehow "good enough" stinks of hypocrisy and semantic games.

Re:Oh, here’s one:

Alias on 2008-01-10T23:45:28

> A benefit of using YAML::Tiny /would/ be that it would be in core.

Exactly!

We've seen this pattern before a number of times, so it's a fairly easy prediction to make.

We KNOW that people have a strong preference for using modules in the core, and the controversy in this case is that IF we added it to the core we know people are going to use it in preference to a complete YAML parser, which is almost certainly going to create major headaches down the road.

I'm quite certain that a number of people will STILL not use a real YAML parser and will shoehorn Parse::CPAN::Meta into other places they want to parse simple YAML files.

But NOW they will be doing it counter to the published documentation, and if you void the warranty you have only yourself to blame when problems occur, and we certainly won't be helping people out of their mess.

Re:Oh, here's one:

Alias on 2008-01-10T23:39:10

Development of public APIs is never just about the functionality.

Things like module naming, method naming, the standard of documentation and so on have very real effects on how that code will be used in the real world.

Parse::CPAN::Meta may have a YAML parser, but because it can't serialize to YAML it removes a very large use case (file storage of simple data structures) from

> Wouldn't it be easier to tweak YAML::Tiny a small bit (and give back the changes) to Do What You Need?

I suspect that the one key thing here you might not have noticed is that I wrote YAML::Tiny in the first place.

So there's less worry than there might typically be about cutting and pasting OTHER people's code, because I'm in a privileged position and can apply improvements to both modules simultaneously.

YAML::Tiny already Does What We Need... but it also does some other things we DON'T need, and by removing the extra stuff it both reduces the complexity and bug surface area, and prevents uses in cases other than the specific one we want.

Re:Oh, here's one:

skazat on 2008-01-11T02:53:40

I'll be interested in seeing how it all works and I'm sure I'll benefit from the added functionality :)

Don't like the name

ysth on 2008-01-13T08:50:51

Parse:: should be for general purpose parsers. Make it CPAN::ParseMeta.

Re:Don't like the name

Alias on 2008-01-14T02:48:36

It fits the precedent of previous similar modules.