Why META.yml is not (currently) YAML (or JSON)

Alias on 2009-03-12T00:44:54

The whole META.yml thing is really something of a cluster-fuck. In retrospect it was probably a bad idea to base our key piece of machine-to-machine metadata on a brand new format that was still in flux and didn't have a proper specification.

Problems still persist today. Most of them follow the theme of "In theory, theory is the same as practice, but in practice it isn't".

While in theory, META.yml is YAML, in practice, it isn't.

I've listed some of the issues below, in general moving from theory problems to practical problems.

1. YAML (still) doesn't exist yet.

The "specification" is still a pre-release. YAML is still effectively not "done", although the situation is far better now than in the past. Even as late as the 1.1 specification, the specification was based on an implementation (in the case of 1.1 I believe it was based on libyaml, or YAML::XS as we know it).

2. The META.yml specification specifically mandates incorrect (albeit accidentally valid) YAML.

The META.yml specification clearly specifies a YAML "header" in the form:

"The first line of a META.yml file should be a valid YAML document header like "--- #YAML:1.0"."

The problem here is that this isn't a "full" YAML header, it's just a document header with a comment. That YAML fragment hasn't been legal since way back when YAML was first invented.

According to the YAML specification, the above fragment should actually look like this.

%YAML 1.1
---

We can also see that META.yml is based on the "1.0" (which, of course, was basically a "final draft alpha") specification of YAML, which is now deprecated to the point of being basically useless. YAML has been in this non-final final-draft phase for 5 years.

And of course, even the 1.0 specification of YAML didn't use #YAML:1.0, it specifies the use of "%YAML:1.0" (which is of course different to the current "final draft").
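To make the difference between these header forms concrete, here is a minimal sketch in Python (chosen only for illustration, it is not part of any Perl toolchain) that classifies the first line of a file. The function name and category labels are hypothetical, and the patterns cover only the forms quoted above; accepting a leading "---" before "%YAML:1.0" is an assumption on my part.

```python
import re

# Hypothetical labels for the first-line forms discussed above.
# These patterns are illustrative and are not taken from any YAML parser.
HEADER_FORMS = [
    # Current "final draft" directive form, e.g. "%YAML 1.1"
    ("yaml-directive", re.compile(r"^%YAML \d+\.\d+\s*$")),
    # The 1.0-era form "%YAML:1.0" (assumption: optionally after "---")
    ("yaml-1.0-directive", re.compile(r"^(?:---\s+)?%YAML:1\.0\s*$")),
    # What the META.yml spec asks for: a document start plus a comment
    ("meta-yml-comment-header", re.compile(r"^---\s+#YAML:1\.0\s*$")),
    # A plain document start marker with nothing after it
    ("bare-document-start", re.compile(r"^---\s*$")),
]


def classify_header(first_line: str) -> str:
    """Return a label for the first line of a file, or "unknown"."""
    for label, pattern in HEADER_FORMS:
        if pattern.match(first_line):
            return label
    return "unknown"
```

The point of the sketch is only that "--- #YAML:1.0" lands in its own category: to a YAML parser the "#YAML:1.0" part is a comment and carries no version information at all.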

As an aside, it's this odd #YAML syntax that YAML::Syck "corrects" by silently and destructively modifying the original parse string. But that's another story (look for the nosyck test-skips in the YAML::Tiny test suite). I probably should just file a bug for that.

3. There are no YAML parsers

To my knowledge, nobody has managed to write a specification-compliant YAML parser yet. libyaml doesn't count, because the (1.1?) specification is based on it, rather than the other way around.

This isn't for lack of trying of course. Ingy alone has managed to release four different YAML parsers on the CPAN.

I've done one and a half myself (YAML::Tiny + Parse::CPAN::Meta) and I was only willing to have a shot at it by stripping out most of the junk to derive a "YAML Tiny" specification.

The one saving grace here is that all the YAML parsers have at least managed to maintain the same API for non-streaming reading and writing, so they are largely interchangeable.

4. There is no YAML support in the Perl core.

The configure_requires fix for the CPAN toolchain dependency defect requires a META.yml parser in the core to be comprehensively considered to be "complete".

This is the main strategic reason behind the creation of YAML::Tiny: to deal with small and straightforward YAML'ish files by defining a minimum usable subset and just parsing that, without buying into all the bigger formatting, specification, etc. problems.
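As a rough illustration of the "parse only a minimum usable subset" idea, here is a sketch in Python. It is not YAML::Tiny's algorithm and does not implement the YAML Tiny specification; it is a far smaller subset (nested "key: value" mappings indented with spaces, all values kept as strings), and the function name is hypothetical.

```python
import re

# A key, a colon, then either whitespace plus a value or end of line.
# Requiring whitespace after the colon lets keys such as
# "ExtUtils::MakeMaker" keep their internal colons.
_PAIR = re.compile(r"^(.*?):(?:\s+(.*))?$")


def parse_tiny_mapping(text: str) -> dict:
    """Parse nested mappings of plain scalars; reject everything else."""
    root: dict = {}
    # Each entry is (indent of the key that opened the mapping, mapping).
    stack = [(-1, root)]
    for raw in text.splitlines():
        line = raw.rstrip()
        stripped = line.lstrip(" ")
        # Skip blank lines, comments, and the document header line.
        if not stripped or stripped.startswith(("#", "---")):
            continue
        match = _PAIR.match(stripped)
        if match is None:
            raise ValueError(f"unsupported line: {raw!r}")
        indent = len(line) - len(stripped)
        # Close mappings opened at this indent level or deeper.
        while stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1]
        key, value = match.group(1).strip(), match.group(2)
        if value is None:
            child: dict = {}
            parent[key] = child
            stack.append((indent, child))
        else:
            parent[key] = value.strip()
    return root


example = """--- #YAML:1.0
name: Foo-Bar
version: 0.01
configure_requires:
  ExtUtils::MakeMaker: 6.42
"""
meta = parse_tiny_mapping(example)
```

Anything outside the subset (sequences, quoting, anchors, multi-line scalars) is either rejected or mishandled here, which is the trade-off: a tiny parser is easy to audit and ship, but the files it reads are only "YAML'ish".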

To my knowledge, none of the "YAML Parsers" meet the standards of the P5P team for inclusion in the Perl core, leaving the YAML::Tiny-based Parse::CPAN::Meta to fill the gap.

And of course, worse, once 5.10.1 is out and Parse::CPAN::Meta is in the core, it then becomes the "official" way to parse META.yml, which locks META.yml in to being based on the "YAML Tiny" specification, rather than the full YAML specification (whether the META.yml specification likes it or not).

5. YAML is (debatably) not a superset of JSON

I don't hold a position in this argument, but I have read a number of blog posts and the like from JSON/JavaScript people saying that YAML isn't actually really a superset of JSON, it's just kinda sorta a superset if you don't look at it too hard.

The main people who consider JSON to be a subset of YAML are the YAML team.

6. META.yml doesn't really support Unicode

I've been saying for years that the back-compatibility period we should be targeting is about 10 years. While nobody (including myself about half the time) wants to crystallise this as policy, it does more or less turn out to be correct. We recently dropped back-compatibility for 5.005, and from perlhist you can see the release date of 5.005_03, the member of the 5.005 series that was more or less the primary target of back-compatibility.

5.005_03 1999-Mar-28

We now target 5.6.1 (or thereabouts) which was released around 8 years ago, and with 5.6 usage still at around 10% or so, I can imagine we'll hold at this level for another year or so before we move on to 5.8.1.

Since Unicode wasn't truly rock solid until 5.8.5 (2004-Jul-19), this still leaves us with a few years yet until Unicode is truly universal. Based on past trends, I'd expect to see Unicode become truly compulsory around 2011/12.

Until then, this leaves us with a problem in that META.yml files containing Unicode will need to be readable by Perl versions (and toolchains) without proper Unicode support.

So what the hell do we do?

Personally, I'd like to see a two-phase approach.

In the first phase, we just accept that META.yml has issues and work around them. Clarify the META.yml specification to be based on the YAML Tiny subset of YAML, without support for Unicode characters (or at least discouraging them).

In the second phase, some time around Perl 5.10.3 or 5.12 or wherever, make Perl 5.8.5 the oldest supported version of Perl, and make a new META.json based on the actual JSON spec. Write a JSON::Tiny (or equivalent) and put it into the core.

Have PAUSE generate a synthetic META.json file based on META.yml for every existing distribution without support for it, and store it outside the tarball (in the same way we store README outside the tarball).

And, in the meantime, upgrade CPAN(PLUS).pm to support server-pushed minimum CPAN client checking. Which is to say, add support for the clients being able to ask the repository what the minimum-supported Perl/CPAN(PLUS) versions are, so that older clients know they need to upgrade not only the toolchains, but the clients themselves.
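The client-side half of that check could look something like the following sketch, written in Python purely for illustration. Nothing like this exists in CPAN.pm or CPANPLUS as far as this post is concerned: the "minimum_client" key, the function names, and the idea that the repository publishes a small mapping of minimums are all assumptions, as is the use of Perl-style decimal version numbers.

```python
from decimal import Decimal


def to_decimal_version(text: str) -> Decimal:
    """Convert a decimal-style version such as "1.93_01" to a Decimal."""
    # Assumption: versions are plain decimals, with "_" used only as a
    # visual separator in developer releases.
    return Decimal(text.replace("_", ""))


def client_needs_upgrade(client_version: str, advertised: dict) -> bool:
    """Return True if the repository advertises a newer minimum client.

    `advertised` stands in for whatever the repository would publish;
    the "minimum_client" key is hypothetical.
    """
    minimum = advertised.get("minimum_client")
    if minimum is None:
        # The repository states no minimum, so any client is acceptable.
        return False
    return to_decimal_version(client_version) < to_decimal_version(minimum)
```

Comparing as decimals rather than as dotted integer tuples matters here: under decimal comparison 1.9301 is older than 1.94, whereas a naive tuple comparison of (1, 9301) and (1, 94) would get that backwards.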


fighting words :-)

dagolden on 2009-03-12T01:10:26

Well, now we have something for people to debate over beers in Birmingham.

In seriousness -- since we're about at the point where toolchain hackers are going to unleash MYMETA.yml on the world, maybe we should take a collective deep breath and decide if that's in fact what we want.

There are benefits to re-using the META.yml spec and format. (In part, it makes MYMETA.yml easy to implement -- I added support for it to CPAN.pm in about 3 lines of code.) On the other hand, maybe it makes sense to implement MYMETA.* in a well-specified, well-implemented format as a stepping stone towards transitioning away from META.yml.

Where should people's marginal effort go?

-- dagolden

Re:fighting words :-)

Alias on 2009-03-12T03:24:18

> Where should people's marginal effort go?

To start with, it should go towards finding a consensus.

People doing things their own way, and having that ultimately become the default way (locking in any defects as 'official'), is what gets us into half our messes in the first place.

Re:fighting words :-)

Alias on 2009-03-12T03:30:07

Clarification: I don't mean to suggest we should do design by committee, but that at the point when people's self-driven new ideas reach the point at which said person starts to suggest that everyone should adopt them, we put those ideas in front of the inquisition and pick out the bugs.

PAUSE switches to YAML::XS

LaPerla on 2009-03-12T08:02:42

PAUSE started to parse the META.yml files with YAML::XS today to get the ball rolling. The request for the move came from RJBS and I think it's a road worthwhile to explore. One should also link to his blog in this context.

XML

slanning on 2009-03-12T09:24:01

Why not use XML? I guess there's not enough glory: it already exists and works!

Re:XML

Alias on 2009-03-12T10:32:44

The key limiting factor is that whatever we use, we need support in the Perl core.

Re:XML

chromatic on 2009-03-12T19:03:05

No, we don't. We only need something capable of parsing these meta files before we install non-core modules.

All the core really needs is a way to bootstrap the process of installing other modules, distributions, or bundles.

Re:XML

Alias on 2009-03-13T00:23:44

This is a simple dependency chain.

1. All the core really needs is a way to bootstrap the process of installing other modules, distributions, or bundles.

Correct. Thus...

2. To bootstrap the installation of other modules, you need a new enough toolchain.

3. To be able to get a new enough toolchain, you need configure_requires (that was the entire point of having it)

4. To determine configure_requires, you need to be able to read META.yml

Therefore...

5. You need a way to read META.yml in the core.

The particular format for the pre-configure script metadata can be anything we like, as long as it isn't a security risk and won't suffer from the halting problem.

But you still need the parser in the core.

Re:XML

chromatic on 2009-03-13T01:48:56

> You need a way to read META.yml in the core.

If that were true, no installation of a Perl version which lacks this reader in the core would be able to install Perl modules which require the reader. As that's obviously not the case....

Alternate solution: when you want to install something, make the first thing a pristine, freshly configured, built-from-source Perl distribution downloads be a single bundle which contains just enough hand-selected code to allow upgrading the installation process. Do this through CPAN::FirstTime, perhaps.

Then you can drop the idea that you have to put more in the core to be able to remove things from the core.

Re:XML

Alias on 2009-03-13T10:54:39

> If that were true, no installation of a Perl version which lacks this reader in the core would be able to install Perl modules which require the reader.

This is pretty much exactly what happens when you take a fresh Perl 5.8.8 install and try to install anything that needs Module::Build.

Re:XML

Alias on 2009-03-13T10:57:09

I should also note you COULD take the path you suggest.

But then none of those extra modules you need during FirstTime-time would themselves be allowed to use any new toolchain modules.

Which is technically a valid solution, but would require everyone involved in the toolchain to agree on a fixed back-compat set of core toolchain modules they would restrict themselves to.

The alternative (putting it in the core) allows all those modules to be upgraded as well and doesn't create a two-class system.

Re:XML

mpeters on 2009-03-12T11:45:19

Both JSON and XML are well established, fully specced and popular formats (XML more so, but there is a lot of JSON out there too). Picking one over the other should be based on a couple of key points where I think JSON far outshines XML:

1) Like Alias said, whichever we pick needs to make its way into the Core. For XML we'd need to pick a subset, since a full XML parser is big and complicated. JSON in comparison is easy and simple. A JSON parser would be small and stand a better chance of making it into the Core.

2) XML isn't really easy on the eyes and isn't a format that a lot of Perl programmers like. JSON looks very similar to the array/hash literal syntax that we are already used to and the simplicity is nice. No debating over whether something should be an attribute or a child element. It's just lists and hashes.