YAML::Tiny 0.01

Alias on 2006-04-24T03:24:45

I finally got around to reading the YAML specification.

OMG WTF BBQ!

It is unbelievably complex. The lower parts are a rat's nest of links to concept definitions, of which there are many. It requires a 3-phase event-based serialization and deserialisation pipeline.

It's like XML, plus 3 or 4 more extra resolution concepts to deal with human readability and choice.

This seems to result in page after page and multi-dozen EBNF statements just dealing with the issues of significant whitespace.

And for something which I'd originally imagined as just dealing with scalars, arrays and hashes, it contains gems like the following.

"A tag shorthand consists of a valid tag handle followed by a non-empty suffix. The tag handle must be associated with a prefix, either by default or by using a “TAG” directive. The resulting parsed tag is the concatenation of the prefix and the suffix, and must either begin with “!” (a local tag) or be a valid URI (a global tag). When the primary tag handle is used, the suffix must not contain any “!” character, as this would cause the tag shorthand to be interpreted as having a named tag handle. If the “!” character exists in the suffix of a tag using the primary tag handle, it must be escaped as “%21”, and the parser should expand this particular escape sequence before passing the tag to the application. This behavior is consistent with the URI character quoting rules (specifically, section 2.3 of RFC2396), and ensures the choice of tag handle remains a presentation detail and is not reflected in the serialization tree (and hence the representation graph). In particular, the tag handle may be discarded once parsing is completed."

In its fully defined form YAML is a very impressive piece of software engineering, and the flexibility and expressiveness are breathtaking. But my god is it a heavy protocol.
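To make the quoted tag shorthand rules a bit more concrete, here is a rough Python sketch of the resolution step: split the shorthand into handle and suffix, look up the prefix, expand the "%21" escape, and concatenate. The function name and the HANDLES table (including the "!e!" entry and its example.com prefix) are made up for illustration, and the sketch does not validate that the result is a proper URI.

```python
# Hypothetical handle-to-prefix table. "!" and "!!" use the usual default
# prefixes; "!e!" stands in for a handle declared with a TAG directive.
HANDLES = {
    "!": "!",
    "!!": "tag:yaml.org,2002:",
    "!e!": "tag:example.com,2006:",
}


def resolve_tag(shorthand, handles):
    """Return prefix + suffix for a tag shorthand such as '!!str'."""
    if not shorthand.startswith("!"):
        raise ValueError("a tag shorthand must start with '!'")
    # A second "!" closes a named ("!e!") or secondary ("!!") handle.
    # With no second "!", the primary handle "!" is in use.
    second_bang = shorthand.find("!", 1)
    if second_bang == -1:
        handle, suffix = "!", shorthand[1:]
    else:
        handle = shorthand[:second_bang + 1]
        suffix = shorthand[second_bang + 1:]
    if not suffix:
        raise ValueError("the suffix must be non-empty")
    if handle not in handles:
        raise ValueError("handle %r has no associated prefix" % handle)
    # Expand only the %21 escape, which stands for a literal "!".
    suffix = suffix.replace("%21", "!")
    return handles[handle] + suffix


print(resolve_tag("!!str", HANDLES))
print(resolve_tag("!e!widget", HANDLES))
print(resolve_tag("!a%21b", HANDLES))
```

Note how the handle itself never appears in the result, which is the "presentation detail" point the quoted passage makes.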

In any case, it certainly explains why YAML.pm chews up 4 megabytes (in the debugger at least) of RAM just to load a default dumper and loader and round-trip an undef to a string and back again.

I've got some things I'd like to use it for, mostly utility config file stuff. But even in this age of large available memory, 4 meg is very big for something that isn't the focus of the program and is just doing some little side job.

So as with Config::Tiny and CSS::Tiny, which both replace a heavy 3+ meg module, I'm going to have a shot at a YAML::Tiny, which just implements the things you'd find in a typical META.yml, in as little code as possible.

I'm not sure if it can even be done yet, and I might need to kill it and do a simple event-pipeline YAML::Lite instead, but it's worth a try first.

And if you have a login to my SVN repository, you're welcome to join in and help.

And in a late-breaking update, rumors come to me of someone else's bloat frustrations over Log4Perl stimulating an idea for a Log4Perl-compatible Log::Tiny module :)


typo in url

gizmo_mathboy on 2006-04-24T04:58:03

There is a typo in your URLs. They are missing a c in search.

Re:typo in url

Alias on 2006-04-24T05:59:08

Thanks.

One of these days someone is going to make it simple to link to search.cpan.org. Just wish that day would hurry up.

Re:typo in url

Aristotle on 2006-04-24T07:31:24

You haven’t fixed all of them yet…

YAML::Syck?

audreyt on 2006-04-24T07:10:55

My YAML::Syck only takes 271k... :-)

Re:YAML::Syck?

Alias on 2006-04-24T07:16:33

... and a C compiler :)

I'm terribly impressed with YAML::Syck, but I look at the entirety of the YAML spec and there's an enormous amount of, well, STUFF.

Does it handle all of the spec?

Re:YAML::Syck?

audreyt on 2006-04-24T07:29:03

Yes. Syck is the most complete implementation of YAML. :-)

Re:YAML::Syck?

Alias on 2006-04-25T11:52:01

Hmm... but how much of the spec is that?

I note for example that the documentation explicitly says it doesn't handle circular references. Is there anything else like that?

Re:YAML::Syck?

audreyt on 2006-04-25T12:19:07

Where does it say that? That's already solved long ago. :-)

The only two unsupported features are "Force the Merge Key" and "Force Comments". See http://whytheluckystiff.net/syck/ for details. I've added Unicode support beyond what libsyck originally offers.

Re:YAML::Syck?

audreyt on 2006-04-25T13:07:52

Wow, thanks for the reminder; I have forgotten to push the new release (that handles circular refs and has been tested to be 5.005-compatible) to CPAN.

0.42 has been uploaded. Thanks for the reminder. :)

Familiar sentiments…

Aristotle on 2006-04-24T07:36:12

You read my CPANRatings review, didn’t you?

Re:Familiar sentiments…

Alias on 2006-04-24T13:34:39

I've wanted to do YAML::Tiny for a while, but I guess your comment just finally drove me over the edge to start it.

Module::Build

ChrisDolan on 2006-04-24T13:57:36

Be sure to check in with the Module::Build email list. Right now, they are discussing making a mini-YAML to bundle with M::B. Coordinating efforts would be valuable, I suspect.

compared to XML?

slanning on 2006-04-24T16:42:50

How does YAML compare to XML in terms of heaviness, then? I thought YAML was supposed to be a lightweight configuration language. If not, then I really don't see why people wouldn't just use XML or .ini-style config files.

Re:compared to XML?

Aristotle on 2006-04-24T23:59:06

YAML’s syntax is very lightweight, but extremely complex. If you only use the simple constructs, it ends up looking very clean and human-readable, but there are a lot of complex (ie. hard to explain to non-techies) constructs, and they all rely on funny (read: obfuscatory) punctuation.

XML has rather heavyweight syntax, but there are far fewer constructs than YAML has. It’s not really designed for rigidly and heavily structured things like data structures; it lends itself much better to “documents,” ie. things that consist mostly of text with markup mixed right in.

Re:compared to XML?

Alias on 2006-04-25T12:33:57

If I can use the machine-readable vs geek-readable vs human-readable differentiation, then I can sum up the difference as follows.

Binary file formats are machine-readable, but NOT geek-readable or human-readable.

This makes them small and compact, but very hard for developers to handle and work with, because you need to be intimately familiar with the format to do anything at all.

XML is machine-readable AND geek-readable, but not human-readable. XML formats are designed for machines, but are done in a human-enough way that it's easy for developers to understand them, and work with them, without needing to be entirely intimate with the format.

This STILL makes XML not human-readable, so in general people shouldn't be expected to ever have to read XML files or edit them by hand in the normal course of their work. They should work with XML files through a program, or view them via some other presentation medium.

Interestingly, this means the specification for XML itself (for machine usage) is relatively compact still, but can still be fairly heavy-weight compared to a trivially simple human-readable format like Windows .ini files.

YAML is intended to be machine-readable, and geek-readable, AND human-readable, AND still have all the features that XML does.

This means that from a human point of view, it's fairly easy to say

    ---
    - foo
    - bar

The downside of this approach is that the complexity FOR MACHINES is much MUCH higher. The specification is enormous in order to deal with the added flexibility and ease-of-use needed to make it more human.

This results in a format that LOOKS "lighter" to humans, but is actually "heavier" for machines.
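As a rough illustration of that gap, the two-item list above can be read by a few lines of Python, as long as you are willing to support nothing else. This is only a sketch: the function name is made up, and it deliberately rejects everything beyond a flat list of plain scalars (no nesting, quoting, tags, anchors or multi-line strings), which is exactly where the real specification spends its pages.

```python
def parse_simple_sequence(text):
    """Parse a document that is only a flat block sequence of plain scalars."""
    items = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "---":
            continue  # skip blank lines and the document-start marker
        if not line.startswith("- "):
            # Anything else is outside this tiny subset.
            raise ValueError("unsupported construct: %r" % line)
        items.append(line[2:].strip())
    return items


doc = """\
---
- foo
- bar
"""

print(parse_simple_sequence(doc))  # prints ['foo', 'bar']
```

Everything a real parser has to add on top of this is the "heavier for machines" part.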

As a result, XML makes a good choice for some sorts of files. If humans will not be looking at the files directly, and you need more features than you can get from a trivial format like Windows .ini, then XML is a great choice.

In particular, XML with the use of Schemas is absolutely ideal for its original purpose of inter-system document transfer. If you need your data to cross a trust-boundary, to consume data from some other untrusted entity, and so on, then XML is the perfect choice.

But it's not something you should tell humans to edit, and so XML is a bad choice for configuration files humans will touch.

YAML is very heavy for machines, but it is human readable, and so it is currently quite possibly the best choice for more-complex configuration files that humans will need to edit directly.

First Impressions

vjo on 2006-04-28T08:44:46

I have been meaning to read the YAML spec too. My first impression of YAML was that it was garbage.

When I go to yaml.org I see no rationale for why it was invented. Is it better than XML? A substitute?

I only see an example

    http://www.yaml.org/start.html

Which depending on taste is better or worse than XML. I think it is worse because of & | * >

Re:First Impressions

sigzero on 2006-04-28T18:26:12

One thing I do like YAML for is doing a Data::Phrasebook type config. I used to have my SQL queries in an XML file but you have to change the > and

Other than that...I would use XML.
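For anyone who hasn't hit the SQL-in-XML problem mentioned above, here is a small Python sketch of the escaping involved. The query text and the "query" element name are made up for illustration; `escape` is the standard library helper from `xml.sax.saxutils`.

```python
from xml.sax.saxutils import escape

# A made-up query containing characters that are special in XML.
query = "SELECT name FROM users WHERE age > 18 AND age < 65"

# Inside XML element content, "<" must be written as an entity, and
# escape() also rewrites ">" and "&".
xml_fragment = "<query>%s</query>" % escape(query)

print(xml_fragment)
# prints <query>SELECT name FROM users WHERE age &gt; 18 AND age &lt; 65</query>
```

In a YAML file the same query can sit after a key as plain text with the comparison operators left as they are, which is presumably the appeal for a phrasebook-style config.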