In October 2006, Damian Conway released an alpha draft of S26, the Perl 6 synopsis about Pod documentation. That email begins "Before Christmas, as promised!". The draft is large, but even for an alpha, it's a good read.
The release of the draft spurred a whole lot of discussion on the perl6-language mailing list. People seem to care a lot about these things. Late November the same year, Damian posted a version of S26 without the 'alpha' in its version number. He also promised to release a Pod-to-XHTML module. (He eventually did, in December 2007.)
The discussion following the release of these documents was intense and at times harsh. Damian eventually declared "I'm not going to argue about the design of Pod 6 any more." This was in June 2007. What was the debate about? Browsing through the many emails on the matter, I find it was mainly about the exact relation between Perl 6 code and its Pod comments. The vocal opposition, headed (in some sense) by Mark Overmeer, argued for greater connectivity between Perl 6 code and Pod comments. It's the old configurability-vs-defaults discussion again, with Damian representing the configurability camp and Mark the defaults camp. Damian challenged Mark to write his own S26, which Mark did. Damian wrote back that he was "thinking very carefully about what Mark suggested", and that he would "reply properly as soon as I am able."
That was in July 2007. Since then Damian has made a few cameos on the perl6-language mailing list, but on matters other than Perl 6 Pod. S26 remains in the state it was released in 2006, but nowadays with this message (here):
The information that is in this file is a draft specification that is known to be out of date and likely to undergo some substantial revision. Until the document is updated, look at STD.pm (http://svn.pugscode.org/pugs/src/perl6/STD.pm) for the valid POD syntax.
Beware of the leopard. (Update 2010-05-01: Time has passed, and since this post was written Damian has made one more sweeping change to S26, and then graciously declared it to be of the same status as the other synopses, i.e. changeable wiki-like by anyone who cares.)
Good thing about that reference to STD.pm as an up-to-date source of information. Unfortunately, when you read STD.pm, what you find is this, and only this:
# XXX We need to parse the pod eventually to support $= variables.
Essentially, S26 and Perl 6 Pod are in limbo since 2007, waiting for the return of Damian (or some other heroic soul) to the scene.
Against this backdrop, this happened in January 2008:
mberends: finally pushed my first draft of perldoc pod2text pod2man Pod::Parser
Pod::to::text Pod::to::man to http://github.com/eric256/perl6-examples/tree/master
mberends: please try to run it, there's more to follow (xhtml and a test suite)
masak: mberends: whoa! did you write that?
mberends: yes! over the last 3! months
You have to admire the courage of someone who implements a spec that's about to change, in Perl 6 whose spec is also in flux, on top of Rakudo, which is still being written.
I just asked mberends about a status report on his Pod suite. He said this:
mberends: masak: the low hanging fruit of common Pod is complete, including Pod5
upconverter. Format codes about 50% in text, man, xhtml, pod5 and pod6
emitters. =table and =use not even started.
What follows is my review of his Pod suite.
❦
The Pod suite can be found here. It consists, for my purposes, of four parts: a test suite, the module Pod::Parser, emitters to various serialization formats, and build scripts. I will review them in that order.
The tests
The tests are somewhat awkwardly situated inside a t directory
in lib/Pod in the repository perl6-examples. This is
somewhat of a deviation from common Perl practice, but the reason is most
likely that the perl6-example repository hosts many different things, and a
t directory outside of lib might simply get lost in
the general commotion.
The test files make use of Test::Differences, also found in
perl6-examples, which provides a method that extends the
diagnostics from the standard Test module to include "got" and
"expected" output.
All the test files begin with a class that mocks the class being tested in
that file by deriving it. That particular code is repetitive, with the
exception of the base class, and the name of the class itself, which
alternately gets the names Test::Parser and
TestParser. This would be an excellent place for a role, putting
all the mocking code in one module, and just mixing it in with a one-line
declaration.
We also find the same comment in all these derived classes:
# Possible Rakudo bug: calling a base class method ignores other # overrides in derived class, such as the above emit() redefine. # workaround: redundantly copy base class method here, fails too!
This, after trying various one-liners, I find not to be the case. Rakudo
works perfectly in this case. The comment is probably stale and should be
removed. (This, by the way, is why I've taken to write RAKUDO
comments which reference RT ticket numbers whenever possible. This makes it
easy to see whether the comment has grown stale, and whether the workaround
can be removed.)
The initialization of these classes is also a bit mysterious.
my Test::Parser $p .= new; $p.parse_file('/dev/null'); # warming up
Why do we parse '/dev/null' here? The comment appears to want to
explain, but only serves to heighten the mystery. Whence the need to warm up a
mock object? And must we really introduce a Unix shibboleth into an otherwise
platform-independent test file to do it?
Beyond that, everything else looks fairly good and normal. The tests
basically throw different inputs at the modules, and check that they get the
right output. Perhaps a module such as Test::InputOutput
could alleviate some of the repetition here, but not by much, and only at the
cost of another dependency.
I have the impression that the test coverage is actually pretty good; the 37 tests do get to exercise much of the code written so far. Keep up the good TDD, mberends++.
Pod::Parser
We now get to the centerpiece of the module suite, at this writing weighing
in at 33121 characters. The general structure is this: it starts with an
awe-inspiring grammar Pod6, followed by a few smaller grammars,
enums, and classes. After that, the class Pod::Parser takes up
the bulk of the file, followed (very appropriately) by excellent Pod
documentation.
mberends: I wish the rest of you would bother to write more comments and POD
The Pod6 grammar is exquisite. Care has been taken to align
regexes and their parts vertically so as to bring out structure. There are no
empty lines, but there are one-line comments grouping the regexes into logical
units.
Two comments about the class PodBlock, which seems to
effectively be a simple C-like struct (not that there's anything wrong with
that): first, since all of the attributes are marked is rw, it
would probably make sense to put the is rw declaration on the
class itself, as in this
example from S12. However, Rakudo doesn't yet
implement this, which might be considered a reasonable excuse.
Second, this:
my Str $typename = defined $.typename ?? $.typename !! 'undef'; my Str $style = defined $.style ?? $.style !! 'undef';
...should be written thus...
my Str $typename = $.typename // 'undef'; my Str $style = $.style // 'undef';
Now for the main class Pod::Parser. We are greeted by a longish
list of attributes, alphabetically sorted, each with a half-line comment after
it. Such details warm the heart of a reviewer, I can tell you. The one
comment I couldn't get my head around is this:
has $!context is Context; # not (yet) in Rakudo r35960
After thinking for a minute, I realize that Context is a
previously-declared enum, and that the comment probably refers to enums and
attributes not working properly, or some such. (Once again, a reference to an
RT ticket would have helped here.) Trying out enums in attributes in bleeding
Rakudo (commit 1dea76), I find that it works well, so the comment should go
away. Also, there's no need to put the type after the attribute, when all the
other types are written before. The line should look like this:
has Context $!context; # which state is the parser in?
On the bright side, here's a comment before another attribute:
# $*OUT broke in r35311, reported in RT#62540
Yes! That's what I'm talking about.
Here comes my favorite method:
method parse_line { # from parse_file
given $!line {
when // { self.parse_directive; } # '=xx :cc' /
when // { self.parse_extra; } # '= :cc' /
when // { self.parse_blank; } # '' or ' ' /
default { if @!podblocks { self.parse_content; } # 'xx' or ' xx'
else { self.ambient($!line); } # outside pod
}
}
}
See what happens here? Yes, all the logic, all the grunt work, is made by the
Pod6 grammar; the method basically lists the cases, and that's
it. It's obvious in retrospect, but I wouldn't really have thought of using
given/when in this case. There's even some room left to document what the
cases look like. Cute!
The method parse_directive seems to be a close relative of
parse_line. It would be interesting to investigate whether these
methods, and the methods they called, could be fit into an action class, and
called automatically through {*} directives in the grammar. Only
direct experimentation would answer that question.
An nice practice: all the methods called from parse_line and
parse_directive have a comment saying so. This gives some piece
of context, almost like declaring pre- and postconditions of a method. Since
these methods are only called from parse_line and
parse_directive, however, I wonder if they could not be made into
submethods instead.
In a cozy little method called buf_flush, we find this:
self.emit( ~ ( $!buf_out_line eq "\n" ?? "" !! $!buf_out_line ) ); # why is the ~ necessary?
Now, the attribute $!buf_out_line, in case you are wondering, is
declared as a Str, and is set to the empty string before every
parse. So the question about a stringifying ~ being necessary is
reasonable: why isn't the expression from the trinary always a
Str? Someone should remove that ~, find a case where
the trinary doesn't give a Str, put that case in quarantine and
have it sent to the good people over at RT for further classification.
The Pod documentation at the end of the file, as I wrote earlier, is
excellent. It has everything one would expect from the POD of a mature CPAN
module. I'm not just saying this because the Pod ends by praising me (and the
other November devs); it really does go an extra mile. It even contains a
section about how to build your own Pod serializer. It also quotes General
Helmuth von Moltke, and has expansive TODO and BUGS sections. The only
thing I find missing is that the METHODS section only lists the
most important methods, it doesn't describe them.
Among the BUGS, I find this gem:
Long or complex documents randomly suffer segmentation faults.
As a Rakudo-using Perl 6 developer, I can empathize with that. However, I believe that a few of those might actually have disappeared with the recent fixes from jonathan++.
The emitters
The emitters, or "Podlators", all subclass Pod::Parser. I get
really good vibes from Pod::to::text, Pod::to::pod5
and Pod::to::pod6. The pod6 emitter, however,
contains this:
$!needspace = $!needspace and (substr($content,0,1) ne " "); # $!needspace &&= (substr($content,0,1) ne " "); # crashes
Those parentheses are not necessary, and as far as I can tell, this statement no longer crashes.
Now, out of the four existing emitters, Pod::to::xhtml is the
most complicated one. That's perhaps not too surprising. I have two comments
on this code, one minor and one major: first, a few emit cases are commented
out with no reason given. The BUGS section seems to give
indications as to why, but the connection is not overly clear. This seems to
be a case where unit tests for the particular problems encountered would help.
Second, and more alarming, is the way the XML is serialized. Sure, the content is properly escaped, but the mechanism is still fully text-based, which can introduce subtle nesting bugs. It would perhaps be advisable to use a serialization technique similar to the one in the Perl 6 SVG module. For a quick introduction to the pitfalls of XML generation, see this article. Many of the tips therein could probably be directly applied to the XHTML emitter, making it more robust in the process.
The build scripts
mberends wrote his own Configure.pm in Perl 6. There are deep
coolness points in that; the rest of us are stuck with lame
Makefile.PL files written in Perl 5. The
Configure.pm solutions are a bit Unix-specific (for example a
qx workaround that presupposes /tmp), but there's
hardly any room to do better in Rakudo right now. Windows users haven't shown
up in great amounts to work on our Perl 6 projects anyway. (Funny, that. Are
people still using Windows these days?)
Here's a cutie:
# The opposite of slurp
sub squirt( Str $filename, Str $text ) {
my $handle = open( $filename, :w ); # should check for success
$handle.print: $text;
$handle.close;
}
Let me provide that check for success, free of charge:
my $handle = open( $filename, :w )
or die $!;
(A more interesting question might be what the sub should do if the file already exits. At present it simply overwrites files, which might or might not be the desired behaviour.)
The Makefile.in, finally, shouldn't be remarkable in any way,
but it is. It contains friendly instructions how to get started with
downloading, reading the documentation (using, of course, the Pod tools
themselves), how to contact the author, and how to contribute.
In conclusion
All in all, this is a set of modules that mberends has put a lot of effort into, and it shows. When we wrote November, we hoped that our code could serve as inspiration for others -- mberends has in many cases taken our ideas and run with them. His work on this suite of modules puts it on the second place in complexity (after November) among the Perl 6 projects out there. As someone who always delights in new Perl 6 code, I'm very happy about these modules.
The future of Perl 6 Pod is uncertain. By that I don't mean that I don't believe it has a future; it's just that it's been left in the limbo described in the beginning of the post. (Update 2010-05-01: Though not anymore, as also described above.) One day, Damian might return and completely Change Everything. We can only hope. If and when that happens, mberends++ has already given us a head start in parsing the Pod of tomorrow, by providing modules that do it today.
In the spirit of the Reviewer's Manifesto, I have contributed patches to most of the things I found worth changing during this review.