Kontent Days 33-35: Parser

brentdax on 2005-08-06T20:02:36

The last few days have been devoted to writing the parser for Kolophon, Kontent's native markup language. This was quite a challenge, as Pugs's regex support is both spotty and buggy; I ended up with two hundred lines of given/when-driven code. But it works: it parses every feature except one particularly nasty one, which isn't implemented yet. (It also expects renderers to do a few things they don't do yet, but that's neither here nor there...)

Oh yeah, screenshots: Editing the code and viewing the result.

Kolophon is a simple markup language influenced mainly by Kwid and MediaWiki markup. Every syntactic construct involves at least two characters (or a character and a beginning-of-line assertion), and escaping is achieved by breaking up a sequence with a null sequence, \\. (For example, **insert text here** is the code for bold; if you want two asterisks right next to each other to appear in the document, you just put a pair of backslashes between them, like *\\*.)
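To make the escaping rule concrete, here is a rough sketch in Python (not Perl 6, and not Kontent's actual parser). It handles only the bold marker and the null sequence; the function name render_bold and the <b> output tags are assumptions made up for illustration.

```python
NULL_SEQ = r"\\"   # the two-backslash null sequence
BOLD = "**"        # the bold marker described above


def render_bold(text: str) -> str:
    """Toggle <b> on each '**' and drop each null sequence.

    Hypothetical helper: the scan is left to right, two characters at a
    time, so a null sequence between two asterisks keeps them from ever
    being seen as a '**' pair.
    """
    out = []
    bold_open = False
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == NULL_SEQ:
            # Emits nothing; its only job is to separate its neighbours.
            i += 2
        elif pair == BOLD:
            out.append("</b>" if bold_open else "<b>")
            bold_open = not bold_open
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```

With this sketch, render_bold("**insert text here**") wraps the text in <b> tags, while render_bold(r"*\\*") yields two literal asterisks and no bold.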

Kolophon supports bold (**), italic (//), title/citation (__), strikethrough (==), superscript (^^), and subscript (,,) text formatting styles. It also supports a special code style (``) and an unformatted style (""). It supports a large range of character entities for various hard-to-type punctuation marks and symbols, as well as for a backslash, explicit spacing, and newlines. It supports a range of paragraph styles, including bulleted, numbered, and definition lists; tables; blocks of code or quotations; four heading levels; and plain paragraphs.
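The inline markers listed above fit naturally into a lookup table. The Python sketch below is only an illustration of that idea, not the real Pugs parser: the STYLES table, the tokenize function, and the token shapes are all hypothetical, and it ignores escaping, entities, and any special treatment the code and unformatted styles may need.

```python
# Two-character inline markers, as listed in the post, mapped to style names.
STYLES = {
    "**": "bold",
    "//": "italic",
    "__": "title",
    "==": "strikethrough",
    "^^": "superscript",
    ",,": "subscript",
    "``": "code",
    '""': "unformatted",
}


def tokenize(text):
    """Split text into ("text", str) and ("toggle", style) tokens.

    Hypothetical helper: each marker simply toggles its style on or off;
    nothing here checks that toggles are balanced or properly nested.
    """
    tokens = []
    buf = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in STYLES:
            if buf:
                # Flush any plain text collected before the marker.
                tokens.append(("text", "".join(buf)))
                buf = []
            tokens.append(("toggle", STYLES[pair]))
            i += 2
        else:
            buf.append(text[i])
            i += 1
    if buf:
        tokens.append(("text", "".join(buf)))
    return tokens
```

A renderer could then walk the token list and open or close whatever output construct it uses for each style.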

Finally, it supports three kinds of linking: hyperlink ([[]]), transclusion (direct inclusion of another page's content; {{}}), and relation (a two-way relationship, for categories, alternate languages, and plain old "see also" sections). Of the three, only hyperlinks are supported by the HTML renderer, although transclusion won't be difficult, as I already support the concept of a subrequest.
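For a feel of how the two bracketed link forms could be picked out of a page, here is a small Python sketch. It is an assumption-laden illustration, not Kontent code: the name find_links is invented, whatever sits between the brackets is treated as an opaque target, and relation links are left out because their syntax isn't given here.

```python
import re

# [[...]] is a hyperlink, {{...}} is a transclusion. The inner syntax is
# not described in the post, so the contents are captured as-is.
LINK_RE = re.compile(r"\[\[(.*?)\]\]|\{\{(.*?)\}\}")


def find_links(text):
    """Return (kind, target) pairs in the order they appear in text."""
    links = []
    for match in LINK_RE.finditer(text):
        if match.group(1) is not None:
            links.append(("hyperlink", match.group(1)))
        else:
            links.append(("transclusion", match.group(2)))
    return links
```

A renderer with subrequest support could, for example, turn each "transclusion" entry into a subrequest for the named page and splice the result in place.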

So, next up is Class::Setting, WWW::Kontent::setting(), and other such nonsense...