Lex rethink this.

Ovid on 2005-04-26T00:28:48

Recently I wrote about some software of mine that's a complicated lexer/parser all rolled into one. I mused about how it would ultimately need to be broken out into separate classes so that lexing and parsing could be discrete behaviors. Unfortunately, we've discovered this has to happen now, because we need to support this:

Lexer::Code   \
               \          / Parser::DB
Lexer::XML  --- Data Store
               /          \ Parser::LDAP
Lexer::String /

Every lexer will take its source and produce a canonical list of tokens that will be identical regardless of that source. The parsers will grab the tokens and spit out an intermediate representation that's appropriate for the data store being used. This split tremendously simplifies implementing each piece, but the initial work of pulling apart the lexer and parser is a bear. If only Dominus' HOP had been published six months earlier ...
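
To make the shape of that split concrete, here is a minimal sketch in Python, chosen purely for illustration. It is not the actual code: the `Token` class, the token kinds, the toy `field op value` query syntax, the `<cmp .../>` element and every function name are assumptions invented for this example. Two lexers emit the same canonical token list, and two parsers turn that list into something a database or an LDAP directory could use.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Token:
    """Canonical token shared by every lexer (hypothetical shape)."""
    kind: str  # "FIELD", "OP" or "VALUE"
    text: str


def lex_string(source: str) -> List[Token]:
    """Lex a toy query such as 'name = Ovid'."""
    match = re.fullmatch(r"\s*(\w+)\s*(!=|=)\s*(\w+)\s*", source)
    if match is None:
        raise ValueError(f"cannot lex: {source!r}")
    field, op, value = match.groups()
    return [Token("FIELD", field), Token("OP", op), Token("VALUE", value)]


def lex_xml(source: str) -> List[Token]:
    """Lex a toy element such as <cmp field="name" op="=" value="Ovid"/>."""
    node = ET.fromstring(source)
    return [
        Token("FIELD", node.get("field")),
        Token("OP", node.get("op")),
        Token("VALUE", node.get("value")),
    ]


def parse_db(tokens: List[Token]) -> Tuple[str, List[str]]:
    """Turn canonical tokens into a SQL fragment plus bind values."""
    field, op, value = tokens
    sql_op = {"=": "=", "!=": "<>"}[op.text]
    return f"{field.text} {sql_op} ?", [value.text]


def parse_ldap(tokens: List[Token]) -> str:
    """Turn the same tokens into an LDAP filter (no escaping in this sketch)."""
    field, op, value = tokens
    clause = f"({field.text}={value.text})"
    return clause if op.text == "=" else f"(!{clause})"


if __name__ == "__main__":
    tokens = lex_string("name = Ovid")
    # Both lexers yield the same canonical list, so either parser can use it.
    assert tokens == lex_xml('<cmp field="name" op="=" value="Ovid"/>')
    print(parse_db(tokens))
    print(parse_ldap(tokens))
```

Because the parsers only ever see `Token` objects, adding a third lexer or a third data store touches one side of the diagram without disturbing the other.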