Lex Looser

Ovid on 2005-04-04T18:13:07

I'm writing code that allows someone to search a data store with stuff like this:

my $iterator = $store->search($class,
  desc => ANY(@possibles),
  OR(name => NOT EQ 'Ovid'), 
  {order_by => 'date'}
);

That converts to something like:

WHERE desc IN (?,?,?)
   OR name != ?

In other words, Theory designed a mini-language and I implemented it. Unfortunately, having never done stuff like this before, I came up with a clean, well-factored, but bizarre, twisted cousin of lexers and parsers. Now that I'm reading MJD's "Higher Order Perl" (buy it, now!), I see the error of my ways. My lexing and parsing are all bound up tightly together, and anyone who wants to implement a different data store will be forced to replicate much of my logic. Ugh.

OR, ANY, NOT, EQ, and many other tokens are merely subroutines that are exported (upon request) into the user's namespace. Those subroutines know how to identify themselves and return their arguments. So "NOT EQ 'Ovid'" is roughly equivalent to:

sub { 'NOT', sub { 'EQ', shift } }
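As a rough sketch of that idea (the sub bodies here are my own assumption, not the real implementation): each token sub wraps its arguments in a coderef tagged with its own name, so nesting the calls nests the coderefs.

```perl
use strict;
use warnings;

# Hypothetical sketch: EQ wraps its arguments and tags them 'EQ';
# NOT does the same, flattening any inner coderef when called.
sub EQ  { my @args = @_; return sub { ( 'EQ', @args ) } }
sub NOT { my @args = @_; return sub { ( 'NOT', map { ref $_ ? $_->() : $_ } @args ) } }

# NOT EQ 'Ovid' parses as NOT( EQ('Ovid') ), building a nested coderef...
my $token = NOT EQ 'Ovid';

# ...which flattens to a canonical token list when invoked:
my @tokens = $token->();
print "@tokens\n";    # NOT EQ Ovid
```

The nesting depth then stops mattering: however many levels of subs there are, one call at the top flattens them.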

At any given time, I don't know how many levels of subs there may be (0, 1, or 2), nor how many "string" arguments to search for. My code jumps through a lot of hoops to untangle that directly into an SQL statement. It's clean and well-documented, but stupid. What I should have done is have the subrefs know how to navigate themselves so I could do this:

name => NOT EQ 'Ovid'
# is seen as
name => sub {...}

And then my actual parsing code might see something like this:

while (my ($attr, $value) = each %search) {
  $value = $value->(); # becomes [ 'NOT EQ', 'Ovid' ]
  ...
}
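To see why that would make a new data store cheap to write, here is a minimal sketch of such a parser. The token subs and the token shape (a flat list like `( 'NOT', 'EQ', 'Ovid' )`, slightly different from the `[ 'NOT EQ', 'Ovid' ]` above) are my assumptions, and joining clauses with OR is a simplification of the real search semantics.

```perl
use strict;
use warnings;

# Assumed token subs: each returns a coderef yielding its canonical token list.
sub EQ  { my @a = @_; sub { ( 'EQ', @a ) } }
sub NOT { my @a = @_; sub { ( 'NOT', map { ref $_ ? $_->() : $_ } @a ) } }
sub ANY { my @a = @_; sub { ( 'ANY', @a ) } }

my %search = (
    desc => ANY( 'larry', 'damian' ),
    name => NOT EQ 'Ovid',
);

# The data-store author's entire job: map tokens to SQL and bind values.
my ( @where, @bind );
for my $attr ( sort keys %search ) {
    my ( $op, @args ) = $search{$attr}->();
    if ( $op eq 'ANY' ) {
        push @where, "$attr IN (" . join( ',', ('?') x @args ) . ')';
        push @bind,  @args;
    }
    elsif ( $op eq 'NOT' ) {              # NOT EQ => !=
        my ( undef, @values ) = @args;    # drop the inner 'EQ' token
        push @where, "$attr != ?";
        push @bind,  @values;
    }
    else {                                # bare EQ
        push @where, "$attr = ?";
        push @bind,  @args;
    }
}

print 'WHERE ', join( ' OR ', @where ), "\n";
# WHERE desc IN (?,?) OR name != ?
```

All the lexing lives with the tokens; the loop above is pure parsing, and a different backend could swap in its own loop without touching the mini-language.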

In other words, the mini-language would know how to lex itself and anyone needing to create a new data store would merely have to write a parser. They would have a canonical list of tokens ready for them. 20/20 hindsight, eh? This has now gone down on our list of "things we'd really, really like to do in the future."