Lexing without Parsing

Ovid on 2005-08-29T18:28:12

I'm writing a new article and am going to figure out where to submit it. This one is about lexing without parsing (well, lexing without a grammar, to be accurate). Perl is great at munging text, but sometimes we need to analyze complicated text for which we have no proper grammar. If the data is a line-oriented logfile with a very predictable format, no big deal. However, if it's rather irregular and the regular expressions are getting too complicated, lexing the data into predictable tokens can make a hard problem very easy to manage.

A good example of this is parsing SQL. There's no complete SQL grammar written in Perl, and the snippet I posted showed how to extract column aliases. (SQL::Statement uses SQL::Parser and doesn't handle CASE statements, so it's not a solution, though Jeff Zucker welcomes patches.)
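As a rough illustration of the idea (not the snippet referred to above, and in Python rather than Perl), a lexer can turn SQL into labeled tokens so that finding column aliases becomes a scan for an AS keyword followed by an identifier. The token table, the function names lex and column_aliases, and the rule "only look outside parentheses and before the first top-level FROM" are all assumptions made for this sketch; it ignores aliases written without AS and quoted identifiers.

```python
import re

# Hypothetical token table: (label, pattern). Earlier entries win on a tie,
# so keywords are listed before plain identifiers.
TOKEN_SPEC = [
    ("STRING",  r"'(?:[^']|'')*'"),
    ("NUMBER",  r"\d+(?:\.\d+)?"),
    ("KEYWORD", r"\b(?:SELECT|FROM|AS|CASE|WHEN|THEN|ELSE|END)\b"),
    ("IDENT",   r"[A-Za-z_]\w*"),
    ("OP",      r"<>|<=|>=|[-+*/=<>(),.]"),
    ("SPACE",   r"\s+"),
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.IGNORECASE,
)

def lex(text):
    """Return a list of (label, text) tokens, dropping whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup != "SPACE":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

def column_aliases(sql):
    """Collect identifiers that follow AS in the top-level select list."""
    tokens = lex(sql)
    aliases, depth = [], 0
    for i, (label, text) in enumerate(tokens):
        if label == "OP" and text == "(":
            depth += 1
        elif label == "OP" and text == ")":
            depth -= 1
        elif label == "KEYWORD" and depth == 0:
            word = text.upper()
            if word == "FROM":
                break  # the select list is over
            if word == "AS" and i + 1 < len(tokens) and tokens[i + 1][0] == "IDENT":
                aliases.append(tokens[i + 1][1])
    return aliases

sql = ("SELECT first_name AS name, "
       "CASE WHEN age > 18 THEN 'adult' ELSE 'minor' END AS bracket "
       "FROM people AS p")
print(column_aliases(sql))
```

Because the CASE expression is just a run of tokens to the scanner, it does not need to be understood in order to find the alias that follows it.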

Disclaimer: I'm not claiming that this technique (which I learned from HOP, Higher-Order Perl, I might add) is the best way of solving the "parsing SQL" problem. It's merely an illustration of the technique involved. A more complicated example involves transforming math expressions such as this one:

X \= 9 / (3 + (4+7) % ModValue) + 2 / (3+7).

I found myself needing to transform expressions like that into Prolog, respecting operator precedence and allowing parentheses to override it, so that the expression above becomes this:

ne(X, plus(div(9, mod(plus(3, plus(4, 7)), ModValue)), div(2, plus(3, 7)))).
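A minimal sketch of how a lexer plus a small precedence-climbing loop could produce that term, written in Python rather than Perl. This is not the code the post describes. The binding powers in OPS are an assumption chosen to reproduce the output shown above, in which % binds no tighter than + (unlike Perl and C, where % binds like * and /). The functor names minus and times are hypothetical; ne, plus, div and mod come from the output above.

```python
import re

# One token per match: a number, a variable name, or an operator/parenthesis.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<NUM>\d+(?:\.\d+)?)|(?P<VAR>[A-Za-z_]\w*)|(?P<OP>\\=|[-+*/%()]))"
)

# Assumed table: operator -> (binding power, Prolog functor).
OPS = {
    "\\=": (1, "ne"),
    "+": (2, "plus"), "-": (2, "minus"), "%": (2, "mod"),
    "*": (3, "times"), "/": (3, "div"),
}

def lex(text):
    """Split the expression into (label, text) tokens."""
    text = text.strip().rstrip(".").rstrip()  # drop the trailing full stop
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens

def to_prolog(text):
    """Convert an infix expression to a Prolog term string."""
    tokens = lex(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def primary():
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        label, value = advance()
        if label in ("NUM", "VAR"):
            return value
        if value == "(":
            inner = expression(1)  # parentheses reset the precedence floor
            if peek()[1] != ")":
                raise ValueError("missing closing parenthesis")
            advance()
            return inner
        raise ValueError(f"unexpected token {value!r}")

    def expression(min_power):
        left = primary()
        while True:
            _, value = peek()
            if value not in OPS or OPS[value][0] < min_power:
                return left
            advance()
            power, functor = OPS[value]
            right = expression(power + 1)  # +1 keeps operators left-associative
            left = f"{functor}({left}, {right})"

    term = expression(1)
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos][1]!r}")
    return term + "."

print(to_prolog(r"X \= 9 / (3 + (4+7) % ModValue) + 2 / (3+7)."))
```

No grammar is written down anywhere; the precedence table and the handling of parentheses in primary carry all the structure.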

Writing a simple lexer made it very easy to do, though I'll probably beef it up to allow constant folding, which would reduce the term above to this:

ne(X, plus(div(9, mod(14, ModValue)), 0.2)).
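Constant folding of this kind can be sketched as a bottom-up walk over the term: fold the children first, then replace any node whose two arguments are both numbers with the computed value. The following is an illustrative Python sketch, not the post's code. Representing terms as nested (functor, left, right) tuples, and the names fold and render, are assumptions made for the example.

```python
import operator

# Functors that can be evaluated when both arguments are numbers.
# The names follow the Prolog terms shown above.
FOLDABLE = {
    "plus": operator.add,
    "div": operator.truediv,
    "mod": operator.mod,
}

def fold(term):
    """Return the term with all-constant subterms replaced by their values."""
    if not isinstance(term, tuple):
        return term  # a number or a variable name
    functor, left, right = term
    left, right = fold(left), fold(right)
    both_numeric = (isinstance(left, (int, float))
                    and isinstance(right, (int, float)))
    # Leave division or modulus by zero unfolded rather than raising here.
    if functor in FOLDABLE and both_numeric and not (
        functor in ("div", "mod") and right == 0
    ):
        value = FOLDABLE[functor](left, right)
        # Keep whole-number results as integers, e.g. 4.0 becomes 4.
        if isinstance(value, float) and value.is_integer():
            return int(value)
        return value
    return (functor, left, right)

def render(term):
    """Format a nested tuple as a Prolog-style term string."""
    if isinstance(term, tuple):
        functor, left, right = term
        return f"{functor}({render(left)}, {render(right)})"
    return str(term)

tree = ("ne", "X",
        ("plus",
         ("div", 9, ("mod", ("plus", 3, ("plus", 4, 7)), "ModValue")),
         ("div", 2, ("plus", 3, 7))))
print(render(fold(tree)) + ".")
```

Subterms that contain a variable, such as the mod involving ModValue, are left in place, so only the purely numeric parts collapse.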


Parsing SQL

djberg96 on 2005-08-30T01:18:43

Creating a grammar for ANSI SQL is possible, I suppose, but vendor nuances would make it difficult to come up with an all-encompassing solution.

I wonder if the major DB vendors have their respective grammars hidden in a vault somewhere, taunting us.

Re:Parsing SQL

Alias on 2005-08-30T02:09:34

In Oracle's case, they absolutely do.

I used to use the online docs with the "railroad" diagrams in them (which are derived from the BNF) all the time.

Re:Parsing SQL

Ovid on 2005-08-30T19:34:51

That's why SQL::Parser allows you to name the SQL dialect you're working with.

As a general-purpose solution, though, I've thought it would be nice to use the parser from SQLite. A tempting thought, but my C is probably not up to snuff.

Re:Parsing SQL

julz on 2005-12-06T21:01:05

Is there an Oracle dialect available somewhere for SQL::Parser?

I have a little "project" to see which Oracle columns and tables are being used in an internally developed report generator.

Re:Parsing SQL

Ovid on 2005-12-06T22:19:46

I'm not an expert with SQL::Parser. You'll have to ask jzed, the author. His contact info is in the docs. He's quite helpful.