(Parsing) math is hard

Ovid on 2005-06-19T00:25:25

Well, what a fine mess I've gotten myself into. In order to implement a new parser for AI::Prolog, I've managed to pull all of the parsing functions out of the various objects and back into the AI::Prolog::Parser class (though I haven't uploaded it yet). This makes it much easier to see what the parser is doing and, when it comes time to put in a more sane implementation, I won't have to hunt all over the place for the parsing bits. However, before uploading it, I decided I wanted math to work correctly. Here's where the nasty bits come into play.

X is 7.
% same as:
is(X, 7).

Answer = A + B + C + D + E.
% same as:
eq(Answer, plus(A, plus(B, plus(C, plus(D, E))))).

7 >= 3 * 7 + 4.
% same as:
ge(7, plus(mult(3, 7), 4)).

7 >= 3 * (7 + 4).
% same as:
ge(7, mult(3, plus(7, 4))).

It's that last little snippet that's giving me fits. You see, I thought of this way clever idea to implement math: macros. When the parser first gets the Prolog program, it would see the formal math expression and rewrite it in the second form that the current parser recognizes. This would allow me to write a simple sub-parser for math and reuse the current, proven RD parser to handle it. However, implementing macros means writing a regular expression that could tentatively match math expressions with parentheses. It's trivial to match them without, but:

Answer is 9 / (3 + ((4+7) % ModValue)) + 2 / (3+7).

That's where things started breaking down. Getting a regex to reliably match that so I can hand that off to a subparser is tough. I was making the problem more difficult than I needed. I think it's time to try a different approach.
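For illustration, here is a minimal sketch of one such different approach: a tiny recursive-descent pass that tokenizes the math and rewrites it directly into the prefix form, with no attempt to match the whole expression in a single regex. It is not AI::Prolog's code. The sub names and the functor names minus, div and mod are assumptions made for this sketch (only plus and mult appear above). Note that it nests repeated operators left to right, so A + B + C comes out as plus(plus(A, B), C).

```perl
use strict;
use warnings;

# Hypothetical operator-to-functor table. Only plus and mult appear in the
# post; minus, div and mod are assumed names for this sketch.
my %functor = (
    '+' => 'plus',
    '-' => 'minus',
    '*' => 'mult',
    '/' => 'div',
    '%' => 'mod',
);

# Split the text into numbers, identifiers, operators and parentheses.
# Characters that fit none of these are silently skipped in this sketch.
sub tokenize {
    my ($text) = @_;
    return [ $text =~ m{\s*(\d+(?:\.\d+)?|\w+|[-+*/%()])}g ];
}

# expr := term (('+' | '-') term)*
sub parse_expr {
    my ($tokens) = @_;
    my $left = parse_term($tokens);
    while ( @$tokens && $tokens->[0] =~ /^[-+]\z/ ) {
        my $op    = shift @$tokens;
        my $right = parse_term($tokens);
        $left = "$functor{$op}($left, $right)";
    }
    return $left;
}

# term := factor (('*' | '/' | '%') factor)*
sub parse_term {
    my ($tokens) = @_;
    my $left = parse_factor($tokens);
    while ( @$tokens && $tokens->[0] =~ m{^[*/%]\z} ) {
        my $op    = shift @$tokens;
        my $right = parse_factor($tokens);
        $left = "$functor{$op}($left, $right)";
    }
    return $left;
}

# factor := number | variable | '(' expr ')'
sub parse_factor {
    my ($tokens) = @_;
    my $token = shift @$tokens;
    die "Unexpected end of expression\n" unless defined $token;
    return $token unless $token eq '(';
    my $inner = parse_expr($tokens);
    my $close = shift @$tokens;
    die "Missing closing parenthesis\n"
        unless defined $close && $close eq ')';
    return $inner;
}

# Convert an infix math expression to nested prefix terms.
sub to_prefix {
    my ($text) = @_;
    my $tokens = tokenize($text);
    my $result = parse_expr($tokens);
    die "Unparsed input: @$tokens\n" if @$tokens;
    return $result;
}

print to_prefix('3 * (7 + 4)'), "\n";    # mult(3, plus(7, 4))
print to_prefix('3 * 7 + 4'),   "\n";    # plus(mult(3, 7), 4)
```

Because each grammar rule calls the next one down, operator precedence and nested parentheses fall out of the call structure rather than out of the pattern. Negative numbers and error recovery are left out to keep the sketch short.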

If you're curious, here's my test code for matching a math expression. It's $rhs that I need to fix up.

use Regexp::Common;

my $var       = qr/[[:upper:]][[:alnum:]_]*/;
my $num       = $RE{num}{real};
my $anon      = '_';
my $math_term = qr/(?:$num|$var|$anon)/;
my $ops       = qr{[-+*/%]};
my $compare   = qr/(?:=|is|[<>]=?)/;

# Flat chain: term (op term)*. Parentheses are not handled here.
my $rhs = qr/
    $math_term
    \s*
    (?:
        $ops
        \s*
        $math_term
    )*
/x;

my $expression = qr/
    ($math_term)
    \s+
    ($compare)
    \s+
    ($rhs)
    (?=[,.])
/x;
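As a hedged sketch of how a pattern like $rhs could accept nested parentheses: on Perl 5.10 or later, a named subpattern can recurse into itself. This is only one possibility and is not necessarily what any later version of this code does. To avoid the CPAN dependency, the sketch assumes plain decimal numbers in place of $RE{num}{real}, and the group names expr and operand are made up for the example.

```perl
use strict;
use warnings;
use 5.010;    # recursive named subpatterns need Perl 5.10+

# Simplified stand-ins for the patterns above (assumption: decimal numbers
# only, instead of $RE{num}{real}).
my $var = qr/[[:upper:]][[:alnum:]_]*/;
my $num = qr/\d+(?:\.\d+)?/;
my $ops = qr{[-+*/%]};

# An expression is a chain of operands joined by operators, and an operand
# may itself be a parenthesized expression, hence the recursion.
my $rhs = qr{
    (?<expr>
        (?&operand)
        (?: \s* $ops \s* (?&operand) )*
    )
    (?(DEFINE)
        (?<operand> $num | $var | _ | \( \s* (?&expr) \s* \) )
    )
}x;

for my $text (
    '3 * 7 + 4',
    '9 / (3 + ((4+7) % ModValue)) + 2 / (3+7)',
    '3 * (7 + 4',
  )
{
    my $verdict = $text =~ /\A$rhs\z/ ? 'matches' : 'no match';
    printf "%-45s %s\n", $text, $verdict;
}
```

The first two strings match and the third, with its unclosed parenthesis, does not. The pattern only recognizes the expression; it still has to be handed to something else to build the prefix form.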

Update: OK, I can match all of those now. There are still some weak spots, but it's getting a lot better.