EBNF grammar help needed

Ovid on 2006-06-30T14:39:38

Recently I was trying to write a grammar for TAP. With a grammar, I could write a parser and people could refer to a canonical grammar to know if their TAP implementation is correct. That grammar is incomplete and has bugs (for example, I didn't account for newlines), but it's a start.

I ran into several problems, most of which can be solved by folks coming to a concensus on how some things should be handled. However, there's one bit to the grammar which I don't know how to write. Consider this snippet of TAP:

ok 6 - ... or if it is zero
ok 7 - ... or not an integer
ok 8 - ... but a positive integer should not cause it to die

The relevant section of grammar might resemble this:

tests       ::= test {test};
test        ::= status (positiveNumber description) newLine
status      ::= 'not '? 'ok ';
description ::= (character - (digit '#')) {character - '#'};

See a problem with that? How do I use a grammar to indicate that each positiveNumber must be one greater than the previous positiveNumber and that said numbers must begin with the number 1 (one)?

(The title of this post should have been "EBNF Grammar Nazis wanted")

positiveNumber

ziggy on 2006-06-30T16:19:33

I don't think you can model a 'sequential nondecreasing positive number' with a pure EBNF grammar. The grammar is for constructing the parser, to determine if the input is syntactically correct. What you are asking for is a semantic check that is beyond the scope of the parser (at least before you start peppering the grammar with state variables and code).

Put differently, pure EBNF can't handle what you are asking, but lex can. :-)

Context-free grammar

iburrell on 2006-06-30T17:20:15

I wouldn't be surprised if the matching increasing sequence of numbers can't be calculated with a context-free grammar. EBNF (and most parsers) implement a context-free grammar.

Adding extra code to check the constraint makes the most sense to me.

What Ziggy said

pemungkah on 2006-06-30T19:49:40

The BNF exists only to tell you what the syntax is. The semantics determine what is actually done, given that the language presented is accurate.

So your BNF should specify a leading number, the dots, the comment, the newline. This allows a lexer to read the language and say "yes, that is a valid line of TAP" or "no, it's not valid because ...".

Take this example:

<value> := <value> + <term> | <term>

So we've defined that a value is one or more terms, separated by '+'. We have not defined what the operation does, but we have defined that
<term>
<term>+<term>
<term>+<term>+<term>+<term>

are all valid syntactically. It could be concatenation, or addition, or a series of subroutine calls, or whatever else we've decided '+' means.

so you have a few lines that specify that a test plan is <digits>..<digits><newline>. It's up to the semantics to determine that the first number should be 1, and that the last one should be > 1, and that at the end of the input, all of the numbers between 1 and the second one were found, in ascending order.

lex and yacc actually confuse the issue because they're not just parsers; they add in sematics too.

Not the right tool

rafael on 2006-07-01T10:21:11

As a grammar nazi (I wrote a number of complex grammars and even an LL(infinite) parser generator), I think that a context-free grammar is definitively not the Right Tool for this Job. (Likewise, it's not the right tool for parsing XML or YAML, for example.)