Syntactic Complexity and Implementation Time

chromatic on 2007-09-21T19:44:24

The complexity of a language’s syntax does matter. As Eric Raymond once wrote, “Robustness is the child of transparency and simplicity.” Do you, for example, think it’s a coincidence that Perl 6 is still not ready for production after six years of development?

jpersson commenting on a critique of JRuby versus CPython

Common Lisp barely has syntax. Where are all the reliable F/OSS cross-platform CL implementations then? (Hint: volunteer effort is cheap, not necessarily fast.)


Notation

djberg96 on 2007-09-22T03:22:33

Common Lisp barely has syntax.
But people hate the notation. Otherwise, it would be the one, true language. Which, it may yet be.

Re:Notation

chromatic on 2007-09-22T06:23:18

Do you think the ease of implementation depends on how much people like the notation more than the simplicity of the syntax? That's an intriguing thought.

Re:Notation

Alias on 2007-09-23T23:54:39

He may not, but I do.

The more people like the notation, the more people will be attracted to assisting with the implementation, which should greatly improve the ease of implementation (from the perspective of a project manager at least)...

The definition of "ease" is highly subjective and subject to misuse, though.

"How long/easy is a piece of string"

False Analogy Alert!

ziggy on 2007-09-24T01:53:09

Common Lisp barely has syntax. Where are all the reliable F/OSS cross-platform CL implementations then?
chromatic, I understand your frustration with jpersson's comparison, but that does not make your false analogy correct.

If we start with the premise that "sharp objects are cheap and easy to fabricate", it's a huge leap to ask "where are all of the disposable samurai swords?" In fact, there are very many "disposable sharp objects", but we call them "razor blades", "box cutters" and "toothpicks".

The point here is that Lisp barely has any syntax. Common Lisp is a Lisp, but that doesn't mean the 'barely any syntax' attribute necessarily applies to it. Common Lisp takes Lisp and adds a lot of features, many of which layer on top of the trivial surface syntax. So, while it appears that Common Lisp programs are nothing more than well-crafted trees of cons cells, they are in fact much more syntactically rich. Consider what happens when you add vectors, hashes, macros, the MOP, generic functions, and format strings. The only thing the surface syntax simplifies is the Common Lisp reader function, not the entirety of the implementation.

Nevertheless, you asked for a list of cross-platform Common Lisp implementations, and Wikipedia lists a few. Specifically, I draw your attention to SBCL, CMUCL, CLisp, GCL, ECL, OpenMCL, and Armed Bear Common Lisp. The number of open source implementations of Common Lisp is actually larger than the list of commercial implementations. We can quibble about their robustness, cross-platform-ness, or whatnot, but the fact is that there are multiple open source, cross-platform, somewhat interchangeable implementations to choose from.

Again, Common Lisp's simple surface syntax simply obscures the issue. No implementation of Common Lisp is ideal, which is why there are multiple implementations and no single canonical implementation. As I said, Common Lisp appears to have a simple syntax, but it doesn't, so it really shouldn't be under discussion anyway.

A better example for your critique would be Scheme, since there are multiple open source, robust, and cross platform implementations of Scheme. Interestingly, nearly all Scheme implementations are open source. Most are conformant to some community standard or another, usually R5RS, but also R4RS for some of the older projects.

Add all this up, and jpersson's comment does in fact hold up, even if it isn't supported by esr's snarky remark. Languages with simpler syntax are easier to implement, easier to re-implement, and easier to build to a robust state. This helps explain why there are multiple implementations of Common Lisp, Scheme, Tcl, Ruby, Python, Make, various XML parsers, XSLT, C and so on, many of which are robust enough for real world use. This also helps explain why there is effectively only one implementation of languages like Perl, TeX, and perhaps Ada -- these languages are so quirky that it's hard enough to build one complete, robust, open source implementation that no one really wants to build another. (Even pumpkings assert that Only Perl Can Parse Perl, with all seriousness.)

If Perl 6 were less syntactically rich, there would be more implementations of it. Decades of industry experience back up this assertion, snarky quotes and false analogies aside. But that's Larry's decision to make, and he's comfortable with syntactic richness and all of the tradeoffs it brings. And that should be enough for us.

Re:False Analogy Alert!

Aristotle on 2007-09-24T03:02:03

Consider what happens when you add vectors, hashes, macros, MOP, generic functions, and format strings. The only thing the surface syntax simplifies is the Common Lisp reader function, not the entirety of the implementation.

I think that was exactly chromatic’s point: whether or not the surface syntax is complex is not what makes the entirety of the implementation big and complex. Unlike the Schemes you mention, Perl 6 puts the entirety of the implementation (or nearly) under the syntax umbrella. This means there is no language core to implement sooner than the rest of the system.

So you are correct, but so is chromatic.

(Even pumpkings assert that Only Perl Can Parse Perl, with all seriousness.)

Sorry, that doesn’t support your argument. The fact that only perl can parse Perl is not because Perl’s syntax is so quirky that implementing a parser would be painfully difficult. It’s because you cannot parse Perl without executing it.

But not being able to statically reason about code without executing it is not at all exclusive to Perl. F.ex., IDE code completion is never going to be as robust for a language with open, runtime-mutable classes as it is for, say, Java. (Unless the IDE is part of the same VM as the code, and the code is compiled on the fly during editing. I hear there is a language where they do that sort of thing…)
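To make the open-classes point concrete, here is a minimal sketch in Python (names are purely illustrative): a "static analyzer" that lists a class's methods cannot predict what the class will look like after runtime mutation, which is exactly why IDE completion for such languages can never be fully robust from source alone.

```python
# Runtime-mutable classes: what a static scan of the source sees
# is not what actually runs.
class Greeter:
    pass

def analyze(cls):
    """A toy 'static analyzer': list the public methods on a class."""
    return sorted(n for n in vars(cls) if not n.startswith("_"))

before = analyze(Greeter)          # no methods visible yet

# Somewhere at runtime, a method is bolted on -- no amount of
# static reading of the class definition above could predict this.
Greeter.greet = lambda self: "hello"

after = analyze(Greeter)

print(before)   # []
print(after)    # ['greet']
print(Greeter().greet())
```

The same trick (reopening a class after definition) is idiomatic in Ruby and routine in Perl via package manipulation, which is the situation Aristotle describes.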

Re:False Analogy Alert!

slanning on 2007-09-24T09:10:37

The fact that only perl can parse Perl is not because Perl’s syntax is so quirky that implementing a parser would be painfully difficult. It’s because you cannot parse Perl without executing it.

So only perl can execute Perl, then? Why is that?

Re:False Analogy Alert!

Aristotle on 2007-09-24T11:05:51

That is because the language is big and quirky in all sorts of ways, obvious and obscure alike – much more so than just the surface syntax itself.

But we were talking about surface syntax, so that is beside the point.

Re:False Analogy Alert!

chromatic on 2007-09-24T17:12:25

Languages with simpler syntax are easier to implement, easier to re-implement, and easier to build to a robust state.

Aristotle expanded on what I meant, but I still want to push the point of CL. What does it take to write a CL reasonably competitive with SBCL? You have to compete with its performance (which likely means targeting architectures directly), or portability, or library support, or getting at least one of those an order of magnitude righter than SBCL. None of those have anything to do with the syntax. They don't necessarily have much to do with the semantics, either. (Even if you could bootstrap a Lisp with support for cons, car, cdr, eval, and I believe cond, you won't get much performance that way.)
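The parenthetical above is easy to demonstrate: a tree-walking core built on cons/car/cdr/cond/quote fits in a few dozen lines, but it computes by naive recursive interpretation, with none of the native-code performance an implementation like SBCL delivers. A rough sketch in Python (the representation and the exact set of special forms are my own choices, not any particular Lisp's):

```python
# A naive bootstrap Lisp: s-expressions as nested Python lists.
# Semantically serviceable; hopelessly slow compared to a real compiler.

def lisp_eval(expr, env):
    if isinstance(expr, str):                 # symbol -> variable lookup
        return env[expr]
    if not isinstance(expr, list):            # number or other literal atom
        return expr
    head = expr[0]
    if head == "quote":                       # (quote x) -> x, unevaluated
        return expr[1]
    if head == "cond":                        # (cond (test expr) ...)
        for test, branch in expr[1:]:
            if lisp_eval(test, env):
                return lisp_eval(branch, env)
        return None
    if head == "lambda":                      # (lambda (params) body)
        params, body = expr[1], expr[2]
        return lambda *args: lisp_eval(body, {**env, **dict(zip(params, args))})
    fn = lisp_eval(head, env)                 # ordinary application
    return fn(*(lisp_eval(a, env) for a in expr[1:]))

env = {
    "car": lambda c: c[0],
    "cdr": lambda c: c[1:],
    "eq":  lambda a, b: a == b,
}

# (car (cdr (quote (1 2 3)))) evaluates to 2
print(lisp_eval(["car", ["cdr", ["quote", [1, 2, 3]]]], env))
```

Every evaluation step here walks the tree and does Python-level dictionary lookups, which is exactly why "you won't get much performance that way."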

Perhaps my analogy was over-broad, but if I wanted to write a robust, cross-platform F/OSS application in CL, my options are few. If ease of writing a parser for a language were really significant to the difficulty of implementing the language, I would expect to have more options.

(There are probably more LOLCODE compilers than cross-platform F/OSS CL implementations, and LOLCODE has terrible syntax. Pun intended.)

Re:False Analogy Alert!

ziggy on 2007-09-24T17:53:07

What does it take to write a CL reasonably competitive with SBCL?

A whole heck of a lot. First, you need to be compliant with CLTL2, which clocks in at over 1,000 pages. Then, if you want to make an ANSI Common Lisp implementation, there's additional work to implement the diffs between the two specs.

By comparison, the original Lisp specification was 2 pages, the original Scheme specification wasn't much bigger. R4RS and R5RS both clock in at about 50 pages.

All of this gets back to the original point of contention:

Common Lisp barely has syntax.

That assertion is simply untrue. Common Lisp has plenty of syntax. It's just that its syntax is quite regular and appears simple on the surface.

If you were making a point that a language with a truly simple syntax should spawn a plethora of robust, open source implementations, then that assertion seems valid, even though you claim it isn't. Witness all of the robust open source Scheme implementations available today. If you want to take it back to the original vision of Lisp (car, cdr, cons and all that), then there are in fact thousands of these implementations produced every year. They just happen to be better known as "homework assignments" rather than "robust, production-grade open source systems". They tend to be written and forgotten (like many open source projects) because the language they implement simply isn't very interesting.

Perhaps my analogy was over-broad, but if I wanted to write a robust, cross-platform F/OSS application in CL, my options are few. If ease of writing a parser for a language were really significant to the difficulty of implementing the language, I would expect to have more options.

Now you're really conflating a whole mess of issues. :-)

Common Lisp is not a simple language, even though its roots are in Lisp. Expecting it to be simple because it is derived from a language with transparently simple syntax is like expecting you to reproduce by mitosis because, at some point in your past, your ancestors were single-celled organisms. Although parts of you do reproduce by mitosis, as an organism you do not split in two every time you find yourself in a vat of growth media.

You are castigating Common Lisp for being complex, and not having the properties of a simple language. Common Lisp is complex. It is not a simple language. A simple language like plain old vanilla Lisp truly is simple, and has the kind of properties you assert Common Lisp lacks.

Re:False Analogy Alert!

chromatic on 2007-09-24T21:51:41

Common Lisp has plenty of syntax. It's just that its syntax is quite regular and appears simple on the surface.

Perhaps we mean different things by syntax. The way I understood the original comment, syntax means "Things you need to write a parser for." For CL, that's basically identifying applications, atoms, conses, and symbols, while providing access to the parser for macros and allowing (optionally) a couple of special forms (I think you can provide everything you need if only eval and cond are special forms, but I haven't tried to write a Lisp without eval as a special form). That makes for a reasonably simple parser. The code can get more complex if you rewrite the AST into a different execution form (or compile it instead of interpreting it), but parsing Lisp code is almost trivial.
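The "almost trivial" claim is easy to show concretely. Here is a toy s-expression reader in Python — a sketch, not a conforming CL reader; it recognizes only parentheses, numeric atoms, and symbols:

```python
import re

# A toy s-expression reader: tokenize, then build nested lists.
# For a bare-bones Lisp surface syntax, this is the whole parser.

def tokenize(src):
    # Parens are their own tokens; everything else splits on whitespace.
    return re.findall(r"[()]|[^\s()]+", src)

def read(tokens):
    tok = tokens.pop(0)
    if tok == "(":
        form = []
        while tokens[0] != ")":
            form.append(read(tokens))
        tokens.pop(0)                  # discard the closing ")"
        return form
    if tok == ")":
        raise SyntaxError("unexpected )")
    try:
        return int(tok)                # numeric atom
    except ValueError:
        return tok                     # symbol

print(read(tokenize("(defun square (x) (* x x))")))
# ['defun', 'square', ['x'], ['*', 'x', 'x']]
```

Roughly twenty lines, and most of them are error handling and token classification: that is the sense in which parsing Lisp is almost trivial.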

If you mean something different by "syntax", then I see where we're talking past each other.

Re:False Analogy Alert!

ziggy on 2007-09-25T13:44:26

Perhaps we mean different things by syntax. The way I understood the original comment, syntax means "Things you need to write a parser for."

No, we mean the same thing. Lisp is all cons cells and lambdas. It's a small language, which is why it can be defined in one page of code. (I said "two pages of code" before, because Technology Review originally published an annotated version as a centerfold spread).

Common Lisp is not that language.

Common Lisp, for one example, supports macros. Macros are read by the reader function (the parser), and change its behavior. Common Lisp also supports modules, which can export macros, and also change the nature of the parser. Common Lisp also supports both dynamic and lexical binding (depending on the context and usage), which means that a local macro can override a previous definition of a macro of the same name in some situations.

Common Lisp also supports optional static type declarations, which, strictly speaking, a parser could ignore, but realistically speaking, the parser should understand to some degree. If nothing else, the presence of optional constructs like these only serves to complicate the parser.

Common Lisp also supports a rich set of data structures, like vectors, bit vectors and hashes:

;; Vectors
* (vector 1 2 3)
#(1 2 3)
* #(4 5 6)
#(4 5 6)

;; Bit Vectors
* #*1101
#*1101

Note that the reader syntax #(1 2 3) creates a vector and #*1101 creates a bit vector: specific data structures that generally use an optimized representation rather than being stored as a simple chain of atoms and cons cells. The reader is required to recognize #(...) as a vector and #*... as a bit vector. I don't remember off the top of my head what other reader syntaxes a Common Lisp reader must support, but I seem to remember that the reader is required to allow hooks to recognize new user-defined constructs. So a reader isn't required to understand a construct like #regex{...}, but IIRC, it is required to allow you to plug in code to recognize that construct. Which, again, complicates the requirements of a Common Lisp parser.
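What that dispatch costs a reader can be sketched in Python. This toy is not CL's actual dispatch-macro table — just an illustration that every # construct is another branch the reader must special-case, on top of plain atoms and parens:

```python
import re

# A toy reader where "#(" and "#*..." require dispatch, unlike
# ordinary atoms and lists. (Illustrative only; CL uses a
# programmable readtable for this.)

def tokenize(src):
    # "#(" must be matched before "(" would be split off on its own.
    return re.findall(r"#\(|[()]|[^\s()]+", src)

def read(tokens):
    tok = tokens.pop(0)
    if tok in ("(", "#("):
        items = []
        while tokens[0] != ")":
            items.append(read(tokens))
        tokens.pop(0)                  # discard ")"
        # "#(" dispatches to a distinct vector type, not a list.
        return tuple(items) if tok == "#(" else items
    if tok.startswith("#*"):           # "#*1101" dispatches to a bit vector
        return [int(b) for b in tok[2:]]
    try:
        return int(tok)
    except ValueError:
        return tok

print(read(tokenize("#(1 2 3)")))      # a vector: (1, 2, 3)
print(read(tokenize("#*1101")))        # a bit vector: [1, 1, 0, 1]
```

Two dispatch cases already forced changes to both the tokenizer and the tree builder; user-pluggable dispatch generalizes that into a table the reader must consult at every # character.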

Hash tables fall into a similar category. You could argue that a function like (make-hash-table) doesn't require special knowledge in the parser, but you could also argue that an expression like (setf (gethash c 'color) red) can impact the parser, because setf requires a lookup to find the "setf function" to invoke to perform the update. A friendly Lisp parser would tell you when you're trying to invoke setf on a value known not to have a corresponding setf-function.

I'll grant you that forms like make-hash-table and setf-functions don't necessarily require support from the parser, but reader syntax like #(1 2 3) and #*1101 does, and macros certainly do.

I think you can provide everything you need if only eval and cond are special forms, but I haven't tried to write a Lisp without eval as a special form. That makes for a reasonably simple parser.

See, this proves my point. You're conflating "Lisp" with "Common Lisp", and casting aspersions on Common Lisp for not being more like McCarthy's one page of code.

A simple Lisp, like Scheme, has a very small number of special forms. Scheme uses about a dozen, including if, cond, case, and, or, let, letrec, let*, quote, quasiquote, and lambda. If your Scheme supports macros, there's also define-syntax, which supports the creation of more special forms. Many of those aren't strictly necessary, and the set could be reduced to about 4 or 5 if memory serves (lambda, cond, quote, and let, but don't hold me to that list) and still be a substantially similar language.

As proof that Common Lisp is not a simple lisp, consider the 24 special forms that a Common Lisp implementation supports at a minimum. This ignores macros, additional reader syntax for things like vectors, and issues like support for setf-functions, among other things.

One of the nice things about Lisp, though, is that really heinous forms like loop don't need parser support. They complicate the language from a user's perspective, but not the perspective of the programmer writing the parser (just the perspective of the programmer writing the standard library functions). You're talking about the Lisp parser, so I won't go down that path. :-)

parsing Lisp code is almost trivial.

True, for certain definitions of "Lisp", that generally exclude Common Lisp. :-)