Silly Academia

luqui on 2005-09-30T01:53:31

While reading "Ordered Attribute Grammars" (Uwe Kastens, 1980), I came across this description of one of the functions he was using in his example:

include(a,d): "a" is a set of descriptions, "d" is a description; the result is "a U {d}" if a does not contain a description "d'" for the same identifier described by "d", "{a\{d'}} U {d}" otherwise.

(I used U for union and \ for difference)

Stare at that for a little while. You can probably figure out what it's saying within a few minutes. But how about:

include(set,desc): "set" is a set of descriptions, "desc" is a description; if "set" already contains a description with "desc"'s identifier, it returns "set" with that description replaced by "desc". Otherwise it returns "set" with "desc" added.

If the author weren't so attached to keeping everything set-theoretic, you could say that a is a dictionary from names to descriptions, and make it even easier to read.
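To make the dictionary reading concrete, here is a minimal sketch in Python. The `Description` type and its fields are invented for illustration; the paper only says a description carries an identifier.

```python
# Sketch of include(a, d) read as a dictionary update rather than a
# set operation. "Description" is a hypothetical stand-in type.
from dataclasses import dataclass

@dataclass(frozen=True)
class Description:
    identifier: str
    info: str

def include(descriptions: dict, desc: Description) -> dict:
    """Return a copy of `descriptions` with `desc` added, replacing any
    existing description for the same identifier."""
    result = dict(descriptions)      # non-destructive, like the set version
    result[desc.identifier] = desc   # insert-or-replace in one step
    return result
```

The whole "if a contains a description d' for the same identifier" case analysis collapses into a single keyed assignment, which is exactly the point: the dictionary is the right abstraction here.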

More recent papers have the same kind of purely mathematical language, formulating everything in terms of sets, using single uppercase letters for set names and single lowercase letters for variables. Anybody who programs that way is scolded by the community for producing unreadable and unmaintainable code. Yet academic papers, which command far more respect than programs do, continue to use this cryptic notation. These papers don't deserve respect.

It's because of this mathematical-notative culture that Haskell uses all the cutting-edge research and the dynamic languages are stuck re-inventing Smalltalk. It's not that the new research is so hard to implement, it's that it's documented incomprehensibly. PhDs in Haskell read papers and implement the algorithms they describe just like everybody else, but they're trained in reading this dense academic language.

And of course, when I eventually do my dissertation, if I don't write this way my dissertation will be rejected and I won't get my degree. I won't think of the problem I solve purely in terms of sets and single-letter variables, I'll think of it in terms of computer science concepts, which have English names. I'll have to contort my ideas into mathematical academic language, only to have other academics untangle them back into everyday computer science concepts as they read.

Computer science academics need to retake Freshman writing.


Much agreed

geoffrey on 2005-09-30T10:22:43

Totally right there. I just don't have the attention span to read that crap, so I tend to miss out.

I think part of the problem is that academic papers are not yet optimized for the networked way of doing things.

Aside from way too many being in PDF (*SIGH*), most assume you're coming to them from a background in the field, rather than that you just came from a Google search or a link posted on IRC.

"No, thank you, I don't want to have to spend 15 hours trying to read enough Wikipedia articles to understand the obtuse jargon and silly formulas strewn throughout. Yes, I'm sure it's nice and consise. Waste a few thousand bits on some extra words. They're cheap, you know."

mixing the code and the documentation

jmm on 2005-09-30T13:30:46

An academic paper is a curious mix of (in programming terms) code and documentation. It has to be a precise description that can be duplicated by others (i.e. code), but it also has to be understood intellectually by the reader (i.e. documentation). The set-theoretic notation is compact and precise, working better as the code formulation; your "translation" works better as the documentation formulation. Best would be to have both, perhaps in a two-column layout with the prose "documentation" in one column and the set-theoretic "code" beside it in the other.

Ideal point between "verbose" and "cryptic"?

naughton on 2005-09-30T17:31:05

More recent papers have the same kind of purely mathematical language, formulating everything in terms of sets, using single uppercase letters for set names and single lowercase letters for variables. Anybody who programs that way is scolded by the community for producing unreadable and unmaintainable code. Yet academic papers, which have much more respect than programmers, continue to use this cryptic notation.

Ironically, this reminds me of all those critics who dismiss Perl as "line noise." ;^) Many of us dismiss that criticism as knee-jerk and ignorant, but I think it brings up serious questions: For a given audience, is there an ideal point between verbose prose and cryptic, completely symbolic notation? If so, how do we find that point? I'd love to know @Larry's answers to those questions for Perl6.

Here are some of my thoughts.

Possibly due to my humble BS in Mathematics, I've never been scared by Perl's sigils and heavy use of symbolic operators, and often prefer such syntax. I often ask, would you really rather have this

two plus two equals four

than this

2 + 2 = 4 ?

Sometimes I suspect the "line noise" critics secretly want to make all languages look like COBOL. We often complain about not wanting to type such verbose code, but I often don't want to read it, either. In the above case I actually find the symbolic notation easier to read and understand.

I first started thinking about this when comparing modern re-statements of Euclid's proof of the infinitude of primes, like this one

Assume there are a finite number, n, of primes, the largest being p_n. Consider the number that is the product of these, plus one: N = p_1 p_2 ... p_n + 1. By construction, N is not divisible by any of the p_i. Hence it is either prime itself, or divisible by another prime greater than p_n, contradicting the assumption. [1]

to Euclid's original, much more verbose version. What this example also shows is that sometimes symbolic notation may help us to understand and/or express ideas that would be extremely difficult to handle otherwise:

Euclid sometimes wrote his "proofs" in a style which would be unacceptable today--giving an example rather than handling the general case. It was clear he understood the general case, he just did not have the notation to express it. His proof of this theorem is one of those cases. [2]
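The construction in the quoted proof can be checked concretely. A quick sketch (a toy illustration, not part of either quoted source): for the first few primes, N = p1 · p2 · ... · pn + 1 leaves remainder 1 on division by each pi, so none of them divides N.

```python
# Toy check of the construction in Euclid's proof: the product of the
# first few primes, plus one, is not divisible by any of them.
from math import prod

primes = [2, 3, 5, 7, 11]
N = prod(primes) + 1   # 2310 + 1 = 2311
assert all(N % p == 1 for p in primes)

# N may itself be prime (2311 is), or composite with a prime factor
# outside the list, as with 2*3*5*7*11*13 + 1 = 30031 = 59 * 509.
print(N)
```

Either way the assumed finite list of primes is incomplete, which is the contradiction the proof needs.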

OTOH, these statements provoke opposing thoughts:

It's because of this mathematical-notative culture that Haskell uses all the cutting-edge research and the dynamic languages are stuck re-inventing Smalltalk. It's not that the new research is so hard to implement, it's that it's documented incomprehensibly. PhDs in Haskell read papers and write the algorithms they describe just like everybody else, but they're trained in reading this dense academic language.

And of course, when I eventually do my dissertation, if I don't write this way my dissertation will be rejected and I won't get my degree.

Despite what the "experts" may say, I often doubt that all the abstractions, symbolic and otherwise, in certain fields serve any useful purpose. I suspect some of them serve only as a high barrier to entry, keeping the wealthy and powerful free to pursue leisurely lives of intellectual musings, safe from competition from the rabble outside their gated, master-planned communities.

1. http://www-users.cs.york.ac.uk/user/susan/cyc/p/primeprf.htm

2. http://primes.utm.edu/notes/proofs/infinite/euclids.html