So I was sitting on my friend Andrea's sofa, talking to one of her other friends, a gentleman who also happened to be a programmer. Turns out he built systems in C. When he asked what I did, I told him I built systems in Perl. He was confused. "The scripting language?" I replied that a lot of people think that, but it isn't true: I build large systems in Perl, far larger than most people imagine. And his response? "Yeah, but Perl's a scripting language." He was very confused, and when the conversation was over, I think he thought I was either lying or just very naïve. Maybe the latter's true, who knows?
You can debate all you want about what a scripting language is or isn't. I'll smile happily and start daydreaming, because this discussion is even less useful than arguments about strong typing. Instead, I started thinking about PHP. PHP 5 has been out for about 2 years, but the majority of PHP code is still PHP 4. What's worse, given my experiences with Perl, I suspect that a lot of PHP 5 code is still being written to look like PHP 4. Yet I doubt that people unfamiliar with PHP could even tell you which version it's up to, any more than they could tell you which version Ruby or Python is up to.
Why is this relevant? Well, what do I think a scripting language is? Despite my earlier assertion that I won't pay attention to debates on the topic (truth be told, they bore me to hell), I still have a mental model of what a scripting language is: a language suitable for writing smaller "glue" bits of code, but not larger systems. Admittedly, this is subjective, but a scripting language is to a "real" language as erotica is to pornography: I know it when I see it (further research is needed in this area).
This was an awfully circuitous way of getting around to my main point. By my personal, highly subjective and faith-based view of what a scripting language is, I would suggest that Perl 4 qualifies. In fact, so much Perl 5 code is still being written in a Perl 4 style that it is pretty much indistinguishable from code in a scripting language. Many MySQL developers should sympathize, because a lot of crap is still being written for MySQL 5, even though it's becoming a serious database. So when people tell me that Perl is just a scripting language, I think I'm going to reply, "You're talking about Perl 4, which stopped being maintained over a decade ago. Tell your programmers to stop writing Perl 4."
Why exactly is C supposed to be better than Perl for large systems (ignore speed of execution, assume that both can in principle satisfy your requirements)? What large-scale development advantages does it, as a language, have? What if you had raised an incredulous eyebrow and asked this gentleman, "you write large systems in *C*???" What would his response have been?
It's roughly what I (try to) have in mind when I hear the word "enterprise". When a system gets big enough and lasts long enough, the development "system" or "procedures" or "workflow" must itself be robust enough to handle people joining and leaving the project. I could easily be convinced that some languages have advantages or disadvantages at the scale of a larger project, but I don't see specifically what C, as a language, has over Perl.
Define "system"
bart on 2006-08-08T11:11:11
I agree: writing large programs in pure C is nuts. Writing large systems in anything that doesn't have automatic garbage collection is nuts. OTOH, other people argue you should only use a strongly typed language for large systems. So we can't all agree. :)
Despite the fact that he dislikes Perl, I largely agree with this guy. (I don't remember who pointed me to that article, it could easily even have been you (Ovid).)
Anyway, if you define "system" as something that is really big, then I strongly feel that C is one of the worst choices of a language to program in. If, OTOH, by "system" you mean device drivers and the like, stuff that talks to hardware, then I think C is a good choice.
It all depends on what you mean by "system".
Jarballs better than CPAN? Haskell but no Pugs?
naughton on 2006-08-08T16:35:53
Despite the fact that he dislikes Perl, I largely agree with this guy.
Thanks for the reference to that excellent article (Scalable computer programming languages) by "this guy" (Mike Vanier).
Given that he dislikes Perl, I noticed a couple of glaring omissions:
CPAN doesn't even "come close" to Jarballs?
Java libraries are wonderful, but no mention of CPAN anywhere in the article.
Julian Morrison sent me this email:
There's a very important feature you missed, and it's the real explanation for the success of Java: separable, atomic, pre-packaged, versioned functionality. Jarballs. Those, more than anything else, make reusability real. Java programming is about plugging together ready-built parts. Nothing else comes close.
I have to agree with him on this, and it's a major omission from the discussion above (the component stuff is sort of related to this). I don't find Java to be a very inspiring language, but I like the Java infrastructure a lot (the same comment applies to C#). With Java, you can download packages and have pretty good confidence that everything will work as it should (with some caveats that I'll mention below). There are a bunch of features of the Java infrastructure that make this possible: bytecode, having the same virtual platform on every real platform, versioning, metadata, etc. but they all result in a chunk of code which (ideally) "just works". In fact, it doesn't always "just work" but it "just works" more often than in most other languages I've used.
Vanier likes Parrot and Haskell, is reserving judgment on Perl 6, but doesn't mention Pugs
Vanier on Perl and Parrot:
I am not a fan of Perl. Basically, I think that Perl is simply Python with an incredibly obfuscated syntax and a few extra features that nobody really needs. Perl is incredibly non-scalable; I dare you to try to understand any Perl program of more than a hundred lines or so. The fault is not just the syntax; the semantics of the language are full of little oddities (e.g. overloading on the return type of a function), and frankly, I recommend that you just stay away from Perl. Maybe Perl 6 will not be as painful; but then, maybe it won't. I'll check again when Perl 6 actually happens.
Parenthetically, one cool thing that has come out (actually, that is in the process of coming out) of the Perl 6 effort is an amazingly cool project called Parrot, which is a virtual machine targeted at dynamic languages. The goal is to have a common virtual machine for running Perl, Python, Ruby, Scheme etc. I like this project very much, so please check it out.
Vanier on Haskell:
Since writing the last epilogue, I've gotten very enamored of Haskell, which I've been interested in for a long time.
Has anyone written to Vanier about these things? If so, what was his response, if any?
Re:scripting large systems
btilly on 2006-08-08T21:57:50
C is better for generating Heisenbugs. Which fact alone makes me strongly dislike it for large projects.
Heisenbugs are trivial bugs that show up in one part of your code, move around as you recompile for different platforms or change unrelated code, often disappear for an extended time, and then pop up when you least expect it.
The most common cause is a bad pointer, which causes a fairly random point in memory to get overwritten. If that point holds nothing particularly important, everything seems to work. The code holding the dangling pointer seems to work as well. But if that memory is actually used for something important, then something else breaks. No, not anything that will help you track down the dangling pointer, just... something. And since trivial code changes (or compiling against different libraries, or on a different OS, or with different optimization settings...) change the memory layout, the bug will appear and disappear in response to changes that have nothing to do with the real bug.
Yes, any decent programmer can avoid these in a small program. But the average in commercial software is one bug per thousand lines. If your project has over 100,000 lines, you really want your bugs to be localizable if you're going to track them down. But at least one common error in C is not localizable.
(Yes, I know about tools like Purify and so on. They are bandaids. Useful bandaids. Essential bandaids. But automated reasoning about code is still not as good as making the problems impossible at the outset.)
Well, what do I think a scripting language is? Despite my earlier assertion that I'm not really going to pay attention to any debates on the topic because truth be told, they bore me to hell, the fact is that I still have a mental model of what a scripting language is. That's a language which is suitable for writing smaller "glue" bits of code, but not larger systems.
We're re-watching Coupling season one at the moment (the real one, from the BBC), and all of this reminds me of Jane's pronouncement: "There's no such thing as homophobes. Only peoplephobes!" Debating whether or not "scripting languages" exist is about as silly.
The best description of what a scripting language is comes from Ousterhout, who coined the term. Why? Because coding comes down to two predominant modes: building modules, and connecting them. This may sound like a false dichotomy, but it turns out that it isn't. r0ml talked about this in his talk Failing to Succeed (slides). Like most of his talks, it was brilliant, wide ranging, entertaining, informative, involved his standard red clown nose prop, and impossible to summarize or remember fully.
One of the big points he made was in evaluating the cost of change in a codebase. There's a good paper he cites that analyzes the open source vs proprietary methodologies for software development (using the Linux kernel and the 1998 "Free the Lizard" Mozilla codebase as proxies), and manages to quantify the "modularity" of both codebases. As an extra experiment, the paper also analyzes the "modularity" of the Mozilla codebase before and after The Great Rewrite. (Which, in hindsight, disproves Joel's dictum to Never Rewrite Code.)
r0ml also made a few additional points, including the strong correlation between low modularity (tight coupling) and high cost of change. The first way to reduce cost is to increase modularity (separate subsystems from each other). The next step is to turn modular subsystems into standalone components (libxml, mysql, libpng, etc.) and glue them together with some kind of "scripting language". Each component is general, well tested, and clearly defined. The sum of them as an application is similarly easy to understand.
But without discipline, application code can migrate away from components into glue code, so that most of the application is written as glue. That approach has a different cost structure: moving application logic into glue code makes it *more* expensive to change. Possibly this is because each line of glue code is more expressive, so the cost of changing it is multiplied accordingly. (At least that's how I remember r0ml describing it; I don't have the source handy.)
I don't think that r0ml's point is that glue code (scripting languages) is necessarily *bad*, just that it has a different impact on the project, and it's important to find the right balance among all of the competing factors. Gluing everything together in C is obviously bad, because without a constant effort to modularize, it leads to tight coupling. Writing everything as glue causes a similar problem, presumably because there's no clear distinction between what's a component and what's glue.
How should we interpret this for Perl? Well, it obviously has glue-like characteristics; just look at any one-page shell script. But it also has a "component building nature". The same relatively high cost of change probably still applies if the boundaries become fuzzy and it's unclear where the modules end and the glue begins.
Admittedly this is subjective, but a scripting language is to a "real" language as erotica is to pornography: I know it when I see it (further research is needed in this area).
Please let us know what you uncover.
Re:Scripting vs. Programming
Aristotle on 2006-08-20T07:49:45
Which, in hindsight, disproves Joel’s dictum to Never Rewrite Code.
No, it doesn’t. Joel wrote about this in one of his recent weblog posts, but I would have said the same thing even before he opined on this precise issue.
Joel’s dictum is not to forever keep the code you have – it’s to never throw it all away to start from scratch. Couldn’t Mozilla have improved their codebase one module at a time? Did they need to junk the rendering engine and the UI library and the networking code and the profile manager and everything else at once? Not likely. Sure, it would have been more frustrating, on grand average, to painstakingly comb the kinks out of the bad codebase, than just start over from scratch. But it would likely have taken no longer, and in the meantime there would always have been working code.