Toyota Corollas, Toyota Supras, and Databases

Ovid on 2007-04-10T21:40:43

Frog asked why I'd be more picky about my choice of database than my choice of programming language (well, the question was implied). For regular readers of my journal, you might have a good idea why I think this, but for the casual reader, it's less clear and Frog's question is excellent.

I remember reading a journalist's description of once getting to test drive a nice car at a racetrack. I don't remember the cars involved, but for the sake of argument let's say that they were a turbo-charged Toyota Supra and a 4 door Toyota Corolla. The former is a high-performance sports car and the latter is a regular family car. The reporter took the Supra out on the racetrack and was happily zipping around turns, enjoying the rush of hitting the accelerator when he noticed someone coming up from behind.

One of the people with him was a professional race car driver and he hopped in the clunky 4-cylinder Corolla and outdrove the reporter in the turbo-charged Supra.

Now frankly, if you take two regular drivers and put them on a racetrack, it's fair to say that the driver with the Supra is probably going to win. However, in the hands of the skilled professional, even a tool not suited to the job can be excellent. In fact, I would go so far as to say that a talented amateur might regularly use substandard tools and not realize how much better life can be with tools better suited to the task at hand.

Of course, that's a terribly unsubtle transition to my actual topic: DBMS, or Database Management Systems. Note that I do not refer to RDBMS, or Relational Database Management Systems since I don't know if any actually exist. SQL databases are a distant relation, if you'll pardon the pun. Further, it's not even fair to say that this merely proves that SQL databases are well-suited for their role since no relational database has ever been built that I'm aware of, thus making it very difficult to demonstrate how the relational model works. (I understand Alphora has taken a stab at it and I've heard interesting things about them, but given how little people understand relational theory, it's going to be tough for them to make much headway).

The problem with MySQL is simply that it's a Toyota Corolla and other DBMSs out there are far more powerful. However, MySQL is not just a Corolla, it's a broken Corolla that lacks brakes, decent windshield wipers or seat belts. In the hands of someone competent, it can be a very powerful tool but there are very few programmers I've met who know much about databases, database design (normalization) or database management. They frequently know how to write SQL and think "I can design this. It's simple!"

It's not simple. You can have two databases containing apparently identical information but are required to be normalized differently due to different business rules. Normalizing a database can be very hard work, particularly as the database grows. There are even "code smells" which point to bad database design, but precious few people recognize them.

A properly normalized database should make it extremely difficult to insert bad data into. Instead, it's often trivial to insert bad data into databases and the programmer instead needs to should the burden of catching all of those mistakes. Regrettably, every time I work with MySQL, I find that reasonable tools to make corrupt data insertion difficult are lacking. In fact, there the MySQL developers used to have a paper out on the Web arguing that foreign keys do not belong in the database! (You'll no longer find this on their Web site).

So if you're a competent database administrator (DBA), you can probably utilize MySQL better than a casual developer can utilize Oracle or PostgreSQL. If you have an exceptionally strong programming team, you might not get bitten by the many, many bugs you'll find in MySQL. For most of us, however, we lack that exceptionally strong programming team (most shops I've worked with don't even write tests) or good DBAs (most shops I've worked with don't even have DBAs).

So, in short, MySQL makes it much easier to corrupt your data than other databases and, in many ways, actually seems to encourage it. I and others on this site and numerous other sites have well-documented many of the flaws of MySQL. They're finally fixing a number of those flaws, but the MySQL developers have failed to understand databases for so long that it will be quite a while before they win my trust. They can't merely slap foreign keys, transactions and stored procedures on their toy and say they're all grown up now. Regrettably, too many developers have an false sense of confidence with databases and simply don't know what they're talking about when it comes to them (mind you, these are frequently developers who are far better programmers than I am -- database development seems to foster a special kind of ingorance.)

O'Reilly has an interview with noted relational expert C.J. Date. This interview only briefly touches on the topic I'm mentioning here, but he pretty much blasts all DBMSs out there, not just MySQL. If you learn how to normalize databases, I highly recommend Database In Depth, his book which lets you get an understanding of the relational model and a better understanding of why current misimplementations are so flawed. Even though you likely won't be in a position to work with a DBMS which rectifies these flaws, understanding the limitations lets you better control the handling and cornering of your Supra, er, database.

As an aside, I am not a database expert. I've definitely met people who understand them better than I do and can much more easily rewrite queries, point out design problems and keep things running smoothly. Despite this, I am still constantly astonished at how otherwise excellent programmers don't realize that the block of NULLs in a table which get filled in at once is a code smell. They don't understand why using identifying keys can be problematic or why naming a table after a particular vendor might not be such a good idea. Many times developers don't even use simple tools like transactions, stored procedures or views simply because they've not used them before and don't understand them. I honestly believe that common understanding of database development is lagging a couple of decades behind common understanding of software development.


Oracle

sigzero on 2007-04-11T00:08:02

Thanks for elaborating. I can agree totally. This is the first project I have been on to use Oracle. All dates are strings and not DATE (so I have to convert in almost all calls). All fields are NOT NULL and a fixed length even if the data doesn't come close to using the space (I know how to use rtrim and I have to do it in almost every call). There are no foreign keys (it is about 1500 tables). It is very frustrating.

Re:Oracle

Ovid on 2007-04-11T06:47:31

Wow. That sounds like giving a kid a Ferrari for driving lessons and then not bothering to teach them to drive. I actually don't mind having everything NOT NULL, but that's only if folks know why they're doing that and clearly whoever worked on your database didn't.

Can you do something like Rose on top of it? If so, it could at least hide a lot of that cruft for you.

Re:Oracle

sigzero on 2007-04-11T12:03:39

The NOT NULL is not as annoying as the everything is padded with spaces in the fields. So if the field is 10 characters and the data is 3, they tack on 7 spaces to make it fit. That is freaking annoying as hell.

They have crashed the "ferrari". What a waste.

Wow, thanks!

Frog on 2007-04-11T01:19:29

Score one for asking dumb questions. I'll keep your advice in mind and be extra careful since my current project uses MySQL.

avoid COBOL too

perrin on 2007-04-11T22:06:43

Learning how to use a database well is good advice. Never mind the relational model -- I've met many Oracle users and even DBAs who didn't seem to have a solid grasp of the transaction model and isolation levels. These are pretty fundamental things to know about your database.

Your anti-MySQL rant is getting really out-of-date though. MySQL has had transactions and foreign keys in a GA release since March 2003, with the same MVCC locking model that Postgres and Oracle use. They've had a strict mode that prevents the accidental data truncation issues in GA since October 2005. I wouldn't advise anyone to use versions of Perl or PHP older than 4 years, and I wouldn't advise anyone to use a version of MySQL earlier than 5 at this point.

Re:avoid COBOL too

Ovid on 2007-04-12T17:05:43

The strict mode is barely a year old and MySQL, with their perpetuation of their "who needs valid data?" ideas has decided to make this optional. Why data integrity should be optional, I don't know, but this continual misunderstanding on their part (remember when they published that laughable anti-FK essay?) makes me extremely leery of them and their product. Also, you still can't attach triggers to views (something which hurt us at Kineticode and would have allowed MySQL to simulate something much closer to the relational model).

This isn't to say that PostgreSQL is free from complaints, but anyone who thinks MySQL versions less than 5 are decent simply doesn't know databases.

However, I do take your point about recent versions of MySQL. I rather wish we could upgrade at work, but until our test suite starts approaching much better coverage, this will not happen.

upgrade incentive

perrin on 2007-04-12T17:36:16

MySQL 5 reached beta in March 2005, over two years ago, and a production release in October 2005. The 5.1 production release should be happening pretty soon, and 5.2 is already available in alpha.

One thing that might help you make a case for some urgency on upgrading to version 5 at work is that the other versions are now so old they have a "end of product life" notice on the MySQL web site.

Re:avoid COBOL too

leandro on 2007-04-28T12:23:29

MySQL has had transactions and foreign keys in a GA release since March 2003, with the same MVCC locking model that Postgres and Oracle use.

Wrong. They do have transactions and FKs, but no MVCC. InnoDB is a sorry attempt at that, ill documented, and writers in it will block readers, sometimes a consistent dump will lock the whole database, and it can’t generally be trusted.

And even in GA, the sorry MyISAM is still the default.

Your MySQL Version Sir?

vek on 2007-04-12T06:36:37

Just out of curiosity, what version of MySQL are you running?

Data normalization

djberg96 on 2007-08-21T01:34:24

A couple of articles that may interest you:

http://www.infoq.com/news/2007/08/denormalization
http://www.infoq.com/news/2007/08/mnesia
http://blogs.msdn.com/pathelland/archive/2007/07/23/normalization-is-for-sissies .aspx

Any thoughts on denormalization, distributed databases, Erlang, etc?