Databases Suck

Ovid on 2005-10-21T01:02:24

Anyone who has worked with databases long enough knows that the relational model does not map well to the real world. Further, the necessity of database normalization is not something inherent to the data; it's mostly (not entirely) an artifact of the relational model. For example, how do you store customer addresses? Two database might store it completely different ways yet both be normalized. The common example is to do something like this pseudo-code:

create table customer {
  id ...
  first_name ...
  address ...
  city_id ...
  state_id ...
  etc ...
)

Now let's say that you're having a sale and you want to send fliers to all of your customers saying "We're having a sale. We hate you and we don't want you to show up." Getting their addresses is easy.

Now business two is also having a sale but they want to send out two sets of fliers. The first set tells their customers that they hate them and don't want them to come and the second set is to all addresses for which they don't have a customer. Now they have problems if they're using the above table. They have to have a separate addresses table and the customer table looks like this:

create table customer {
  id ...
  first_name ...
  address_id ...
  etc ...

The problem here is that business knowledge is required to properly know how to normalize the database. For the second business, the address is not dependent on the customer, unlike the first business. In other words, the database reflects domain knowledge but you, the poor fool having to maintain it, can't tell from looking at a database schema if it's really reflecting business knowledge or if it's a mistake.

This is sort of silly. For the first business the addresses exist in the real world whether or not they have customers there. Further, their database is junk if a customer has a summer home and a winter home. Even then they might be facing performance problems and have gone with a deliberate decision to denormalize parts of the database. Despite what "purists" argue, sometimes this has to be done (my last company had an Oracle performance expert come in and finally admit that he couldn't speed up their billion row table which relied on aggregated custom data types).

Object databases and multivalue databases can get around many of these problems by allowing data to be arranged in a heirarchical fashion instead of a relational one (though they may use a relational database under the hood). By doing this, if properly designed, they trade some performance for better real world modeling. But there's still a problem. Consider a table where we store what a customer likes (more pseudo-code).

create table customers (
  id ...
  customer_name ...
  likes (multivalue, fk likes.id) ...
)

create table likes (
  id ...
  thing ...
)

With that sort of multivalue relationship, we can fetch everything John likes very easily:

SELECT @likes FROM customers WHERE customer_name = 'John'

We can also find out everyone who happens to like something John likes:

SELECT @likes 
FROM   customers
WHERE  customer_name != 'John'
  AND  @likes IN (
    SELECT @likes FROM customers WHERE customer_name = 'John'
)

So far, so good. We've gotten a decent amount of power from violating first normal form. However, there's still a problem. These databases (from what I've seen) usually constrain relations on type, not capabilities. How do we say that John "likes" another customer? There's a very good chance we can't do that. The likes table might have a list of customer ids or it might have a list of product ids, sport ids, etc. Because these technologies seem to restrict things tightly based upon types (and it's tough to introduce an FK constraint on multiple tables on the fly), we still can't model the real world terribly well.

I suppose one could actually get something like this to work by reaching into the system tables or something, but it would be difficult. Further, if we tried to do something like this, I suspect that the slow object or multivalue databases would be even slower. Moving one step closer to reality is one step further from an application that responds.


I wouldn't know...

sigzero on 2005-10-21T02:54:34

I am stuck with an Oracle database that I can't change, has spaces filling up every table regardless of what is actually in the table and everything is a CHAR.

Re:I wouldn't know...

sigzero on 2005-10-21T02:55:55

Oh, I forgot to meantion the good part. I am really good at using "to_date" and "rtrim".

Huh?

autarch on 2005-10-21T03:12:08

The real world is not hierarchical. Some parts of it are, some aren't. A system like MV which only lets you express hierarchical relationships is not going to be a better model of the real world than a SQL system.

Your schema reflects your business rules. If you require that all customers have an address, there should be a constraint to this effect. If customers can have more than one address, I'm sure you know quite well how to model that as well.

Also, SQL is not relational. It's a poor second cousin of relational, and it does suck. Imagine that you had a richer constraint language so you could easily specify constraints on the database as a whole (aka across multiple tables). Given that, you could very easily allow for each customer to have more than address, and require that they always have at least one. Similarly, you could easily require that each address have a customer.

Of course, this also requires the ability to clearly define the boundaries at which constraints are checked. The nice way to do this is to allow for multiple statements at once:

    UPDATE ....,
    DELETE ....,
    INSERT ....
    ;

If constraints are checked at the semi-colon, you can satisfy complex constraints across multiple tables and easily write DML code that will not violate them.

Conceptually, this is pretty trivial, but I still haven't seen an RDBMS that really supports this stuff well. I've mostly used Postgres and MySQL, which both do suck. But my experience with Sybase, SQL Server, and Oracle has suggested that they all suck too, just more powerfully.

Oh and another thing...

sigzero on 2005-10-21T15:49:18

I forgot to mention this though I don't know how I forgot.

All tables (all 1500 of them); every field is NOT NULL. Every field.

First s/databases/software/ ...

Adrian on 2005-10-30T09:43:28

Anyone who has worked with software long enough knows that the relational model does not map well to the real world.

Then pick any of the following:

s/relational/procedural/
s/relational/object oriented/
s/relational/functional/
s/relation/logical/
s/relation/aspect oriented/
s/relational/hierarchical/

Continuing this list is left as an exercise for the reader :-)

Models are not the real world. They're an abstraction chosen to make a particular problem domain easier to manipulate. Sometimes people pick the wrong model. Sometimes the problem domain changes.

There's nothing that's particularly relational about the issue about whether addresses are separate "things" from customers. I've seen exactly the same issue with OO code.

In other words, the database reflects domain knowledge but you, the poor fool having to maintain it, can't tell from looking at a database schema if it's really reflecting business knowledge or if it's a mistake.

And I can't tell just by looking at a object class hierarchy whether a particularly odd looking arrangement of classes is a reflection of business knowledge or if it's a mistake.

Being normalised doesn't say anything about the fitness of a relational model for a particular problem domain any more than the Law of Demeter says anything about the fitness of an OO model for a particular problem domain.

Relational models that are normalised have nice properties that can be useful. OO models that follow the Law of Demeter have nice properties that can be useful. They're talking about how good the model is. I don't think it's really fair to ask them to say things about how well a model fits a problem domain too!

Which of course would be a wonderful thing to know.... wish I knew how to figure it out... :-)

Object databases and multivalue databases can get around many of these problems by allowing data to be arranged in a heirarchical fashion instead of a relational one

I'm curious why you think heirarchies and relation models are contradictory (nested set tree model and all that...) ?