When databases attack

Ovid on 2003-06-10T18:06:41

One of the worst databases I have ever seen drove a Web site that needed to support the following:

  • FAQ questions and answers
  • FAQ Keywords
  • Cooking lessons
  • Different types of food
  • Recipes

Just considering recipes, what sort of tables might we want? Possible tables include units of measure, ingredients, recipe categories, recipe author, and perhaps difficulty level. Given everything I've listed, how many tables do you think this database had?

Four.

Nope. That's not a typo. This Access database driving a Web site had four tables. (The "DBA", who is no longer working with us, told the project manager that the database design was fine.) When we bid on the site redesign, they couldn't understand why we wanted to redesign a database that worked for them. Never mind that no recipe was allowed to have more than eight ingredients, or that "difficulty level" was an arbitrary string that could be typed in.
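For contrast, here's a minimal sketch of what just the recipe portion might look like normalized. All of the table and column names here are my own invention for illustration, not anything from the actual project:

    create table difficulty_level (
        difficulty_id integer primary key,
        name          varchar(32) not null unique  -- 'easy', 'moderate', 'hard'
    );

    create table recipe (
        recipe_id     integer primary key,
        name          varchar(128) not null,
        author        varchar(128),
        difficulty_id integer references difficulty_level
    );

    create table ingredient (
        ingredient_id integer primary key,
        name          varchar(128) not null unique
    );

    -- one row per ingredient per recipe, so a recipe can have any
    -- number of ingredients rather than a hard-coded limit of eight
    create table recipe_ingredient (
        recipe_id       integer references recipe,
        ingredient_id   integer references ingredient,
        quantity        numeric,
        unit_of_measure varchar(32),
        primary key (recipe_id, ingredient_id)
    );

That's already four tables for the recipes alone, before we even get to the FAQ, the keywords, the lessons, or the food types.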

This was the worst database I've seen, but frankly, with the exception of large-scale, successful projects, this is the quality of most databases that I see. I hardly claim to be a database guru, but I could run circles around these monkeys -- in my sleep.

So why are so many databases absolutely worthless? I was reading a book on database design, and the author -- a college professor -- noted that he was sometimes approached by programmers wanting to teach database design because they knew SQL. While knowledge of SQL and knowledge of database design are not exactly orthogonal, knowing SQL does not mean you know how to design a database -- but too many programmers don't understand this.

Personally, I think if you don't have the basics of database normalization down, you have a large gap in your knowledge. Learning normalization will make you a better programmer (and vice versa, I suspect).


normalization

gav on 2003-06-10T18:47:16

It's worth pointing out that you can sometimes take normalization too far and make something that looks good on paper but is a pain to work with.

For example, I ported an e-commerce site from JSP/MS-SQL to Perl/TT/MySQL and the schema was really nasty. Instead of creating a product table that looked something like:

product(id, price, name, title, short_description, long_description)

They split things into multiple tables in case they had to add extra fields in the future:

product(id, price, name)

text(product_id, key, data)

I ended up wrapping this mess, and others (they had pricing tables with a similar structure), so that it looked like one table.
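On a database that supports views, one way to do that wrapping is in SQL itself (the MySQL of that era didn't have views, which is presumably why the wrapping happened in code). This is just a sketch of the idea, assuming the key column held values like 'title', 'short_description', and 'long_description' -- and the repeated outer joins are exactly why this layout is a pain:

    create view product_flat as
    select p.id,
           p.price,
           p.name,
           t.data as title,
           s.data as short_description,
           l.data as long_description
    from product p
    -- note: 'key' is a reserved word in some databases and may need quoting
    left join text t on t.product_id = p.id and t.key = 'title'
    left join text s on s.product_id = p.id and s.key = 'short_description'
    left join text l on l.product_id = p.id and l.key = 'long_description';

Every extra "field" they might add in the future means another join, which is a steep price to pay for flexibility they may never use.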

I've also de-normalized tables in the past. One example was for a client that wanted to group products together in sets. Something like this:

product(id,...)

group(id, name)

product_group(group_id, product_id)

This was hard for the users to understand, and things like having to have "default" groups were awkward (of course, a decent interface would have solved these problems, but time and money were tight). A simple change made them happy:

product(id, name, ..., group_name)

The users were perfectly happy being told to type in a group name and keep it unique to make grouping work.

It's a bit of a long and convoluted point, but I'm trying to say that you can't learn to design databases just from reading a book. You need to get your hands dirty and find out what works from both a user's and a programmer's viewpoint. This is why good DBAs are rare.

Re:normalization

runrig on 2003-06-10T21:09:26

product(id, name, ..., group_name)

That's fine; you could even have a table of valid group names, with the name as its primary key, and make the group name in the product table a foreign key to it.
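Something like this, in other words (the names are mine, and I've avoided calling the lookup table "group" since that's a reserved word):

    create table group_names (
        group_name varchar(64) primary key
    );

    create table product (
        id         integer primary key,
        name       varchar(128) not null,
        -- the group name lives right in the product row, so no join
        -- is needed to display it, but the foreign key keeps a typo
        -- from silently creating a new group
        group_name varchar(64) references group_names
    );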

As for taking normalization too far, I once ran into a schema with:

year(year_id, year_num)

The year_id column was just a serial id, and year_num held the valid years you could search for. So the product table was something like:

product(product_id, ..., year_id)

Searching for products from 1965 to 1967 in that schema was a horror :-) I talked them into creating a year table with just the one field, and making the field in the product table a foreign key to it. It took a little time for them to get it (wait, we don't even have to join to the year table...?).
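Roughly the difference, as I read it (my reconstruction of both versions, and note that "year" itself may need quoting as a table name in some databases):

    -- before: year_id is an opaque serial number, so every range
    -- search has to detour through the lookup table
    select p.*
    from product p
    join year y on y.year_id = p.year_id
    where y.year_num between 1965 and 1967;

    -- after: the product table stores the year itself, with a
    -- one-column year table kept only for referential integrity
    select *
    from product
    where year between 1965 and 1967;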

Normalization is just one part

VSarkiss on 2003-06-10T22:50:33

What people need to learn is called "data modeling" or "data architecture". I've worked as a DA many times, and not many people understood why I insisted on doing things in particular ways. (Analogous to people questioning why you program things a certain way in Perl: why write tests? why use strict? They just get in the way.)

Once you learn data modeling, it helps to improve your programming in general, because it also applies to data structures in your code. Have you ever looked at a multi-level hash in Perl and said, "No, change the order of those two levels, it'll be easier to look up"? You were doing data model tuning for a hierarchical database.

Unfortunately, data modeling isn't taught formally (most places). I picked it up over the years. I'd recommend starting by learning a methodology like IDEF1X, which, though old, has many good ideas. Then you too can say things like "Partial key migration leads to unnecessarily complicated update queries later." ;-)

Database Design

Odud on 2003-06-11T10:21:57

I wonder if the problem is made worse by the number of people who have moved from 'toy' products, e.g. Access, into real design without formal training. I was recently looking at a product where the UI is OK but the underlying design is a mess -- there is only one table, with the different types of row indicated by various combinations of column values. You can almost smell the underlying spreadsheet it was historically based on. It's produced by a one-man company, so I guess that explains a lot. I think you're right about the "I can write SQL so I can design a database" view -- it's similar to "I can write some simple macros/VB, therefore I am a great programmer".