Normal Forms

ziggy on 2006-02-20T18:57:36

A quick cheat-sheet on the various forms of database normalization

First Normal Form (1NF): No multivalued fields. Example: No such column like programming_languages_used containing the value Perl, C. Similarly, a tuple like project; programming_language_1; programming_language_2 is also avoided.

Second Normal Form (2NF): A unique key identifies each tuple value (row) in each relation (table).

Third Normal Form (3NF): Data is not duplicated across relations (tables). Example: store customer information and order information separately; do not repeat customer information across a table of users, orders, wishlists, shopping carts, etc.

Fourth Normal Form (4NF): No complicated multivalue dependencies. Example: if an employee may have one or more certifications, and a may attend one or more training classes, store these relationships separately. A tuple that combines employee_id, certification_id, and training_class_id is bad. Instead, two tuples should be stored, one relating employee_id to certification_id, and one relating employee_id to training_class_id.

Fifth Normal Form (5NF): All joins are in 4NF. (See example for details, implications for duplicate elimination, and associated benefits.)

Sixth Normal Form (6NF): A database in 5NF that extends 5NF characteristics into a temporal dimension. Possible example: a data warehouse that archives activity quarterly, and gathers the current quarter's activity along with the most recent previous 3 quarters of archived activity to produce a set of results over the last 12 months.


Nice!

ChrisDolan on 2006-02-20T21:45:26

Great summary. Thanks! You should consider adding something concise like this to the top of the Wikipedia page.

6NF

Ovid on 2006-02-21T02:46:14

One thing worth noting, from what I've read, is that 6NF appears to be a form of horizontal decomposition which, amongst other things, can be used to remove NULLs from databases (pdf). Personally, I've always tried to follow the idea allegedly proven in one IBM reseach paper (that I can't find a link to right now though I know I've linked to it before) that any database in 3NF is automatically in 5NF if all keys are simple (single columns). I suspect that Date and Darwen wouldn't like that, though, as they do seem to object to forcing single column keys (again, stated from memory, I could be wrong).

Thus, it tends to be enough to remember how to achieve 3NF. The 6NF stuff only applies when needing to handle disjunctive data (such as temporal data). Of course, this post would be much more effective if I could find the 3NF to 5NF link :)

Regrettably, DBMSs tend to be optimized for SQl, not for truly relational data, so putting a large, heavily used database into 6NF is frequently not practical.

Re:6NF

Roy on 2007-06-22T17:24:52

any database in 3NF is automatically in 5NF if all keys are simple (single columns).

http://portal.acm.org/citation.cfm?id=132274 is what you are refering to.

I suspect that Date and Darwen wouldn't like that

I remember that there was a fair amount of strop about this theorem. Some derided it as useless, others thought it was practical.

I cannot answer for either Chris or Hugh, but notice Chris is one of the authors.

Also see http://cas.uah.edu/mokw/public/IPLKeys.pdf which takes the issue a bit further.

Roy