Ow! My unnormalized head!

Ovid on 2006-04-21T09:31:00

One Dawn Wulthius suggests that we stop normalizing our data.

Interestingly, I think her example is misguided, but not for the reasons one might think. Her example of "unnormalized" data is an example of a way one can define a set of data in a single attribute in an MV database. For example:

id | name  | email
------------------------------
1  | Bob   | bob@hotmail.com
------------------------------
2  | Sally | sally@hotmail.com
           | sally@yahoo.com

In that example, "email" is a multi-valued field which can contain a set (or possibly a bag) of email addresses (I do believe duplicates might be allowed, but I can't recall).

That seems to violate to first normal form (1NF), but this appears to cause problems in traditional databases for technical reasons, not mathematical ones. In a multi-value (MV) database, if you have a list of reports sent to specific email addresses, you can tie them to the particular email address you want and if someone deletes the email address, you can either throw an exception (because the report still exists) or have the equivalent of a cascading delete -- the report gets deleted.

This technique has several powerful benefits. Email addresses don't require a separate table, the query language can be simpler and many-to-many relations no longer require lookup tables. However, popular database systems don't support this (unless you do a lot of complicated work with rules or triggers).

I write that it seems to violate 1NF, but only because folks usually misunderstand 1NF. 1NF states that the domain of an attribute must contain single values. However, some (such as C.J. Date) argue that attributes don't need to contain indivisible values. A value can be a collection or an object. Traditional values allowed in attributes are not necessarily indivisible. A string can be broken down into characters. A date can be broken down into date parts. Integers can be broken down into prime factors. However, most would agree that dividing these types would be silly. So why do we rely on such arbitrary types? What is important is not that an attribute be indivisible; it's that the data contained in an attribute have a clearly defined type and operators which can act on that type (and in such a system, FK contraints which can work with user-defined types). Unfortunately, popular databases don't tend to support user-defined compound types very well and MV databases are merely a crippled implementation of the relational model.


Array types

djberg96 on 2006-04-21T14:32:27

Oh, good point. Is something like PostgreSQL's array type the proper solution here then?

create table sometable (
   id serial,
   name varchar(50),
   email text[]
);

This would store a list of email addresses as a single object. Or have I missed the point again?

In other news, I'm surprised PostgreSQL doesn't have an "email_address" type. It seems to have everything else.

Re:Array types

Theory on 2006-04-21T16:36:48

In other news, I'm surprised PostgreSQL doesn't have an "email_address" type. It seems to have everything else.

You can create one easily enough with a CREATE DOMAIN statement. Here's a naive implementation:

CREATE DOMAIN email_address AS TEXT
CONSTRAINT ck_email_address CHECK (
   VALUE ~ '^[^@]+@[^@]+$'
);

Modify the regular expression as needed to increase its accuracy. Then you can use it like any other PostgreSQL data type:

CREATE TABLE person (
  id SERIAL PRIMARY KEY,
  name text,
  email email_address
);

Re:Array types

Ovid on 2006-04-21T16:54:49

As Theory points out, you can create such a type. The problem is that if anything else needs to reference data within that type, Postgres does not allow any convenient way of doing so. This makes it pretty difficult to use Postgres for MV purposes.

Re:Array types

iburrell on 2006-04-21T17:31:52

PostgreSQL actually supports composite types.
CREATE TYPE inventory_item AS (
    name            text,
    supplier_id     integer,
    price           numeric
);

CREATE TABLE on_hand (
    item      inventory_item,
    count     integer
);

SELECT (item).supplier_id FROM on_hand;
It sounds like PostgreSQL even automatically makes a composite type for each table. Composite types aren't used much since they are nonstandard. They are most useful for communicating rows to user-defined functions. PostgreSQL also has user-defined types where new scalar types are defined by implementing a bunch of functions.

Re:Array types

runrig on 2006-04-21T19:49:27

How do you then update, insert, delete, or search for a single email address? If it makes things more complicated to deal with the data, I think I'd rather normalize. Hmm, looking at the postgres docs, searching for a value in an array:
SELECT * FROM sal_emp WHERE pay_by_quarter[1] = 10000 OR
                            pay_by_quarter[2] = 10000 OR
                            pay_by_quarter[3] = 10000 OR
                            pay_by_quarter[4] = 10000;
But that can be abreviated to (especially when you don't know how many elements there are):
SELECT * FROM sal_emp WHERE 10000 = ANY (pay_by_quarter);
But they offer this advice:
Tip:  Arrays are not sets; searching for specific array elements may be a sign of database misdesign. Consider using a separate table with a row for each item that would be an array element. This will be easier to search, and is likely to scale up better to large numbers of elements.

Re:Array types

Ovid on 2006-04-21T20:09:11

The problem with CRUD on a single email address goes away if you have a language which supports that. SQL does not. While I confess that I don't remember my UniQuery (the query language for the MV database I used), imagine something like this:

DELETE email
FROM   customer
WHERE  name  = ?
  AND  email = ?

In other words, DELETE would operate on more than just tuples.

How about searching for a particular email? Well, you show some sample syntax above.

There are plenty of other pseudo-examples I could toss out to show how things work, but basically, SQL (as far as I know) deliberately doesn't address these issues because SQL isn't designed for MV databases.