Eliminating lookup tables

Ovid on 2005-10-20T03:12:34

One common problem with those new to databases is how they often stick something like field1, field2, field3, etc, in a table definition. Typically this if foolish. It violates first normal form by repeating information (if it doesn't, the fields are probably named poorly). For example, I once had to work on a database which had recipes for pears. The recipes table had eight fields for ingredients. If you had nine ingredients, you were out of luck.

At first blush, someone might create an ingredients table and list all of the ingredients and add a recipe id for each ingredient. Of course, this still does not work. You can probably reuse those ingredients in multiple recipes (such as the pears, duh!). So you create a lookup table. This table two two fields: recipe ID and ingredient ID (yes, we're keeping this deliberately simple).

Want the ingredients for a recipe?

SELECT i.name 
FROM   ingredients i, recipe_ingredient ri, recipes r
WHERE  r.id = ri.recipe_id
  AND  r.id = i.ingredient_id
  AND  r.id = ?

The problem is, it's easy to get that wrong. Do you see where I goofed? Wouldn't it be nice if we could do away with that lookup table?

You can. It's called a "multivalued" database. UniData, Pick, Universe and others violate first normal form and support "multivalued" columns. Instead of lookup ids, you create references to the fields in the corresponding table. The syntax is different (I have the books for UniQuery somewhere but I can't find them), but they essentially allow you to do this:

SELECT ingredients FROM recipes WHERE recipe.id = ?

That's much easier to get right. If you delete an ingredient from the ingredients table, it automatically goes away in a corresponding recipe unless you have a constraint to protect it.

Unfortunately, these require specialized programming and query languages and they're generally proprietary (anyone know of an open-source one?). However, there's something else interesting they allow: another step closer to logic programming.

Consider the following Prolog facts:

likes( john, [cheese, money] ).
likes( mary, [cheese, running] ).

That's really close to how a multivalue database would store records in a table named likes/2. It becomes trivial to join on these values and you can get much richer relational data. Unfortunately, I don't recall that multivalue databases support recursion. Also, the multivalued columns (iirc) must point to a single table. This makes optimization much easier but also reduces the utility. Rats.

You can also create custom datatypes by integrating your database language directly into the table definitions, though my experience has been that this can slow things to a crawl. I really wish I had another multivalue database to play with. As it turns out, there's an open-source mvRDBMS named MaVerick. Regrettably, the documentation is sparse and it looks like I'd have to dig out my UniData books to even play with it. Now where could they be?


Other databases

drhyde on 2005-10-20T09:36:16

I believe that PostgreSQL and Oracle both support this kind of thing. But naturally, they support it in incompatible ways.

Re:Other databases

Ovid on 2005-10-20T19:58:44

Can you point to me where in the Postgres docs this is described? I've found references to it on the Web but I'm having trouble finding it in the documentation.

Re:Other databases

drhyde on 2005-10-20T20:28:47

Likewise, I've seen it described but not looked for it in the docs. Nor have I used it, and I don't have a Postgres install handy to fiddle with.

Re:Other databases

lachoy on 2005-10-21T01:02:53

Is this the section you're looking for: Arrays?

Re:Other databases

Ovid on 2005-10-21T03:55:53

I suspect that's what was meant. Thanks! Unfortunately, it shows that Postgres really can't be used an MV database unless you push a lot of stuff into the application layer. Thus, one could think of it as the MySQL of MV databases (in other words, a fancy toy).

Use a view

Thos Davis on 2005-10-20T14:31:55

While it still requires that you type the query correctly once, views are portable across relational databases (finally even MySQL as of 5.0)

CREATE VIEW recipe_ingredient_list AS
  SELECT r.id, i.name
  FROM   ingredients i, recipe_ingredient ri, recipes r
  WHERE  r.id = ri.recipe_id
    AND  i.id = ri.ingredient_id

so that you could use this query:

SELECT name
FROM   recipe_ingredient_list
WHERE  id = ?

And if you use PostgreSQL as the backend (some other RDBMSes have similar support, but MySQL does not), you can even make the view an updatable view by creating rules that will add appropriate entries to the base table whenever you send an INSERT, UPDATE, or DELETE instruction. And PostgreSQL will let you write the rules in Perl.

Re:Use a view

Ovid on 2005-10-20T15:20:31

Aside from the "write rules in Perl" part, we actually use "updateable" views with Postgres (and SQLite) all the time where I work. It's very handy.

See also

jdavidb on 2005-10-20T19:46:06

Randal's post about ANSI SQL92 JOIN syntax.