Database - Slave or Master? 1 of 3 - Database Abstraction

mugwumpjism on 2006-08-19T16:53:14

After the ACID revolution of the 1960s, Relational Database Design was the next big thing during the late 60's and 70's. It marked an evolutionary step forward from the Heirarchical models of early ACID conformant systems; for after all, it included the heirarchical model, as any heirarchy can be expressed as relationsÃÂ¹, yet transcended it by expressing structures that didn't fit heirarchies.

And it has some solid theory behind it as well - the relational model has strong roots in mathematics and logic, and so you can expect that University-goers will be poring over it with a bit more scrutiny and peer review than your average use.perl.org columnist.

Through all this, we have a decent set of experience for looking at data management problems through the goggles of the Relational Model, of which modern Relational Database Management Systems (RDBMS's) provide a reasonable approximationÃÂ². We have built it up logically with key concepts such as constraints, sequences, triggers, joins, views, cursors, etc, and well-known performance hacks such as indices, partitioning or materialized views. And this logical layering is what allows us to build complex RDBMS's and database applications that do not violate the strict requirements of ACID. Well, some of us, some of the time. I won't say it's easy to do it without making mistakes.

We have a set of rules that let you decide whether data in the model is normalized - that is, it is not duplicated or aggregating any other information in the database, or de-normalized. We should be able to look at a table, and decide whether that auto_increment primary ID key is actually a normalized and valid member of the model (such as a customer or invoice number), or whether it is just a surrogate ID thrown on the row so that the programmer doesn't have to guess whether table.id exists or not, that does not actually mean anything in terms of the data model.

We have a graphical language of notation, called crowfoot diagrams (example). And this is a very good common language.

We even have Relational abuses such as stored procedures and NULL valuesÃÂ².

But we want a common language for writing Perl components, not just how for talking to DBAs or writing database schema. We cannot write entire applications in SQL. And nor do we want to.

What defines "Database Abstraction"?

For the heritage for this term, we can look to Dave Rolsky's POOP Comparison document. POOP stands for Perl Object-oriented Persistence, and stands out as one of the worst acronyms for a user group ever.

So, "Database Abstraction" is my own refactoring of the term "RDBMS/OO Mapper" from the above document. Modules such as DBIx::Class and Dave's Alzabo clearly fit into this category.

Allow me to hazard some key characteristics of modules strictly in this category;

they (in principle) do not have particular requirements on table layout, such as surrogate IDs or type indicators
they do not try to represent or provide concepts not described by orthodox relational model literature, such as inheritance

Perhaps I'll think of some others as time progresses; I'll try to add them here if I do.

What's so cool about DBIx::Class

In a nutshell, it does the Database Abstraction part very well, with a clean modular implementation via Class::C3. Which isn't quite as chic as Moose, but close enough that it's probably not worth re-writing DBIx::Class in the near future. It has active maintainers, it has releases, it has users, it has mailing lists and IRC and all those other indicators of projects which are "succeeding".

One thing I particularly like about its API is DBIx::Class::ResultSet. In particular, the way that you don't get tables from your schema, you get result sets that happen to be for all objects. What's more, they don't actually run the query until you use them, which makes for easy piecely building of simplish queries.

Driving the Perl Objects from the Database Schema

One of the most popular DBIx::Class extensions, which I also think is pretty nifty, is DBIx::Class::Schema::Loader. This actually connects to a database, uses DBI's various interfaces for querying the table structure in about as DB agnostic a way as you could imagine a tool of its class doing, and then calls the relevant DBIx::Class hooks to create classes which are a reasonable representation of what it found in the database.

For those people who are adamant that best practices be strictly followed, and normalization guidelines honoured, this works very well - and it sure is a delight when you have an application with a database clean enough for this to work without tweaking the schema. Then again, those developing applications from scratch might prefer writing in DBIx::Class directly.

What's the model of your model?

In all of the above scenarios, but particularly with the Loader, the model (ie, schema) of your database has a meta-model (ie, governing schema form). It is a very close relative of the Data Definition Language, DDL - CREATE TABLE statements and so-on that tell the database what to do. And that is perhaps key to the success of DBIx::Class and perhaps all other modules that work like this - they piggy back on the success of the relational model.

It should be noted that the DBIx::Class meta-model is currently implicit; there is, for instance, a DBIx::Class::Schema module that lets you create objects for a model, but they just go ahead and make the classes immediately rather than a separate step. The closest thing I could find to a pure set of data structures that represent the schema was probably DBIx::Class::Schema::Base, but even that had the "side effect" of compiling the classes into Perl as the schema is constructed.

But that's not necessarily a harsh critique of a real problem. As an exercise, and for a RAD (Rapid Application Development) tool I was writing at the time to procrastinate from building a real application for a VC project, I developed a set of modules for Tangram called T2 that described the Tangram meta-model using the Class::Tangram meta-model. I later found myself wanting to do the same thing to Class::Tangram itself - that is, have Class::Tangram re-entrantly be its own meta-model. Other people have tried this sort of thing, too - Kurt Stephen's UMMF, David Wheeler's Class::Meta, etc. Metamodelling really amounts to the data modeller's equivalent of navel gazing - ie fruitful but only with good practice and a clear mind. I admire Stevan Little's accomplishment with Class::MOP in this regard, which is why I didn't cut my Moose talk.

But I digress. Why don't I summarise the usage scenarios where I think the Database Abstraction approach really wins.

Summary - Where Database Abstraction Wins

There we go, large heading and everything. I have observed Database Abstraction to be effective, both in my own practice but more in others, in two situations:

Well designed models: If the information has been modelled well using classical set theory notions, and those notions are adequate for the task at hand and little denormalization present in the data, then any approach that ends up getting to DBIx::Class classes will work well.
retro-fitting existing models: The DBIx::Class::Schema::Loader wins here. You already have a set of tables, you've defined your foreign keys properly using constraints and what-not, and it's not just a bunch of integer id keyed data dumping grounds, so just go ahead and load it all using a set of clearly-defined conventions.

Right, time to collect a free meal for my delayed flight, then I'll have a crack at part 2.

Footnotes:

Yes, querying heirarchies in SQL sucks and usually relies on vendor-specific extensions which are inflexible and not portable. We will get to this a bit more in part 2 hopefully.
Insert long rant about NULL values and duplicate rows here.