This story begins with an effort to store Moose classes in a Tangram store. Specifically, converting from Moose::Meta::Class objects to a Tangram::Schema structure.
The structures are already quite similar. In the Tangram schema, you have a per-class map of (type, name, (details...)). In Moose::Meta::Class, you have a map of (attribute, (details...)), where the details include a type constraint. Based on the type constraint, you can guess a reasonable column type. Well, not quite. The next thing you really need is Higher Order Types on your type constraints (called parametric roles in the Perl 6 canon). In a nutshell, that's not just saying there's an Array somewhere, but saying there's an Array of something. Then you can make sure that you put an actual foreign key or link table at that point in the schema, rather than the oid+type pair that you get with Tangram when you use a ref column (and, in recent versions, without specifying a class). Getting parametric roles working in Moose is still an open question, but certainly one I hope to find time for.
So, during this deep contemplation, I thought, well, what would Tangram be adding? I mean, other than the obvious elitism and other associated baggage? Why not just tie the schema to the Moose meta-model, and start a new persistence system from scratch? Or use DBIx::Class for all the bits I couldn't be bothered re-writing?
In principle, there are reasons why you might want the storage schema and the object metamodel to differ. You might not want to map all object properties to database columns, for instance. Or you might want to use your own special mapping for them - not just the default.
Then I thought, how often did I do that? I added a transient type in Class::Tangram for columns that were not mapped, but only rarely used it, and never for data that I couldn't derive from the formal columns or some other truly transient source. I only used the idbif mapping type for classes when I didn't have the time to describe their entire model. So, perhaps a storage system that just ties these two things together would be enough of a good start that the rest wouldn't matter.
The Evil Plan to NOT refactor Tangram using DBIx::Class
Ok, so the plan is basically this. Take the Tangram API, and make the core bits that I remember using into thin wrappers around DBIx::Class and friends. Then, all of the stuff under the hood that was a headache working with, I'll conveniently forget to port. That way, it won't be a source compatible refactoring, just enough to let people who liked the Tangram API do similar sorts of things with DBIx::Class.
The first thing I remember using is a schema object for the connection, if only because of acme's reaction when I say "schema". In a talk I'd use a UML diagram at this point, but given <img> tags are banned, instead let's use Moose code.
package DBIx::Moose::Schema;
use Moose;

has '$.classes' => (
    is  => 'ro',
    isa => 'Set of Moose::Meta::Class',
);
Alright. So, we have a schema which is composed of Moose Classes. The next thing we need is a Storage object that has the bits we want;
package DBIx::Moose::Storage;
use Moose;
use Set::Object qw(weak_set);

has '$:db' => (
    is  => 'ro',
    isa => 'DBIx::Class::Schema',
);
has '$:objects' => (
    is      => 'rw',
    isa     => 'Set::Object',
    default => sub { weak_set() },
);
has '$.schema' => (
    is  => 'ro',
    isa => 'DBIx::Moose::Schema',
);
That weak_set is a little bit of magic I cooked up for nothingmuch recently. All we're doing is keeping references to the objects we've already loaded from the database, primarily for transactional consistency. Actually, Tangram keeps a hash there, mapping each oid to a weak reference to the object with that oid, but I think that oids suck. In Perl memory, the refaddr can be the oid.
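For the curious, the weak_set behaviour can be seen in isolation with just Set::Object: members are held by weak reference, so an object drops out of the set as soon as the rest of the program lets go of it.

```perl
use Set::Object qw(weak_set);

package My::Obj;                # stand-in for a loaded domain object
sub new { bless {}, shift }

package main;

my $loaded = weak_set();

my $obj = My::Obj->new;
$loaded->insert($obj);
print $loaded->size, "\n";      # 1, while something still holds $obj

undef $obj;                     # last strong reference gone...
print $loaded->size, "\n";      # ...so the weak set no longer holds it
```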
And we'd need an overloaded query interface;
package DBIx::Moose::Remote;
use Moose;

has '$._storage' => (
    is   => 'ro',
    weak => 1,
    isa  => 'DBIx::Moose::Storage',
);
has '$._class' => (
    is  => 'ro',
    isa => 'Moose::Meta::Class',
);
has '$._resultset' => (
    is      => 'ro',
    isa     => 'DBIx::Class::ResultSet',
    default => \&_rs_default,
);

sub _rs_default {
    my $self = shift;
    $self->_storage->resultset($self->_class);
}
So, hopefully, the DBIx::Class::ResultSet API will be rich enough to be able to deal with all the things I did with Tangram, or at least it will given enough TH^HLC.
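As a sketch of how that might look from the caller's side: the remote() helper and the price column are invented for illustration, but the search()/all() calls are the real DBIx::Class::ResultSet API, taking a SQL::Abstract-style condition.

```perl
# Hypothetical usage; $storage->remote('Book') is assumed to hand
# back a DBIx::Moose::Remote for the Book class.
my $r_book   = $storage->remote('Book');
my @bargains = $r_book->_resultset
                      ->search({ price => { '<' => 10 } })
                      ->all;
```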
There will be a bit of double-handling of objects involved. Basically, the objects that we get back from DBIx::Class will be freed very soon after loading, their values passed to a schema-specified constructor (probably just Class->new), and then any of their slots that contain not-yet-loaded collections set up to lazily load the referent collections on access. This happens already in Tangram; there, the intermediate rows are the arrayrefs returned by DBI's fetchrow_arrayref(). So there will be lots of classes, perhaps under DBIx::Moose::DB::, that mirror the objects in the schema. Perhaps we don't need that, but it should be a good enough starting point, and if it can be eliminated entirely later on, then all the better. (Update: Matt has kindly pointed me to the part of the API that deals with this; this shouldn't be a problem at all.)
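Sketched as code, that double-handling step might look something like this. The _inflate method and surrounding accessors are hypothetical, though get_columns is the real DBIx::Class::Row method for getting plain column => value pairs.

```perl
# Hypothetical inflation step: copy column data out of a short-lived
# DBIx::Class row object into a freshly constructed domain object.
sub _inflate {
    my ($self, $row) = @_;          # $row isa DBIx::Class::Row

    my %values = $row->get_columns; # plain column => value pairs
    my $class  = $self->_class->name;
    my $obj    = $class->new(%values);

    # remember it in the weak set, for identity-map behaviour
    $self->_storage->objects->insert($obj);

    return $obj;                    # $row can now be freed
}
```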
Mapping the Index from the Class
One of the nice things about a database index is that it's basically a performance 'hack' only (because databases are too dumb to know what to index themselves), and does not actually affect the behaviour of the database. So, for the most part, we can ignore mapping indices and claim we are doing the 'correct' thing ;).
That is, unless the index happens to be a unique index or a primary key. What those add is a uniqueness constraint, which does affect the way that the object behaves. So, what of that?
Interestingly, Perl 6 has the concept of a special .id property. If two object references have the same .id property, then they are considered to be the same object. This has some interesting implications.
After all, isn't this;
class Book;
has Str $.isbn where { .chars < 255 };
method id { $.isbn }
The same thing as this?
CREATE TABLE Book (
    isbn VARCHAR(255) NOT NULL,
    PRIMARY KEY (isbn)
);
So, we can perhaps map this in Perl 6 code, at least map one uniqueness constraint per type. Generalising this to multiple uniqueness constraints is probably something best left to our Great Benevolent Navel-Gazers. In the short term, we'll need to come up with some other kind of way of specifying this per-class; probably a Moose::Util::UniquenessConstraint or somesuch.
Mapping Inheritance
Alright, so we still have inheritance to deal with. But wait! We've got a bigger, brighter picture with Moose. We've now got roles.
Fortunately, this is OK. The Tangram type column was only ever used (conceptually, anyway) to derive a bitmap of associated tables (ie, tables sharing a primary key) that we expect to find rows in for a particular tuple. So, if we map a role's properties to columns, we only have to "duplicate" those columns for a particular role when it is composed into classes that don't share a common primary key.
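As a tiny illustration of when that duplication kicks in (the role and class names here are invented for the example):

```perl
package Titled;        # a role whose attributes become columns
use Moose::Role;
has 'title' => (is => 'rw', isa => 'Str');

package Book;
use Moose;
with 'Titled';         # Book's table gets a "title" column

package Film;
use Moose;
with 'Titled';         # Film's table needs its *own* "title" column,
                       # because Book and Film don't share a primary key
```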
The other features
Well, there may be other important features that I'll remember when the time comes, but for now I think there's enough ideas here to form a core roadmap, or at least provide a starting point for discussion.