One of the coolest things about "Object Persistence" is that it has the word "Object" in it, which of course means it is better than anything that was around before "Object Oriented Programming" was decided to be the flavour of the decade. Even better, it has the word "Persistence" in it, which sounds much more sophisticated and modern than "Database" or "Relational".
There are shiny tools in this space, like Hibernate for Java. Using Hibernate, you can make an Enterprisey Macchiato Java Servlet that blends your database into a set of Java objects, and then provides Soaped-up Beany Hyper-galactic XML Web services for other parts of your Enterprisey code base to access. It's fantastic - you end up with a set of tables (all with surrogate IDs, of course) that you are guaranteed not to be able to write to safely from anything except the Java server. This puts the Java developer in control, which is the way (s)he likes it. Maybe Hibernate doesn't have to work like this, but (s)he prefers it this way, because it means that all changes to the objects have to go through the same central set of functions. Otherwise, the development investment is wasted. And we can't have that, not at the price it cost.
Anyway, Tangram is not quite so antisocial as that. It at least
behaves transactionally, given appropriate use of $storage->unload_all
(also ->recycle) and a distribution of clue. But it is
currently antisocial in other ways, such as the surrogate ID
requirement.
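To make that concrete, here is a minimal sketch of the discipline involved, assuming a connected $storage and a $cd object fetched from it earlier (tx_start, tx_commit, update and unload_all are all Tangram::Storage methods):

  # do one unit of work inside an explicit transaction
  $storage->tx_start;
  $cd->{name} = "Special Edition";
  $storage->update($cd);
  $storage->tx_commit;

  # then drop Tangram's in-memory cache of loaded objects, so the next
  # read goes back to the database rather than returning stale copies
  $storage->unload_all;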
Wait a minute - the database has .oids, too
Postgres and Oracle both have a concept of .rowid; all tables except 'index organised tables' have them by nature. I have observed that in the vast majority of code that uses the Tangram API, I never need to use or refer to this .id; in fact, when storing an object in multiple stores, its .id will vary across those stores. In light of this, while I consider surrogate IDs a design flaw, it's not a tragic one: it's consistent with what the database does anyway, and it has allowed interesting patterns to be built in the meantime, while better ideas come forth. For a more detailed analysis of what I think is wrong with Tangram, see the bugs POD file in the Tangram distribution, especially the section on surrogate type columns (actually, I've just tidied those up; if you're reading this before I make a release, then read the fresh one here).
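To illustrate that last point, a small sketch using Tangram::Storage's insert() and id() methods (the storage handles and object here are illustrative):

  # the same Perl object, stored in two databases at once; each store
  # hands out its own surrogate id for it
  $storage_a->insert($obj);
  $storage_b->insert($obj);

  print $storage_a->id($obj), "\n";   # some oid, say 13
  print $storage_b->id($obj), "\n";   # quite possibly a different one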
What defines "Object Persistence"?
Again, hazarding a set of common features of object persistence tools that could plausibly form part of a definition:
What's so cool about Tangram
The key design feature of Tangram is what is frequently referred to as orthogonality: it is normally non-intrusive on the objects being stored. A given object may even exist in multiple stores simultaneously (while being represented by the same Perl object). The result? Classes do not need to be aware of their storage, any more than a tuple needs to know that it is being stored in a table space.
This is implemented with lazy loading. The in-memory data structure is considered equivalent to the database form; via types such as Tangram::Type::Set::FromOne, it is possible to follow joins between tables just by walking Perl objects with visitor iterators like Data::Dumper.
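For instance - a hedged sketch, assuming a CD-store schema along the lines of the one sketched later in this post - you can follow relations without writing a single join:

  my ($r_cd) = $storage->remote("CD");
  my ($cd)   = $storage->select($r_cd, $r_cd->{name} eq "Into the Dojo");

  # 'artist' is mapped as a reference type, so this dereference quietly
  # issues the SELECT for the Artist row the first time it is touched
  print $cd->{artist}{name}, "\n";

  # a Tangram::Type::Set::FromOne field behaves as a Set::Object whose
  # members are fetched on demand; anything that walks the structure
  # (Data::Dumper included) will pull in the related rows as it goes
  for my $track ($cd->{tracks}->members) {
      print $track->{name}, "\n";
  }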
Tangram Querying
For the cases where you have specific questions to ask of your data model, and you are not just following adjacent relations between objects, lazy loading is not enough; we still need some form of query syntax.
For this, Tangram uses Tangram::Expr objects that represent database objects - and these use overload, so that you can write your query expressions using standard Perl operators (as far as overload allows). Depending on your inclination, you will either "run screaming" from this syntax or love it.
In my experience, Tangram's query syntax makes some previously hard queries easy, and some "impossibly difficult" queries possible. You can build intricate joins with a consistent notation; for example, process a form, build up a list of Tangram::Expr fragments, and then combine them into a single filter that can be used for multiple queries.
  use List::Util qw(reduce);

  # get the table aliases
  my ($r_artist, $r_cd, $r_track) =
      $storage->remote(qw(Artist CD Track));

  # build a set of filter expressions - some of these
  # represent joins
  my @filters = (
      ( $r_artist->{name} eq "The Black Seeds" ),
      ( $r_cd->{artist} == $r_artist ),
      ( $r_cd->{tracks}->includes($r_track) ),
      ( $r_track->{name} eq "Heavy Mono E" ),
  );

  # AND them all together
  my $filter = reduce { $a & $b } @filters;

  # then use them for queries
  my (@cds)    = $storage->select( $r_cd,    $filter );
  my (@tracks) = $storage->select( $r_track, $filter );
The query there is already getting reasonably impressive; the first ->select() maps to:
  SELECT t1.id, t1.type, t1.artist_id, t1.name
  FROM   CD t1, Artist t2, Track t3
  WHERE  t1.artist_id = t2.id
    AND  t2.name = "The Black Seeds"
    AND  t3.cd = t1.id
    AND  t3.name = "Heavy Mono E"
This is a simple example, but I have found that there are very few real queries on a well-designed schema that do not map well to this syntax. That being said, sub-selects require an undocumented syntax, and while I have some sympathy for the notion that you should be able to write sub-selects as joins most of the time, it is certainly a sign that the API has not been extended in all directions yet.
Tangram Maps Inheritance
There are those who would say inheritance is about as relational a concept as an .mdb file, but I think there is adequate justification for its use in data modelling.
A good question to ask when validating that a relational schema is in normal form is "what does this relation mean?", or "what fact is being represented by this tuple?". We can ask this question of all tables - and the basic answer is "there exists an object with these values". The fact is that the object exists. Better answers can be made for individual tables; consider that answer a template - ask a meta-question, get a meta-answer.
This is where the argument for inheritance stems from. The relations still describe the existence of an object, but certain types of objects will have extra items in their tuple - relations to the extra properties bestowed upon them by their sub-classes.
In the CD store schema, for instance, 'Artist', 'Person' (perhaps better called 'Musician') and 'Band' are related like this. The justification is that an artist can be either a musician or a band; but if we are relating to something in its capacity as an artist (i.e., from the CD table, to say who released it), there also exists, by association, a relationship between the CD and all of the properties of the artist in its capacity as a musician or a band.
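Here is a hedged sketch of how that hierarchy might be declared in a Tangram schema (field names are illustrative; see Tangram::Schema for the exact field-type options):

  my $schema = Tangram::Schema->new({
      classes => {
          Artist => {
              abstract => 1,
              fields   => { string => [ qw(name) ] },
          },
          Musician => {
              bases  => [ "Artist" ],
              fields => { string => [ qw(instrument) ] },
          },
          Band => {
              bases  => [ "Artist" ],
              # a set of the band's members
              fields => { iset => { members => "Musician" } },
          },
          CD => {
              fields => {
                  string => [ qw(name) ],
                  # a CD relates to its releaser as an Artist, be that
                  # a Musician or a Band
                  ref    => [ qw(artist) ],
              },
          },
      },
  });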
Tangram short-cuts the query overhead of this situation using a 'type' column. The type column is an index into the schema, and is used to derive a bitmap of which extra tables associated with a base class are expected to have rows for this primary key. This is a de-normalization of the data, so technically a hack - as noted in Tangram::Sucks, it should be possible to detect the type using the presence or absence of tuples in the corresponding tables (or, somewhat equivalently, NULLs when using "Horizontal" mapping - see Tangram::Relational::Mappings for a description of these terms). I'm told that David Wheeler's Object::Relation can work like this.
But what about the Schema?
Having a schema structure that is free from side effects can be quite useful. Tangram has this down well; its input is a plain set of Perl hashes and arrays, with no side effects. If you want to use the pure objects to generate code, you can pass them to Class::Tangram::Generator. If you want to connect to storage, pass them to Tangram::Storage. T2 was my attempt at making a set of objects that can both describe this Tangram model relationally and itself be stored in a Tangram database; that is useful for building rapid application development / Computer-Aided Software Engineering (RAD / CASE) tools. Consider a modelling tool like Umbrello: it could not compile classes as the objects were manipulated; otherwise you might override internal behaviour and break the running program!
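Roughly - and hedging on the exact constructor forms, so check each module's documentation - the same plain data structure serves both consumers:

  # a schema description with no side effects: just hashes and arrays
  my $schema_data = {
      classes => {
          Artist => { fields => { string => [ qw(name) ] } },
      },
  };

  # generate accessor-bearing Perl classes from it...
  use Class::Tangram::Generator;
  my $gen = Class::Tangram::Generator->new($schema_data);

  # ...or, quite independently, map it to tables
  # ($dsn, $user and $password as you would pass to DBI->connect)
  use Tangram;
  my $schema  = Tangram::Schema->new($schema_data);
  my $storage = Tangram::Storage->connect($schema, $dsn, $user, $password);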
You don't have to write comprehensive schemata any more
Consider the package manager, Fink. Whilst using
Storable for persistence can make applications like Fink
faster, by cutting the parse time needed to load their entire state at
start-up, it is still not as fast as a Berkeley DB, ISAM or
SQLite-style database, which is loaded on demand for small accesses.
The general approach is not to make the whole schema relational in
one go, but instead to cherry-pick the columns that you think are
useful enough to be indexed, and to throw the rest into a single
column containing a Storable, Data::Dumper or
even YAML data field, which is used to construct the rest of
the object. Tangram::Type::Dump::Any
is built for this. I wrote a Tangram schema for Fink that does this,
which is lurking here.
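From memory - so treat this as a hedged fragment rather than gospel - Tangram::Type::Dump::Any registers the 'idbif' schema moniker, and a Fink-flavoured schema using it might look something like:

  my $schema = Tangram::Schema->new({
      classes => {
          "Fink::Package" => {
              fields => {
                  # the columns worth indexing and querying on
                  string => [ qw(package version revision) ],
                  # everything not otherwise mapped gets swept into a
                  # single serialized column, and rebuilt from it on load
                  idbif  => undef,
              },
          },
      },
  });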
You end up with a data source that can be queried on all mapped columns, and almost all code that was written for the old, non-Tangram system still works too - because previously the only option was to follow Perl references, and we've made sure they all get lazy loaded.
Where Object Persistence Wins
Yes, I know this is another absurdly long post in a multipart series. That's mostly because, in this case, I have more to say about it, rather than it being a particular endorsement of the approach. But more on what I will endorse in the next part.