Database Abstractions

ziggy on 2004-08-06T15:14:16

Tony, Schwern and the whole Class::DBI crew are really on to something big. I'm seeing object-oriented database abstractions popping up everywhere these days. The most recent one I've seen is for PHP Data Objects (same basic idea, even if it's not as engineered as well as CDBI). I'm currently writing one at work in XOTcl, and I wrote a really trivial database abstraction in Perl for a client that used a regex, a stub, a hash of method_name => sql pairs, and an eval -- about as lightweight as you can possibly get. And I've also seen links for a Common Lisp database abstraction pop up periodically.

It's an interesting form of convergence. I've been seeing object/relational database abstractions for a very long time. They always seemed too heavyweight and overly complex, especially the ones using C++ as the target language. But now there's something different going on. It seems like there's a growing acceptance that this kind of abstraction is a good thing, that these abstractions still work when they're lightweight, and the crucial part is creating a declarative description of the database layout to get most of the benefit.

Now fun begins: once everyone has their pet database abstraction in their favorite language, what core features will emerge across implementations? When we start to get n+1st generation abstractions, what are the best features that will remain from Class::DBI and its bretheren, and what features will be dropped on the floor?

Database Abstractions

Ovid on 2004-08-06T15:38:57

The benefits of database abstractions such as Class::DBI are obvious, but the downsides of these technologies can hamper scalability. On one hand, with the CDBI approach, we have a clear mapping of classes to tables, but the databases for which these have been designed are relational when classes tend to be heirarchical. This not only forces developers to accept awkward design constraints (subclassing a table can be an act of supreme masochism), but with the wrong design, it can also limit scalability.

Consider a hypothetical case when the person writing the code needs to get a list of all customers who shop in a particular region. First the programmer gets a list of all stores in the region, then perhaps a list of salesmen in the region and ultimately a list of customers for each salesperson (naturally, an adhoc example.) This simple idea can involve three queries to the database (or more, if things are really fouled up) even though conceptually it's just one query. Abstractions like Tangram can reduce this to a single query, but at the expense of creating a scheme that fewer people understand (and I understand that modifying things after the fact can be a pain.)

But wait! Programmers point out to me that it's easy to create a custom SQL snippet in the Region class and fetch the salespeople, but then we're starting to slip away from the abstraction goal we were shooting for. As systems get larger, we start needing more and more ad hoc queries. Writing custom SQL for each one becomes problematic. People tell me I'm being silly because it's CDBI and friends are just lightweight tools and we programmers have to fill in the blanks, but I think that's the wrong way to look at it. Persistent data is crippled and CDBI and friends are such wonderful crutches that I think many people forget about walking. The abstraction starts out great and it's certainly better than what we've been doing, but we still need to figure out a better way of approaching the situation. I've never played with OO features in databases, but perhaps those can help (and multi-value databases might help, but they're not well known and it's just a hunch on my part, anyway.) I actually have some code ideas that I've wanted to explore and that might help but I never seem to find the tuits :(

Re:Database Abstractions

rafael on 2004-08-06T15:41:56
I'm sure object-oriented databases are (at least in some design cases) worth looking at. Too bad there is no (to my knowledge) good OSS implementation of an OODMBS.

Re:Database Abstractions

Ovid on 2004-08-06T16:32:28

I thought that Postgres can double as an object database, complete with table leve inheritence. I've never played with those features, though.

Re:Database Abstractions

acme on 2004-08-06T20:30:03
Nah, it's relational but with some features. The inheritance doesn't really help that much.

Re:Database Abstractions

Ovid on 2004-08-06T15:42:00

Dang. I seemed a bit crabby in that, didn't I? I think I made the situation sound worse than it really is.

Re:Database Abstractions

ziggy on 2004-08-06T16:22:46
I don't think that was overly crabby. It just highlights that there's a solution to this problem that is simple, obvious and wrong.
I haven't used Class::DBI in very many projects, but it does make the process of writing an abstraction for a simple database downright painless. I've also written a couple of thin database frameworks, so I've got an idea of what the problem domain is about.
The situation you described feels like a heavyweight OODB mapping. I don't think that's an accurate description of Class::DBI or the new wave of abstractions that are coming out. Sure, an abstraction that requires multiple queries to do the region/store/salesperson/customer walk though the database is a bad idea. Using raw SQL, this would be a single query with joins. Conceptually, it is a database method that returns customers with a single free variable -- region.
Perhaps it's difficult to write an abstraction that can take a database model and generate that method automagically. Perhaps it's difficult and inefficient to write code that walks the tree from regions down to customers. I agree that's the wrong way to do things. The right way (IMNSHO), is to add a method on the customer model that invokes that single SQL query with a single parameter -- region. Sure, you're introducing SQL code into a (Perl) program, but that's not such a bad thing. The point isn't to eliminate raw SQL statements, but to abstract them away from higher level constructs, and allow them to vary as the database model varies, without unnecessary impact to the rest of the system.
Sure, you lose the ability to treat the database as a transparent persistant object store, but I'm starting to believe that was always a bad idea. Which might be why there's no one good solution to this problem.

Re:Database Abstractions

Ovid on 2004-08-06T17:29:21

Well, one thing that might help is to implement a "one table" interface. Imagine that your boss wants a report of the average sales per salesperson. If everything was in one big table, you could write:

SELECT salesman_name, avg(revenue) FROM the_table GROUP BY salesman_name ORDER BY salesman_name

Of course, such a database would be impossible to manage, but if you treat it like that, you can drop the FROM clause:

SELECT salesman_name, avg(revenue) GROUP BY salesman_name ORDER BY salesman_name

Which can be translated into Perl as:

my $report = Report->new( [qw/salesman_name avg(revenue)/], # desired fields [qw/salesman_name/], # group by [qw/salesman_name/] # order by );

And, in reality, that's very close to what we do at work (we also pass in a request object that checks permissions and alters the SQL appropriately.) Something like this could potentially be made to work with CDBI because each class, when properly designed, lists how it links to other classes and the joins would be built off of that. If we grab the data from the classes, we could build a system like that. This is not pie-in-the-sky because we've done it where I work, but the code is proprietary. I've been given permission to reimplement it, though, and I think CDBI would be perfect for this. I think it could reduce the amount of arbitrary SQL that gets written. I just need to get off my tail and write this :)

Re:Database Abstractions

perrin on 2004-08-06T20:18:07
AFAIK, Tangram can't help you with this, nor can any other abstraction layer. Things that try, like Hibernate, just end up making you write your query in their query language rather than SQL. I prefer to use SQL for that.
I think people who try to use Class::DBI and friends to isolate them from SQL are chasing the wrong goal. I don't need isolation from SQL -- SQL is easy and works great! All I want is for the busywork code that does all the obvious work to be automated for me, which is what Class::DBI is good at.
In your hypothetical situation, writing the custom query is a big win, because Class::DBI will do all the DBI stuff for you, move the data in and out objects for you, allow you to wrap up the whole thing in a single method like Region->customers().

Re:Database Abstractions

tmtm on 2004-08-07T21:14:59
> As systems get larger, we start needing more and
> more ad hoc queries. Writing custom SQL for each
> one becomes problematic

Why is this problematic? If you have to write non-trivial queries to get information out of a database, how else would you expect to be able to do that?

This isn't meant to be a snippy response - I'm truly interested in what other sorts of ways this sort of thing could be done.

Tony

Popping up everywhere

gnat on 2004-08-06T16:43:35

This hasn't gone unnoticed elsewhere. I had dinner with a Java guy at OSCON and he said "I popped my head in on a few Perl talks, but they're very boring. They're discovering all the stuff that Java had years ago, and everyone's reimplementing it in a different way."

--Nat

Re:Popping up everywhere

ziggy on 2004-08-06T17:49:53
That knife cuts both ways. Java 1.5 is getting generics, which have been around since perl1. And don't get me started on inner classes. ;-)
Re:Popping up everywhere

Ovid on 2004-08-06T18:53:32

Ask 'im about regexes. And tell 'em that we don't need no stinkin' autoboxing because we don't do that silly "primitives in an OO world" mess :)

Re:Popping up everywhere

perrin on 2004-08-06T20:02:32
What he doesn't realize is that Perl has had them just as long. Tangram has been around for several years. What he is picking up on though is that Perl developers have traditionally been very skeptical about frameworks and abstractions like this, while Java developers embrace them. Java has paid for it too, with the resounding failure known as EJB. It was so bad that the upcoming EJB 3 spec basically scraps the old stuff in favor of a model copied from the (fairly recent) open source tool Hibernate. Hibernate is closer to Class::DBI than to EJB, so maybe we are converging on a middle ground where abstraction meets usability.

Re:Popping up everywhere

ziggy on 2004-08-06T22:50:20
That's actually quite interesting. Nat and I had a discussion about these issues about 4 years ago, from the opposite side of the fence.
Given that Perl hackers avoid huge bloated object frameworks, either something is wrong with us, or something is wrong with those frameworks.
If Class::DBI and Hibernate are converging on a similar set of goals, and if EJB / AWT / Swing are being rejected by Java programmers en masse, then I'd say we really were onto something all those years ago.

Re:Popping up everywhere

tmtm on 2004-08-07T21:19:41
> If Class::DBI and Hibernate are converging on a
> similar set of goals...

I've never encountered Hibernate before. I just had a look. It scared me.

I can live with having SQL mixed with Perl. XML is just a step too far :)

I have made a few notes on interesting ideas that might be worth stealing for Class::DBI though.

Tony

Re:Popping up everywhere

chromatic on 2004-08-07T04:47:49

I like to think of JCP as a herd of elephant real estate speculators staking out claims in the ever-expanding frontier of The Core Libraries. No other image really captures the elegance, precision, simplicity, and unashamed piles of excreta in their wake.

For fun, go to the JavaOne exhibit hall, yell out "I have a new API for controlling coconut-throwing monkeys", step back, and enjoy the stampede. If the 18-month standardization process doesn't produce a library that includes some sort of rocketry, I'll be disappointed.