the limits of object-relational mappers

perrin on 2007-08-20T20:03:20

At Plus Three, we built a large project using Class::DBI. When we started the project, Class::DBI was the ORM that best met our needs. I applied some patches from the mailing list and added a couple of CPAN modules and some custom code in order to get these features:

- LIMIT support for MySQL on all search queries. - Ability to retrieve all records from one table with a sort ("retrieve_all_sorted"). - Ability to run any search query as a count instead of returning records. - A safe version of find_and_create. The existing one was not atomic.

Surprisingly, after these enhancements, Class::DBI 0.96 proved up to the task for the entire project.

Since then Rose::DB::Object and DBIx::Class have matured, and some other interesting things have come along. These have much more complete querying abilities than Class::DBI. They generate multi-table joins with no trouble. They can fetch related objects in one query in order to avoid multiple trips to the db.

The obvious question is, how much different would the code be if it had been written with one of these instead. And the answer? It would be reduced, but not as much as you might imagine.

See, the great thing about Class::DBI is that it's very easy to add custom SQL to your classes. You just set up a SQL statement that returns the fields you want from the current class and give it a name and you have a custom search. You can even generate SQL programmatically at run-time and use it to select objects. It is limited by the fact that it can only result in a list of objects from one class, but in practice that is rarely an issue.

We used this feature extensively. We had a complex, normalized db, and many carefully tuned SQL queries. Looking at the SQL we wrote, I would guess that about half of it is relatively simple JOIN and LEFT JOIN queries that would be eliminated (or automated) by a more capable ORM. The rest though is beyond the capabilities of existing ORMs.

What's in it? Subqueries, both as derived tables (in the FROM clause) and as NOT EXISTS queries. Transaction control, with SELECT...FOR UPDATE. Database-specific extensions like INTERVAL and GROUP_CONCAT. Import/export commands like LOAD DATA INFILE. UPDATE statements that use joins. Temporary table creation. Not all applications require this level of sophistication with SQL, but I suspect all of the ones with large amounts of data and moderately complex schemas do.

In a few cases, an ORM could get the same results by writing the query in a simpler way. However, performance would suffer. You can get on your soapbox and cite some C.J. Date stuff you read about the relational model and how the phrasing of the query shouldn't matter, but in the real world it matters a great deal. And this is as true with Oracle as it with MySQL.

Could ORMs learn this? They could probably learn some of it. They could expand their coverage of SQL to include many of these operations. They could embed some common wisdom about how to optimize certain types of queries for one database or another, although this would still not always be correct.

In the end though, what's the point of an ORM that has all the complexity of SQL? It doesn't gain you anything unless it makes things simpler, which means it has to ignore a large amount of the capabilities of SQL. An ORM is mostly about making the easy work of simple fetches and saves as automated as possible, not about creating an impenetrable shield between your programmers and SQL.

What all this means to me in practical terms is that an easy way to use custom SQL is the most important feature to look for in any ORM. With a simplistic one like Class::DBI you have to go to custom SQL too soon, but even with a more powerful one you will eventually have to go there.