the limits of object-relational mappers

brian_d_foy on 2007-08-20T11:26:00

At Plus Three, we built a large project using Class::DBI. When we started the project, Class::DBI was the ORM that best met our needs. I applied some patches from the mailing list and added a couple of CPAN modules and some custom code in order to get these features:

- LIMIT support for MySQL on all search queries.
- Ability to retrieve all records from one table with a sort ("retrieve_all_sorted").
- Ability to run any search query as a count instead of returning records.
- A safe version of find_and_create. The existing one was not atomic.

Surprisingly, after these enhancements, Class::DBI 0.96 proved up to the task for the entire project.


Or just avoid them...

Matts on 2007-08-20T21:01:03

I just don't find SQL that hard, and if you create a neatly packaged data access layer it doesn't get in the way of building your application too much, plus you get all the benefits of writing your own code, without the frustration of having to figure out how to do that stuff in an ORM.

Re:Or just avoid them...

perrin on 2007-08-20T21:09:24

Theoretically, a good ORM provides a neatly packaged data access layer. It can certainly get in your way though if you're trying to do things that are mostly reports and don't have any use for turning the results into objects.

Re:Or just avoid them...

sigzero on 2007-08-20T22:46:54

I tend to put my SQL into a package and call it from that. Neat and tidy and no SQL intermixed with my code. I am sure there are better ways of doing it and I might like some of the ORM stuff but at this point in time I haven't missed it.

Re:Or just avoid them...

Matts on 2007-08-21T02:06:40

That's exactly what I do. I've built some pretty high performance apps that way (like in the order of 10m hits/day) and it's worked well for me.

Great Points

Ovid on 2007-08-20T21:08:31

When people talk about ORMs, I think they're generally looking at them the wrong way. The more I work with them, the more I think that trying to marry an class and a table is a terrible idea. I recently worked on a system where I could do something like this:

$server->disable;

That mapped to a table representing dedicated server, but that class had several other ORM classes it needed to interact with. That particular method hid tons of complexity behind it, including logging, deallocating IP addresses, interacting with Nagios, updating billing information and plenty of other things. There was no way you could think of it as a class mapping to a table, even though it had one main table for the bulk of its data.

I now think the best way of using an ORM is to not hide the fact that you're dealing with a database and allow the programmers building the classes to merely think of the ORM layer as merely an object interface to the database. The code which uses the classes built on the ORM should never see a database, but so long as we don't think of the ORM as being the model, but rather as a persistence layer underneath the model, we can focus on creating a clean interface for people and merely have the model delegate behavior to the ORM as needed. The controller and view would generally not need to interact with the ORM at all (except where there's a direct delegation of model behavior to ORM behavior).

This gives the developers plenty of freedom. Plus, when you find it easier to drop down to SQL, you still have a clean model hiding that. In this view, model classes don't inherit from the ORM, but merely delegate responsibilities as appropriate.

I think this fits very well with what you're saying. If an ORM tries to do too much, it gets overly complex and difficult to maintain and work with. By accepting the limits of this way of working, we can let the ORM do what it does best but keep the hard customization bits in the model.

Re:Great Points

Alias on 2007-08-20T23:19:16

I disagree, kind of.

I agree that a direct marrying of a class to a table can be a bad idea.

I disagree that the abstraction cannot be a positive thing.

My non-CPAN ORM ignores the class to table link and models the data at an entity-relationship level.

From there, it derives BOTH the classes and the tables.

Some of those classes don't have a table ("Transient Entities", imagine doing FormMail with all the structure and support of an ORM, but not ever needing to actually store the data) or don't write (I have "View Entities", which are just a view).

Many-To-Many tables don't have classes at all, they are used directly by the many-cardinality relations of the classes on each side.

And so on and so forth.

And surprisingly, the model holds up to scrutiny and stress amazingly well.

You get all the normal Class::DBI magic, but you also can drop back to a more basic usage and generate raw SQL. Or hell, just add a View Entity and build a view into the database if you need something really curly.

The problem is not so much that the abstraction is failing, it's that the abstraction is being done too naively. They aren't abstracting the right thing.

Re:Great Points

Aristotle on 2007-08-21T09:01:15

That sounds very much like how I think of DBIx::Class. Have you looked at it? I don’t think of DBIC as an ORM so much as I think of it as an OO API for SQL. Its basic unit is the result set – in other words, an OO wrapper around an SQL query. The main win for me is that I can accumulate conditions into queries (including composing result sets into one another) and let DBIC build the resulting SQL for me, instead of having to manually write template-ish SQL-generating logic for highly parametrisable queries.

I don’t use it for everything though. DBIx::Simple + SQL::Interp make a very fine pair for pain-reduced manual SQL writing.

Re:Great Points

Ovid on 2007-08-21T09:28:29

I've looked at DBIx::Class and it looks very nice, but I chose Rose::DB::Object. The project in question was an attempt to apply some discipline to a system where the code was rather typical of "legacy" systems. It was basically a bunch of CGIs and some of it already was a tad slow. One of our developers already had some performance issues with DBIx::Class on another project and I read some experiences others had with its performance. Since Rose is known for how incredibly fast it is (and I love how it separates the management of instances from the management of groups of instances), I felt that was a good way to go.

In retrospect, Rose may not have been the best choice since transactions are not supported "out of the box" because database handles are not shared. John Siracusa explains his reasons for this, but it has been enough of a stumbling block that the team using my code is pulling transaction support until they have more of the system "under control".

Re:Great Points

perrin on 2007-08-21T15:52:23

The main win for me is that I can accumulate conditions into queries (including composing result sets into one another) and let DBIC build the resulting SQL for me, instead of having to manually write template-ish SQL-generating logic for highly parametrisable queries.

Any chance you could show an example? This is frequently cited as an interesting feature of DBIC, but there's not much in the docs about it. How is accumulating conditions and letting it build the SQL different from what the other SQL builders like SQL::Abstract or the one in Rose::DB::Object allow?

I agree that SQL::Interpolate is a pretty interesting alternative when an ORM seems like a bad fit.

Re:Great Points

Aristotle on 2007-08-26T21:52:32

It’s sort of a higher-order SQL.

I can pass a ResultSet to someone else, and they can add constraints, joins, a group clause or such, as they like. This makes it much easier to decouple (and sometimes reuse) code. F.ex. I can build a ResultSet piecemeal along a chain of controllers in URI dispatch. I can even change some aspect of the query from within a template, eg. add a LIMIT clause to a passed-in ResultSet, without automatically having to tie the template to that particular query.

Such a thing is possible with SQL::Abstract by passing around the nested data structure for the query before running it, letting everyone inject bits and bobs in the right places. However, it’s just a passive data structure. The code manipulating the data structure will either only be able to manipulate a particular structure (so it’s tightly coupled to upstream) or be need to interpret it like SQL::Abstract does (leaking complexity and duplicating code). I much prefer writing template-ish code to build SQL strings piecemeal. (In a sense, that is what SQL::Interp is all about.)

I don’t know about RDBO, but to my knowledge, it doesn’t have the ResultSet as an explicit concept the way DBIC does. And I know that with Class::DBI, which also doesn’t have the concept of a ResultSet, this is kind of piecemeal query composition is not possible at all.

Re:Great Points

perrin on 2007-08-26T22:44:05

Thanks, that's the best explanation I've heard of it.

I've mostly used the data structure approach. RDBO includes a query builder which is much more powerful than SQL::Abstract and is able to represent joins and complex conditions reasonably well. I've done work with that where the conditions are built up similar to what you describe and then passed to the query builder to be turned into SQL.

here, here!

mpeters on 2007-08-20T22:25:40

Class::DBI does get on my nerves at times, but the thing I absolutely love about it is how easy it is to drop into SQL and back out again. Any SELECT query can return C::DBI based objects. And most of the other annoyances you took care of for us with your patches :)

What ORMs are good at

merlyn on 2007-08-21T00:37:32

What I really want an ORM for is to take care of the mind-numbing SQL. The stuff you write repeatedly to "get a row with this primary key so I can look at those four columns". That's going to be 80% of your SQL coding in a moderate-sized app.

But I agree with you... the other 20% can be split into "stuff that can be represented with convoluted Perl" vs "stuff that is better represented as real SQL". The line between those will vary from app to app, but the important thing is that that last category be relatively integrated with the ORM, so you don't go entirely into a different mode.

I'm in the opposite boat

btilly on 2007-08-21T22:52:48

My job is reporting. So while most Perl programmers seem to be hurrying as fast as they can to keep from having to write SQL, I'm going the other way - I don't want the necessary bits of Perl to obscure the big chunks of SQL.

When it makes sense, I'm happy to recommend an ORM. Most transactional applications fall in that boat. But when I see convoluted APIs to try to build complex SQL in Perl, I always am left thinking, "Why should I learn an API that is as complicated as SQL to get less capability than raw SQL already gives me?"

Furthermore if you're working with databases, I really think that you need to learn SQL. Learn what a database is, and how it works. That is knowledge that will repay you time and time again.

Incidentally at a previous job we found that an excellent interview question was to outline a simple real-world problem, and ask the candidates to design a database schema to handle it. An example would be to design a database for football teams to develop reports to help our boss play fantasy football. An example of a report would be which players scored the most touchdowns in 2005.

We saw this as a litmus test for design skills. If you're building a CRUD application (we were), laying out your database is a large part of your application design. Whether or not you're accessing it through raw SQL (we weren't), there is a direct relationship between how the database is laid out and what is easy to do in the application. Being able to see and appreciate those relationships is an important skill.

Plus virtually every solution had issues. Which gave us an opportunity to see how a person could evolve their design when problems were pointed out in it. Seeing that thinking process was invaluable.

But if you've spent your entire programming career running away from learning what a database is and how to use it, how would you do on a problem like that? (And how much worse do you do on the real-world programming problems that this problem is meant to emulate?)