Never Let Them Read From Your Database

Ovid on 2009-06-30T11:48:19

An imaginary conversation synthesized from past discussions and the responses I wish I made.

Customer: We need read-only access to your database.
Ovid: No.
Customer: Please?
Ovid: No.
Customer: But I need ad-hoc queries.
Ovid: Your ad-hoc cartesian join returning 12 billion rows was real fun.
Customer: I promise I won't do it again.
Ovid: That's what you said about the ad-hoc cartesian join returning 10 billion rows.
Customer: But this time I mean it.
Ovid: So do I. We guarantee backwards-compatibility in our API, not our database. If we move a field from one table to another, your queries will break.
Customer: Then tell us when you do that.
Ovid: We did that with another team and had to keep delaying releases while they updated their system.
Customer: Then you can provide views to maintain backwards-compatibility.
Ovid: We do that already. "View" as in "Model-View-Controller". It's part of our REST API; you should check it out.
Customer: But your REST API doesn't provide all the information I need.
Ovid: It provides more than the information you need because much of it represents knowledge not stored in the database. If you need more information, let's see what we can do to add this to the API.
Customer: Why are you being so difficult?
Ovid: Because your temporary convenience is not more important than my long-term pain.

Don't let external customers read directly from your database. Just don't. The usual justification is the need to support ad-hoc queries. Get a few samples and try to figure out a general mechanism to support their actual business needs. If you let them read from your database, they will become dependent on this and beg you to hold off database changes or complain if you don't. As your project grows larger, the pain grows more severe. They will have the best of intentions, but good intentions mean nothing when you need to coordinate your internals with people who should know better than to violate encapsulation.

As a side note, ad-hoc queries, even if not causing performance issues, could potentially be dangerous if the people making them aren't really thinking them through. The problem is two-fold. One, they might not be really paying attention to their core business needs (this is subtle and hard to explain, but common). The other problem is that they might very well be making a query that your API already supports, but because they don't rely as much on your API, they don't know it.

Give them their own copy

Ed Avis on 2009-06-30T12:11:26

Why not give them a dump of your whole database and let them load it on their own server? It won't keep up to date, but if ad hoc queries are all they need then that won't matter.

Reinventing the wheel by implementing your own home-grown query language instead of SQL may make sense in some cases, but it's not necessarily the best way.

Re:Give them their own copy

Ovid on 2009-06-30T12:30:05

Thought about that, but they need live data. Our data changes rapidly and being even one day out of date is like playing the stock market by reading a day old newspaper (well, ok, not quite that severe :). It would be good to have a series of read-only slave servers, but that still puts us in the position of them insisting that we can't make that important database change just yet. We've had that happen enough times that we have nasty hacks in our code and database to work around these issues.

Re:Give them their own copy

Ed Avis on 2009-06-30T16:09:33
How about an interface that lets them submit arbitrary SQL queries, but checks them against a whitelist first. So for example your customer might say 'we need to SELECT COUNT(*) FROM FOO' and you would say 'that seems fine, I will add it to the list'. The next day they ask for 'SELECT FRED FROM BAR' and you decide no, the FRED column is an implementation detail I don't want to support forever, so I will not allow them to make that query. That way you have control over what's happening.
If they want a particular query, it's then your call whether to permit it, do the work to add it to your RESTful interface instead, or pick some compromise like making a view for them to use. Or, indeed, deny the request. This gives you more options than allowing or disallowing SQL queries as a whole.
If you want to be especially evil, the SQL gateway can have a mortality rule so that ad-hoc queries are allowed only for one week after they're added, and after that automatically disabled unless re-requested. This could sometimes be better than adding a new documented interface to your API just for a very temporary need.

Oracle?

djberg96 on 2009-06-30T12:41:59

At Qwest we had a snapshot of the production database for people to run reports off of. It had a few limitations (you couldn't snap some of the larger field types), but it worked pretty well.

Also, with Oracle you can set IO limits on a per account basis, so queries like the one you mention would simply timeout after a while.

Does MySQL provide such features?

Are You Talking about Databases?

chromatic on 2009-06-30T15:56:31

... good intentions mean nothing when you need to coordinate your internals with people who should know better than to violate encapsulation.

Perhaps Perl 5 should have a REST API for writing extensions.

Re:Are You Talking about Databases?

audreyt on 2009-06-30T21:47:23
Perhaps something like GHC Plugins? :-)

Counter-point

stinkingpig on 2009-06-30T23:18:28

I work on a project where the API is three years out of date and has gone through five licenses in those three years. Direct database access wins, sorry.

Re:Counter-point

Ovid on 2009-07-01T07:39:41

I don't know what "three years out of date" means in this context, but if you mean that neither management nor developers have bothered to address customer concerns for three years, than there are far larger problems than direct database access. If you meant something else, than I guess I can't respond to that :)
Nor do I know what five licenses has to do with the situation.

Counter-point

stinkingpig on 2009-07-01T13:09:54

You're telling me. Is that different than the norm? I find it to be very normal, as meeting all customer needs in a given release cycle is impractical.

The 5 licenses are important for accessing the API. Under some of them, the customer cannot use it without paying, while under others, they can. If my code used the API instead of the database, I'd have to rewrite during those changes. I didn't even mention the second (also incomplete) API that's struggling for relevance at Layer 8/9.

I suspect you're talking about SaaS though, since you even consider the possibility of a database that customers can't access. That's nice for the leading edge, but it's awfully early for use of "never".