An imaginary conversation synthesized from past discussions and the responses I wish I had made.
Don't let external customers read directly from your database. Just don't. The usual justification is the need to support ad-hoc queries. Get a few samples and try to figure out a general mechanism to support their actual business needs. If you let them read from your database, they will become dependent on this and beg you to hold off database changes or complain if you don't. As your project grows larger, the pain grows more severe. They will have the best of intentions, but good intentions mean nothing when you need to coordinate your internals with people who should know better than to violate encapsulation.
As a side note, ad-hoc queries, even if not causing performance issues, could potentially be dangerous if the people making them aren't really thinking them through. The problem is two-fold. One, they might not be really paying attention to their core business needs (this is subtle and hard to explain, but common). The other problem is that they might very well be making a query that your API already supports, but because they don't rely as much on your API, they don't know it.
Re:Give them their own copy
Ovid on 2009-06-30T12:30:05
Thought about that, but they need live data. Our data changes rapidly and being even one day out of date is like playing the stock market by reading a day old newspaper (well, ok, not quite that severe :). It would be good to have a series of read-only slave servers, but that still puts us in the position of them insisting that we can't make that important database change just yet. We've had that happen enough times that we have nasty hacks in our code and database to work around these issues.
Re:Give them their own copy
Ed Avis on 2009-06-30T16:09:33
How about an interface that lets them submit arbitrary SQL queries, but checks them against a whitelist first? So for example your customer might say 'we need to SELECT COUNT(*) FROM FOO' and you would say 'that seems fine, I will add it to the list'. The next day they ask for 'SELECT FRED FROM BAR' and you decide no, the FRED column is an implementation detail I don't want to support forever, so I will not allow them to make that query. That way you have control over what's happening.
If they want a particular query, it's then your call whether to permit it, do the work to add it to your RESTful interface instead, or pick some compromise like making a view for them to use. Or, indeed, deny the request. This gives you more options than allowing or disallowing SQL queries as a whole.
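A rough sketch of what such a gateway might look like, in Python against an in-memory SQLite database purely for illustration. The `QueryGateway` class, the `normalize` helper, and the FOO/BAR tables are all made up for this example. One design choice worth noting: the gateway runs the approved text rather than the submitted text, so loose matching on case and whitespace cannot let a slightly different query through.

```python
import sqlite3


def normalize(sql: str) -> str:
    """Collapse whitespace, drop a trailing semicolon, and ignore case."""
    return " ".join(sql.strip().rstrip(";").split()).casefold()


class QueryGateway:
    """Hypothetical gateway: only queries someone has approved get run."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        # Maps the normalized form of a query to the exact approved text.
        self.approved: dict[str, str] = {}

    def allow(self, sql: str) -> None:
        self.approved[normalize(sql)] = sql

    def run(self, sql: str) -> list:
        key = normalize(sql)
        if key not in self.approved:
            raise PermissionError(f"query not on the whitelist: {sql!r}")
        # Execute the approved text, not whatever the customer submitted.
        return self.conn.execute(self.approved[key]).fetchall()


# Toy schema standing in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER)")
conn.execute("CREATE TABLE bar (fred TEXT)")
conn.executemany("INSERT INTO foo VALUES (?)", [(1,), (2,), (3,)])

gateway = QueryGateway(conn)
gateway.allow("SELECT COUNT(*) FROM FOO")

print(gateway.run("select count(*) from foo;"))  # [(3,)]

try:
    gateway.run("SELECT FRED FROM BAR")
except PermissionError as err:
    print("refused:", err)
```

Exact matching like this only works for fixed queries. Anything with customer-supplied values would need parameterized templates, which is more work and starts to look a lot like an API.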
If you want to be especially evil, the SQL gateway can have a mortality rule so that ad-hoc queries are allowed only for one week after they're added, and after that automatically disabled unless re-requested. This could sometimes be better than adding a new documented interface to your API just for a very temporary need.
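The mortality rule could be sketched along these lines, again in Python and again with invented names (`ExpiringWhitelist`, `allow`, `is_allowed`). The current time is passed in explicitly so the behaviour is easy to check; a real gateway would presumably read the clock itself.

```python
from datetime import datetime, timedelta


class ExpiringWhitelist:
    """Hypothetical whitelist whose entries lapse unless re-requested."""

    def __init__(self, lifetime: timedelta = timedelta(weeks=1)) -> None:
        self.lifetime = lifetime
        self.expires_at: dict[str, datetime] = {}

    def allow(self, sql: str, now: datetime) -> None:
        # Adding a query again simply restarts its one-week clock.
        self.expires_at[sql] = now + self.lifetime

    def is_allowed(self, sql: str, now: datetime) -> bool:
        expiry = self.expires_at.get(sql)
        return expiry is not None and now < expiry


whitelist = ExpiringWhitelist()
added = datetime(2009, 6, 30)
whitelist.allow("SELECT COUNT(*) FROM FOO", added)

print(whitelist.is_allowed("SELECT COUNT(*) FROM FOO", added + timedelta(days=6)))  # True
print(whitelist.is_allowed("SELECT COUNT(*) FROM FOO", added + timedelta(days=8)))  # False
```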
Also, with Oracle you can set IO limits on a per-account basis, so queries like the one you mention would simply time out after a while.
Does MySQL provide such features?
... good intentions mean nothing when you need to coordinate your internals with people who should know better than to violate encapsulation.
Perhaps Perl 5 should have a REST API for writing extensions.
Re:Are You Talking about Databases?
audreyt on 2009-06-30T21:47:23
Perhaps something like GHC Plugins? :-)
Re:Counter-point
Ovid on 2009-07-01T07:39:41
I don't know what "three years out of date" means in this context, but if you mean that neither management nor developers have bothered to address customer concerns for three years, then there are far larger problems than direct database access. If you meant something else, then I guess I can't respond to that :) Nor do I know what five licenses have to do with the situation.