Let's defeat the purpose of transaction processing, shall we?

jdavidb on 2005-12-14T20:21:48

So you load a 10,000-record file. On record 9,990, an error is detected. Thankfully, Oracle throws an exception, and all your work is rolled back. Otherwise, you'd have a real dickens of a time starting over when you need to reload the same file.

Except that your DBA just called you, as mine did, and said, "You're affecting performance by not committing frequently! You need to commit every 1,000 records or so!" That's great, but the whole point of transaction processing is that all my records go in as a unit. I want them all to go in at once, or not at all.

This is not the database I regularly work on, nor the DBA I regularly work with. As a matter of fact, the real problem to me seems to be that this database is extraordinarily slow. The regular database I work on would load all 10,000 records in a heartbeat and not even shrug. Over there, loading 24 hours' worth of data took 24 hours. On this database instance, loading 24 hours' worth of data seems to take over 24 hours...

So what do I do? If I commit before the transaction is done, I lose the whole point of transaction processing. Isn't the database supposed to handle this kind of crap on its own? How do I appease the DBA without losing the fundamental point of why I need it to work this way?
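
For concreteness, here's a stripped-down sketch of the all-or-nothing load I'm describing; the DSN, table, and column names below are made up, not the real ones.

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# AutoCommit off: nothing is permanent until we say so.
my $dbh = DBI->connect('dbi:Oracle:somedb', 'user', 'pass',
                       { AutoCommit => 0, RaiseError => 1 });
my $ins = $dbh->prepare('INSERT INTO feed_data (id, payload) VALUES (?, ?)');

open my $fh, '<', 'inputfile' or die "inputfile: $!";
eval {
    while (my $line = <$fh>) {
        chomp $line;
        $ins->execute(split /\|/, $line);   # assumes pipe-delimited records
    }
    $dbh->commit;      # all 10,000 rows land together...
};
if ($@) {
    $dbh->rollback;    # ...or none of them do
    die "Bad record; whole file rolled back: $@";
}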


Suggestion

cog on 2005-12-14T22:15:02

So what do I do?

Er... can't you split your 10,000 records into, say, 10 chunks of 1,000 records each, and insert them separately?

I've seen people have the very same problem and solve it that way...

Or am I missing something?
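
Something like this, roughly (an untested sketch; the DSN, table, and column names are invented):

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

my $chunk_size = 1_000;
my $dbh = DBI->connect('dbi:Oracle:somedb', 'user', 'pass',
                       { AutoCommit => 0, RaiseError => 1 });
my $ins = $dbh->prepare('INSERT INTO feed_data (id, payload) VALUES (?, ?)');

open my $fh, '<', 'inputfile' or die "inputfile: $!";
my $count = 0;
eval {
    while (my $line = <$fh>) {
        chomp $line;
        $ins->execute(split /\|/, $line);              # pipe-delimited, for the sake of example
        $dbh->commit if ++$count % $chunk_size == 0;   # commit each full chunk
    }
    $dbh->commit;      # and the final partial chunk
};
if ($@) {
    $dbh->rollback;    # only the current chunk is lost
    die "Load died at record $count: $@";
}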

Re: Let's defeat the purpose of transaction processing

n1vux on 2005-12-14T22:27:00

PLAN A - Do what is natural in the business logic. Many a bulk file load does not have a transactional nature at the business-logic level -- any one record could be cancelled separately. And sometimes the whole file is a transaction to the business, too. For some of my feeds we are required to reject the whole file if any record rejects, but for others we are required to accept the rest of the records; I would claim the natural "transaction scope" for loading those files is different.

However, what is natural in the business logic and what is operationally efficient are different things. Having each file restartable is quite desirable at an operations level, but letting a pending transaction grow too large is a very real operational issue for the DBAs. Your debate with the DBA is strictly in operational space; even if there were a natural business answer, the operational issues might trump it anyway. This is, however, a solvable conundrum.

PLAN B -- Well, maybe you could use Perl or *nix split(1) to split the input file into subfiles of 1,000 records and process them in sequence, marking each as completed when committed. Then if you need to restart, you can skip the done ones and pick up at the 1,000-line break to which the DB rolled back.

$ split --lines=1000 inputfile pieces.
$ ls piece*
pieces.aa pieces.ab pieces.ac pieces.ad pieces.ae
...
# now process pieces.?? in sequence, commit each.
#  = pieces.a[a-j] for your 10k records ...
This requires only wrapping your existing job with a splitter/re-sequencer script.
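
A wrapper along these lines would do the sequencing and the restart bookkeeping ("load_one_file.pl" here is just a stand-in for whatever your existing loader is called):

#!/usr/bin/perl
use strict;
use warnings;

for my $piece (sort glob 'pieces.??') {
    system('./load_one_file.pl', $piece) == 0
        or die "Loader failed on $piece; fix the data and rerun";
    rename $piece, "$piece.done"
        or die "Cannot mark $piece as done: $!";
    # On a restart, the glob above no longer matches the *.done pieces,
    # so the run resumes at the first piece that hasn't committed.
}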

PLAN C -- Alternatively, you include in the transaction an update to a side table or the status table indicating the last file and line processed. Upon restart, that's where you resume. You probably already have a table listing files received, with status... if so, you just need to add a "Number of lines accepted" column and update it at each partial-file commit point, not just at EOF.

This is somewhat more invasive, changing the schema and basic logic, but probably well worth it.
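
A rough sketch of Plan C, assuming a status table feed_files(filename, lines_accepted) that already has a row per received file, and a data table feed_data -- all invented names:

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

my ($file, $chunk_size) = ('inputfile', 1_000);
my $dbh = DBI->connect('dbi:Oracle:somedb', 'user', 'pass',
                       { AutoCommit => 0, RaiseError => 1 });

# Where did the last run leave off?
my ($done) = $dbh->selectrow_array(
    'SELECT lines_accepted FROM feed_files WHERE filename = ?', undef, $file);
$done ||= 0;

my $ins = $dbh->prepare('INSERT INTO feed_data (id, payload) VALUES (?, ?)');
my $upd = $dbh->prepare(
    'UPDATE feed_files SET lines_accepted = ? WHERE filename = ?');

open my $fh, '<', $file or die "$file: $!";
my $line_no = 0;
eval {
    while (my $line = <$fh>) {
        next if ++$line_no <= $done;    # skip lines a prior run already committed
        chomp $line;
        $ins->execute(split /\|/, $line);
        if ($line_no % $chunk_size == 0) {
            $upd->execute($line_no, $file);
            $dbh->commit;               # data chunk and progress marker commit together
        }
    }
    $upd->execute($line_no, $file);     # final partial chunk
    $dbh->commit;
};
if ($@) {
    $dbh->rollback;
    die "Load died at line $line_no of $file: $@";
}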

PLAN F -- The obvious answer of having the SQL avoid inserting records already in existence is of course so very, very wrong that I hope I don't need to explain why.
(Hint: speed and/or race conditions)

Your instinct to ask the DBA why this DB can't handle a transaction of 10k records when other DBs can is of course a good one. The DBA may have a good answer, though -- if the DBA is any good, they're paid to have a good answer to that question, one we outsiders do not want to have to understand. If the DB in question is optimized for on-line usage, it may have internal structures that make queueing a huge transaction awkward. Or it may be running on junk hardware or an older rev of the DB or whatever. If there's a real answer, be prepared to fall back to Plan B or Plan C above (split).

If your business requires you to accept all valid records immediately and retry only the invalids, you wouldn't roll back on the invalid record but would commit all 999 good records in one batch and write the one invalid record to an exception file that could be reprocessed after repair. This is stored-proc ugliness... or a front-end validation program.
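
A rough sketch of that front-end validation flavor; validate_record() below is a placeholder for your real business rules, and the other names are invented:

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:Oracle:somedb', 'user', 'pass',
                       { AutoCommit => 0, RaiseError => 1 });
my $ins = $dbh->prepare('INSERT INTO feed_data (id, payload) VALUES (?, ?)');

open my $in,  '<', 'inputfile'     or die "inputfile: $!";
open my $rej, '>', 'inputfile.rej' or die "inputfile.rej: $!";

while (my $line = <$in>) {
    chomp $line;
    my @fields = split /\|/, $line;
    if (validate_record(@fields)) {
        $ins->execute(@fields);          # good record: queue it for the batch
    }
    else {
        print {$rej} "$line\n";          # bad record: repair and reload later
    }
}
$dbh->commit;                            # every valid record commits as one batch

sub validate_record {
    my @fields = @_;
    return @fields == 2 && $fields[0] =~ /^\d+$/;   # placeholder check only
}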

__

Abigail on 2005-12-14T23:40:49

Let's see: you have a transaction of a mere 10k rows, and the DBA comes complaining to you? Sounds to me like the DBA has a problem, not you. It should be you going to him, saying that you're trying to insert 10k rows and the database is really slow, and asking him whether his database is badly configured.

Re:__

jdavidb on 2005-12-15T15:48:23

Thanks. I think in addition to the plans n1vux offered, I needed that validation to know I'm not being completely unreasonable.

I did do some checking after my post and there are some 40K-record files I am loading. But that's not even a difference of an order of magnitude.