Triggers to track database changes

Ovid on 2008-09-11T16:00:10

Yesterday I wrote how we're using triggers to track changes to our database. Now that it's implemented, our test suite is between one quarter to one third faster. Still too slow, but better than it was.

I've combined yesterday's code with the following:

sub setup {
    my ($self) = @_;
    my $dbh = $self->dbh;

    # XXX this may ultimately prove to be too simplistic
    eval { $dbh->do('select * from changed_table') };

    my $error = $@ or return;
    if ($error !~ /Table.*changed_table' doesn't exist/) {
        croak("Fatal error: $error");
    }

    $self->_rebuild_test_database;
    return $self;
}

sub _rebuild_test_database {
    my $self = shift;

    $self->_create_change_table;
    my $dbh = $self->dbh;
    my @static_tables  = $self->static_tables;
    my @dynamic_tables = $self->dynamic_tables;

    $self->_disable_foreign_keys;
    foreach my $table (@static_tables, @dynamic_tables) {
        $dbh->do("TRUNCATE TABLE $table");
    }
    $self->_enable_foreign_keys;

    # load this *before* the triggers, thank you
    my $reference_data = 'insert_reference_data.sql';
    Pips3::DB::Create->new->load_sql_db_file($reference_data);

    # now make thebackups
    foreach my $table (@static_tables) {
        $dbh->do("CREATE TABLE backup_$table SELECT * FROM $table");
    }

    $self->_add_triggers_and_records;
    return $self;
}

The initial check is simplistic (do I have a changed_table table?). If it's not there, this is a freshly rebuilt database and we do the following:

  1. Create said table
  2. Truncate all tables
  3. Load our "static" data (core data the system requires, such as names of broadcasters)
  4. Back up our static data with the "CREATE ... SELECT" statements.
  5. Ddd records to the changed_table table to help us track which tables have changed
  6. Add the triggers which track the changes

Then, at the beginning of any test which touches the database, it pulls are table names which have changed, truncates them, copies their backed up values (if they're "static" data -- see point 4 above) back to the table and resets their changed_table values.

Aside from a very nice performance boost to our test suite by not having to rebuild everything blindly, it's also interesting to run a test and look to see which tables have changed. I was surprised to see cases where some tests would make changes to tables which seem completely unrelated to the test. Ordinarily that can be very difficult to know.

As a side note, even though I'm the one who implemented this, I believe the original idea belong to Rufus, one of our other developers.


transactions

hdp on 2008-09-11T20:00:00

I may have missed a point where you said you couldn't do this, and if so, I apologize, but: can you wrap every test file in a transaction and then rollback after each test? That's the best speedup I've found for this kind of thing.

Re:transactions

jplindstrom on 2008-09-12T12:29:29

That works for successful cases, because begin/commit nest properly. But begin/rollback doesn't. Rollback rolls back everything.

There's a hack/workaround for that though. It's possible to implement properly nested transactions using savepoints. I think I'll do that at some point on a gold-card day.

Re:transactions

Ovid on 2008-09-12T12:37:40

I think that's very bad idea which, unfortunately, has gained some tractions with testers. The major error is that it says "hey, we're going to alter our code's behavior". Now you are no longer testing your code. You're testing a globally modified version that simply doesn't behave like your real code. It's almost like developers using SQLite to test their code which relies on MySQL -- you're no longer testing your code, you're testing a simulacrum.

What if want to test commit/rollback? What if you have multiple processes that need to communicate? (One process adds data and another one, in a Web spider, fails to read the uncommitted data because it's in another process)? I accept that sometimes we need changes in code behavior to account for tests, but only if those changes are minimal in scope.

Re:transactions

hdp on 2008-09-12T14:38:59

Now you are no longer testing your code. You're testing a globally modified version that simply doesn't behave like your real code.

This is only true if your real code involves using more than one connection to the same database. Lots of code doesn't need to do that in a testing context.

What if want to test commit/rollback?

The previous comment already addressed that, unless you're stuck on a database without them. (I've only done this on postgres.)

What if you have multiple processes that need to communicate? (One process adds data and another one, in a Web spider, fails to read the uncommitted data because it's in another process)? I accept that sometimes we need changes in code behavior to account for tests, but only if those changes are minimal in scope.

This seems like an argument for not using this technique all the time, not like an argument for never using it.

It sounds like you're saying "this won't work for my case, so it's always a very bad idea".

Re:transactions

Ovid on 2008-09-12T14:54:53

There may very well be cases for which this is appropriate so I would not tell someone else "never", but this is one of those things that I've been bitten with very hard in the past. Accidental commits. Accidentally having more than once connection (very common with Web tests). Not being able to properly test bits of the code which commit, etc. If it works for you and you're comfortable with your results, that's fine. It's hurt me far more than it's helped me, so I just prefer not to use this technique. Resetting the database before each test has worked wonders for me, so I'm happy to stick with this :)

Re:transactions

hdp on 2008-09-12T19:09:25

Accidental commits. Accidentally having more than once connection (very common with Web tests).

Yeah, there are definitely fiddly bits, and a lot of times it's not worth the effort.

Not being able to properly test bits of the code which commit, etc.

I don't understand why you keep saying this. Do you not believe me and jplindstrom when we say it's possible to test commit/rollback using savepoints to do nested transactions?

Re:transactions

Ovid on 2008-09-12T19:50:40

To be fair, I've never worked with nested transactions before. I am aware of them and what they can do, but I've never tried it out. I'm looking forward to seeing the results of his "gold card" work with this (we're on the same team at the BBC so I'll be able to see it firsthand).