What does this "consistent read" statement mean?

Ovid on 2007-09-04T14:03:52

When using InnoDB and MySQL, you can avoid row-level locking when selecting data, thus avoiding a potential performance problem. This is done via a "consistent read". From the latest MySQL docs on consistent reads:

A consistent read means that InnoDB uses multi-versioning to present to a query a snapshot of the database at a point in time. The query sees the changes made by those transactions that committed before that point of time, and no changes made by later or uncommitted transactions.

So far, so good. You get a snapshot of data to read and nothing will be updating it underneath you.

The exception to this rule is that the query sees the changes made by earlier statements within the same transaction.

That makes sense. If I change data in a transaction, a subsequent select in the same transaction had better see my changes.

Here's where I get nervous (emphasis mine):

Note that the exception to the rule causes the following anomaly: if you update some rows in a table, a SELECT will see the latest version of the updated rows, but it might also see older versions of any rows. If other users simultaneously update the same table, the anomaly means that you may see the table in a state that never existed in the database.

What? I've read that several times and I still don't get it. What does "older versions of any rows" mean? And what does "may see ... a state that never existed" mean? This is very cryptic, there are no comments, and this is for the latest version of MySQL. "If other users simultaneously update the same table" causes this issue? In or out of transactions? How can I reproduce this? This sounds like a heavy OLTP type system is more likely to let you read bad data than an OLAP system. In other words, the heavier the data changes (instead of merely adding to it), the more likely you'll get bad results.

This can't be right. Can someone please explain what I'm missing?


I would imagine something like this:

Adrian on 2007-09-04T14:23:43

Fred and Mary start with:

1 A
2 B
3 C

Fred starts a transaction and updates the first two rows so the table that looks like:

1 X
2 Y
3 C

In the mean time Mary has started and finished a transaction that updates the third row:

1 A
2 B
3 Z

which means that Fred is seeing the "old" version of row three, and the table as a whole reflects something that will never have existed in the database.

Make sense?

Re:I would imagine something like this:

Ovid on 2007-09-04T14:43:46

Oh! I replied to Aristotle without thinking too carefully about your reply. I can see how this could occur, but it's problematic because then things aren't atomic. If you take actions based upon this, you can still corrupt your data. You update row 3 because 'C' is there, but it's really not. Now you have a race condition because consistent reads aren't. Or did I misunderstand something?

Re:I would imagine something like this:

Guss on 2007-09-04T15:13:59

You can't talk about atomic operations and transactions at the same time - its kind of contradictory. Atomic operations mean that the database is either in a state before the operation or in a state after the operation, and never in between. But transaction means that you can group several actions and execute them so that the database reflect the changes only after the execution of all actions - the transaction can't be atomic in the sense that from inside it you can see the intermediate state.

In the example above, if after Fred's first change he says something like "if row 3 has C, update it to D" then that operation would succeed and the commit that terminates the transaction would fail - the same way that an atomic update, all by itself, would fail when row 3 no longer has C.

Re:I would imagine something like this:

perrin on 2007-09-04T17:04:45

In most applications, this is allowed because it improves concurrency so much. If you really need to prevent it, you can set the isolation level to SERIALIZABLE, and then additional locks will be used so that you can't modify data that another transaction is currently looking at and potentially preparing to modify.

Re:I would imagine something like this:

btilly on 2007-09-04T15:30:27

Your imagination is quite likely wrong in this instance.

I don't have a MySQL database to play with, but if that was Oracle then by default in your situation, Fred will see a Z in the third row. The reason is that consistent reads are consistent at the statement level, not the transaction level. So you see changes that have been committed after your transaction started, but before your statement began.

The technical reason for this is that with Oracle's implementation, consistent reads have to be done by reading a special data segment called rollback. If you try to do a consistent read with data that has scrolled off of the rollback segment you get a "rollback segment too old" error. Nothing can really be done about that error, and rollback segments generally scroll off pretty quickly on transactional databases. So Oracle chose a default that minimizes how often you get that error.

My understanding is that InnoDB closely mirrors Oracle's implementation, so I wouldn't be surprised if they have the same behaviour.

I should note that your expectation would be correct for PostgreSQL. It does work from the beginning of the transaction until past versions of data are cleaned up with a vacuum command.

I should further note that at one point I knew Oracle's implementation well enough to believe that its guarantee wasn't quite guaranteed. I convinced myself and a couple of DBAs that if, during a statement, one user committed a change to a row, then a second user updated and committed that row again, then you read it, you'd get the first user's version of the row. However I never tested this theory, so my theory may be wrong. (Something about not having sufficient access to an Oracle database, and not considering it important enough to set up a database I did have access to.) I have no opinion on whether InnoDB mirrors Oracle closely enough to mirror this possible flaw. However I'm quite sure that PostgreSQL does not suffer from this potential flaw.

Re:I would imagine something like this:

perrin on 2007-09-04T17:02:12

It depends on the isolation level you're using. The default level for InnoDB is a bit safer than the default for Oracle. It is REPEATABLE READ, which does keep a consistent snapshot from the beginning of the transaction. To get something closer to Oracle's default, you can use READ COMMITTED, which only guarantees consistency on a statement level. Both databases support SERIALIZABLE, which does extra locking and prevents some tricky cases like phantom rows.

I think…

Aristotle on 2007-09-04T14:27:11

a SELECT will see the latest version of the updated rows, but it might also see older versions of any rows.

I think the key is the “any”, and what they are trying to say is that if you start a transaction, you will only see rows as they were at the time of the transaction start, plus newer data in rows you updated since.

If other users simultaneously update the same table, the anomaly means that you may see the table in a state that never existed in the database.

I think what this is trying to say is that you only commit the rows you changed, and by the time you commit them, the old rows you aren’t committing may have changed, so the combination of rows-as-of-start-of-transaction and rows-updated-during-transaction that you see from within the transaction likely never actually exists on disk.

It is very confusingly written, but I’m not sure how it could have been made easier to understand. Human brains are only marginally capable of grasping temporal event flows, and we get really confused when we have to think about timelines in multiple alternate realities. (I think Damian Conway may have recently said some stuff along those lines…)

Re:I think…

Ovid on 2007-09-04T14:39:51

Here's the problem I have:

The exception to this rule is that the query sees the changes made by earlier statements within the same transaction.

So the exception occurs withing a transaction.

If other users simultaneously update the same table, the anomaly means that you may see the table in a state that never existed in the database.

So the exception occurs within a transaction. The anomaly occurs when several uses update the same table. How does that happen within a single transaction?

Re:I think…

Aristotle on 2007-09-04T15:05:38

It doesn’t happen in a single transaction.

You see the updates you made, plus the rows you didn’t touch in whatever state they were when you started. This is within the transaction.

By the time you commit your changes, the rows you didn’t change may differ from what they were when you started the transaction. Obviously, if they differ, this must be because other users updated the table alongside your transaction.

So the particular world that you were seeing from within the transaction (the database state as seen when you started the transaction, plus your updates on top) never actually existed. It was a transactional illusion.

Relevant quote

iburrell on 2007-09-10T23:32:19

I was reading a document comparing differences between PostgreSQL and MySQL. It mentioned the different interpretations of transaction isolation and how some of the MySQL modes are inconsistent and weird.

When comparing the two models, PostgreSQL enforces client separation where the data operated on is always consistent under all circumstances; as the MVCC documentation states, "the reason that PostgreSQL only provides two isolation levels is that this is the only sensible way to map the standard isolation levels to the multiversion concurrency control architecture." MySQL allows configurations where client code that doesn't commit transactions properly can result in a data view that would be considered inconsistent by PostgreSQL's stricter standards. However, in situations where it's acceptable for data being read to have small inconsistencies, being able to use a less strict locking could be a performance advantage for MySQL.
It also some good links to understanding transaction logic.