MySQL Bug of the Day

Ovid on 2008-02-28T13:39:12

One thing which struck me as odd when I started working here is that we used moderately long strings, called 'pids' as primary keys on many of our tables. These keys are char(8) and encoded as utf8. This makes the key length longer than it needs to be and longer primary keys can cause performance issues (which is why you rarely see UUIDs being used as primary keys).

When I asked about this, I was informed about a "fun" issue with MySQL. Apparently, when using InnoDB (we have MySQL version 5.0.45), AUTO_INCREMENT keys lock the entire table and we were getting serious deadlock issues. Allegedly this is fixed in 5.1. The rationale listed?

[Table level locks are] necessary with the current logical binlogging of MySQL, since the auto-inc values assigned in a single SQL statement must be consecutive.

I don't understand this. From an application standpoint, auto-inc keys should be non-identifying and frankly, it shouldn't matter if they're consecutive, ordered, random, whatever, so long as they're unique. I suspect that this is done for internal architectural reasons because I can't imagine that even the MySQL developers think that it's logically necessary that these keys be consecutive.

The counter-argument I envision is someone telling me that you want to know what order the records were inserted in and gaps will give you an idea of what you might be deleting. If that's the counter-argument, I would suggest that you explicitly plan for this and not rely allow an internal implementation detail. That's encapsulation violation.


Yes it could be better... but a bug?

mpeters on 2008-02-28T14:07:47

Apparently, when using InnoDB (we have MySQL version 5.0.45), AUTO_INCREMENT keys lock the entire table
This is only a problem when you are doing lots of inserts in one SQL statement. As they say later on, it's a performance reason (to avoid calling ::get_auto_increment() for each row) not something they think is logically required.

From an application standpoint, auto-inc keys should be non-identifying
Why? A unique number is a pretty good way to identify something and auto-increment keys should always be unique. It doesn't mean it's the best key. Somethings have pretty good natural keys, but even those are usually text and won't have the speed of an integer.

frankly, it shouldn't matter if they're consecutive
I agree that with-in a multi-insert statement, the various insert parts don't need to be consecutive. But if auto-increment keys weren't consecutive I'd be worried. "Increment" implies that the values are consecutive. I'd hate for $i++ to increment my value to some non-consecutive number :)

Re:Yes it could be better... but a bug?

Ovid on 2008-02-28T14:38:10

Non-identifying: since you aren't sure what these values are, you shouldn't use them as consistently identifying a record. They get used internally to join tables and externally when identifying a particular result in a result set when you need to refer to it again, but not identifying the exact data. Delete a record (on an auto-inc table) and later re-insert it. You should get a different auto-inc primary key, but it's the same data, right? The auto-inc PK is non-identifying (perhaps we're just using the term in different ways).

Consecutive: Start with auto-increment = 1 and insert three rows. Then delete where id = 2. The ids, though ordered, are no longer consecutive. There's nothing wrong with the data and there's no logical requirement for these ids to be ordered.

Bug: OK, you got me on this one. It's not a bug, but it's a severe implementation limitation which is frustrating as hell :)

Re:Yes it could be better... but a bug?

chromatic on 2008-02-28T19:09:59

I wonder if it's for write clustering. Each slave gets a step value by which to increment all auto-increment rows. I don't know if the clustering system guarantees an order of operation of writes though.

sequences

perrin on 2008-03-07T17:52:30

Presumably the AUTO_INCREMENT keys are sequential because people use them just like sequences.

There's some interesting stats on performance of char columns as keys here.