Find the "SELECT *" Bug

Ovid on 2006-10-03T09:19:11

I get really tired of seeing SELECT * FROM foo. Who knows the hell you're getting? It's real fun having to get into MySQL every time and figure out what's in the table. SELECT * is for lazy programmers -- and not a good lazy. By explicitly naming the fields we get a host of benefits.

  1. It's immediately obvious to the maintenance programmer exactly what is being requested.
  2. It's more efficient than selecting a bunch of columns you don't need, if you have more columns in the table than you're actually using.
  3. If you're selecting into an array you don't have to worry about columns in the database being reordered.
  4. If you're selecting into a hash, you'll find out at the SELECT statement that's something wrong rather than a mysterious bug which may be far removed from the cause.

So why do I bring this up? Because of the following bug caused by SELECT *.

my $thing = $dbh->selectrow_hashref(q[
    SELECT    *
    FROM      things
    LEFT JOIN dead_things USING (thing)
    WHERE     things.thing = ?
], undef, $thing_num) or die "$thing_num not found\n";

See the bug? Had the fields been explicitly named instead of using an asterisk, this bug would not have occurred.


USING is bad anyway

Aristotle on 2006-10-03T11:13:55

I prefer to name my primary key column `id` and use `foo_id` as the name for foreign key columns. Then I use `ON` explicitly.

LEFT JOIN dead_things ON thing.id = dead_things.thing_id

Overall the query becomes more verbose, but I prefer it that way. It makes my schemata much more easily readable when columns immediately reveal themselves for what they are by their name.

Re:USING is bad anyway

Ovid on 2006-10-03T11:18:45

That still doesn't fix the bug :)

Re:USING is bad anyway

jdavidb on 2006-10-03T12:56:48

I don't actually see the bug, yet, but I'm still a little groggy this AM. I do agree with your principle. But you may have to help me along to see this specific bug. :)

Re:USING is bad anyway

Ovid on 2006-10-03T13:12:42

Major Hint: what are we joining on and what data structure is Perl using?

Re:USING is bad anyway

Ovid on 2006-10-03T13:20:03

You know, this might be a MySQL-specific bug. I don't have other databases handy on which to test this. I tried it in SQLite and the problem doesn't appear:

~ $ sqlite3 foo
SQLite version 3.2.8
Enter ".help" for instructions
sqlite> create table one (server, name);
sqlite> create table two (server, value);
sqlite> insert into one values(1,'bob');
sqlite> insert into one values(2,'ovid');
sqlite> insert into two values(1,'one');
sqlite> select * from one left join two using(server);
1|bob|one
2|ovid|

With MySQL, that SELECT statement was returning data more like this:

1|bob|1|one

As a result, because it was a left join, the duplicated join value was sometimes null and when shoved into a hash, overwrote the original value. However, USING doesn't have this problem in SQLite, but it does with ON:

sqlite> select * from one left join two on one.server = two.server;
1|bob|1|one
2|ovid||
sqlite>

Re:USING is bad anyway

jdavidb on 2006-10-03T15:00:35

I think it's definitely MySQL specific ... it sounds like MySQL doesn't actually know how to do that type of join properly?

But I'm confused about something else ... I thought selectrow_hashref should give you a hash where the keys are the fieldnames and the values are the values in the record. You seem to be indicating a problem where one of the values is NULL and that gets used as a key in the hash? But I can't see how that would happen, unless I'm misunderstanding the expected results of selectrow_hashref (I'm more used to fetchrow_hashref, which I presume does basically the same thing).

Re:USING is bad anyway

jdavidb on 2006-10-03T17:24:18

You seem to be indicating a problem where one of the values is NULL and that gets used as a key in the hash?

No, I misunderstood. Never mind. :)

Re:USING is bad anyway

Aristotle on 2006-10-03T21:21:22

I’m saying that you got bitten this easily because you combined two bad ideas.

Although now that I think of it, the naming scheme I use means that any join would lead to duplicate id columns.

Good thing I avoid SELECT * like the pest, then. Hmm, looking at some of my own source code, I find I do use the star quite a lot in the older parts. However, it’s always qualified with a table name, which explains why I never ran into your bug.

Bug depends on the DB (I think)

Matts on 2006-10-03T13:18:35

Some DBs would return that as "Table.Column", unless it's DBI always turning it into just "Column".

I agree on the principle though - SELECT * should probably only be used in the SQL-Shell for testing stuff out.

Re:Bug depends on the DB (I think)

jdavidb on 2006-10-03T15:01:52

Ah, I can see how that would be a problem, if you wound up with two fields in different tables with the same name. But I'm used to the syntax of NATURAL JOIN and didn't look up the syntax of LEFT JOIN to see if it removes the duplicated column from the output or not.

"SELECT *" encouraged in some environments

runrig on 2006-10-03T21:49:44

In the 4gl I used, when you had to explicitly declare variables, you could declare a record 'like' a table, then 'select *' into the record. This was extremely convenient on tables with many columns. Unfortunately, that meant that you had to recompile the program if the table changed.

And Matt is correct in that some databases return 'table.column' as the column name (as the hash key or in $sth->{NAME}), which is annoying when switching between databases.