Stupid NULLs

Ovid on 2006-10-02T14:09:53

They're frickin' counter-intuitive, I tell ya.

In mysql, I just ran the following query:

mysql> SELECT count(some_field) FROM some_table WHERE some_field IS NULL;
+-------------------+
| count(some_field) |
+-------------------+
|                 0 |
+-------------------+
1 row in set (0.00 sec)

Change count(some_field) to count(*) or count(id) and it returns "105".

I don't know if this particular behavior is mysql specific or not, but it's very annoying.


Fun

jk2addict on 2006-10-02T14:38:24

The old ANSI_NULLS what should compares do trick. :-)

In MSSQL, you can flip this sort of thing back and forth with ANSI_NULLS on/off options. Wonder if MySQL has the same sorts of things. It can lead to some unexpected things if you're doing IN subselects during updates/deletes.

Unknown

Dom2 on 2006-10-02T14:55:57

It kind of makes sense if you think of NULL as "unknown". You can't count it because it's unknown...

-Dom

Re:Unknown

dws on 2006-10-02T15:19:34

Agreed here. Pronouncing NULL as "unknown" has kept me out of more than a bit of trouble.

Not sure either

Mr. Muskrat on 2006-10-02T15:18:36

I'm not sure if it's a MySQL specific thing or not but it's the very reason that I still use count(*) regardless of the database I'm using.

Always use count(*)

djberg96 on 2006-10-02T16:12:08

With Oracle at least. It's a misconception that count(foo) is faster than count(*), in case that's the reason you were avoiding it.

Mind you, I can speak for certain with MySQL, but it may be similar.

Re:Always use count(*)

jk2addict on 2006-10-02T17:23:38

Actually, Jay Pipes of MySQL gave a talk this weekend about MySQL perf stuff. One of the slides was about not really using count at all if you don't have too, depending on things like MyISAM/InnoDB.

http://jpipes.com/presentations/mysql_perf_tuning.pdf
pg.16

Offtopic, but he did a good job of presenting things that I wouldn't normally think about as just a DBIC->mysql user, like how the two engines deal with indexes across multiple keys, etc. Probably stuff any good MySQL dbadmin knows and I never think about when I just spit out some queries.

Re:Always use count(*)

djberg96 on 2006-10-02T19:17:47

Interesting, thanks for the link.

That should have been "can't speak for certain", btw. :)

counting NULLs

petdance on 2006-10-02T17:36:28

I believe that what count(X) really does is count the number of X that are non-NULL. That's why you're getting 0.

count()

Abigail on 2006-10-02T19:57:50

count(expression) counts the number of times expression is not NULL. Which explains your answer. I always use 'count(*)' if I want to count the number of rows - several databases have optimized 'count(*)'. Others use 'count(1)' to count rows.

Re:count()

jdavidb on 2006-10-02T20:30:53

The only two incantations of COUNT() I have used that worked the way I expected were COUNT(*) (to count rows) and COUNT(DISTINCT field) (to count distinct values of a field, and even then I'm not sure what it does with NULL). This is with Oracle; with other RDBMSes YMMV. I'm sure there's some magic I could do with COUNT(field) and COUNT(field1, ..., fieldN), but I don't know how it works. I may have done that in Oracle class or database class at the university, but I didn't retain any information on how to get it to work (or even what it's supposed to do).

Re:count()

Ovid on 2006-10-02T21:43:21

Regrettably, this is something that much documentation does not make clear. I'm expecting count() to return the number of instances of a particular value. Given that NULL can be thought of as "unknown", I suppose one could argue that it makes sense that it doesn't count the number of values. However, it also seems reasonable for one to assume that count(some_field) will return how many unknown values there are.

Your explanation is perfectly correct. I'm just frustrated that an arguably "intuitive" answer turns out to be very wrong. I should have remembered that I absolutely must avoid NULLs whenever possible, but in this case, I threw together a quick query to examine a database problem and was mystified that my results were clearly contrary to my expectations.

I'll remember to use count(*) or count(1) in the future, when possible.

Re:count()

Abigail on 2006-10-02T22:16:34

I'm just frustrated that an arguably "intuitive" answer turns out to be very wrong.

That's because your "intuitive" answer isn't the "intuitive" answer of one someone who breathes SQL. Just like what you find "intuitive" in Perl isn't "intuitive" for many people who also use Perl.

I do find the answer "intuitive", but only because I keep myself reminding that a NULL in a relational database is a very special thing, and far more undefined than 'undef' is in Perl. If you try to think of a database 'NULL' as an 'undef' in Perl, or a NULL pointer in C, things rapidly come "unintuitive".

Re:count()

Aristotle on 2006-10-03T03:51:58

Ditto Abigail about intuition. I think the way COUNT works in SQL w.r.t. NULLs is actually useful and helpful.

Your stance about NULLs reminds me about the saying about GOTO and the apprentice, the journeyman and the master. (Not that I can claim to be a master, mind…)

Re:count()

jdavidb on 2006-10-03T13:09:30

I guess some of the strange logic is this: if you want to get a count of the number of unique values, you use COUNT(DISTINCT field). If you want to get a count of records, you use COUNT(*). Now what do you want when you ask for COUNT(field)? Do you want a count of distinct values? Use COUNT(DISTINCT field). Do you want a count of records? Use COUNT(*). If you grabbed field and then counted the number of values (not distinct), you'd get absolutely the same results as COUNT(*), right?

So I guess some of the tortured logic is, "COUNT(field) should do something completely different since the other two effects can already be achieved with different code. What else can we do? Oh, I know! Let's have it return a count of all the NON-NULL values! After all, it's got to have some reason for actually looking at field; otherwise the coder would've just said COUNT(*)."

Weird, but sensible, in the way that any language decision that comes out of a committee is "sensible."

SQL is consistent (almost)

ferreira on 2006-10-03T01:22:39

Ignoring NULLs in COUNT (just as Abigail explained) is just consistent with other aggregate functions in SQL. For example, SUM(foo) sums up every column 'foo' except the null ones. So there's no magical transformation of null to 0 and no concern with exceptions raised because there are some nulls out there. Obviously this behavior is equivalent to converting NULLs to 0, but it is not so everywhere. MAX(foo) and MIN(foo) also work this way: ignore NULLs - which is good because they don't compare just numbers and converting NULLs to 0 could also distort the maxima and minima unexpectedly.

To count rows, you may use COUNT(1) as well.