They're frickin' counter-intuitive, I tell ya.
In mysql, I just ran the following query:
mysql> SELECT count(some_field) FROM some_table WHERE some_field IS NULL; +-------------------+ | count(some_field) | +-------------------+ | 0 | +-------------------+ 1 row in set (0.00 sec)
Change count(some_field) to count(*) or count(id) and it returns "105".
I don't know if this particular behavior is mysql specific or not, but it's very annoying.
It kind of makes sense if you think of NULL as "unknown". You can't count it because it's unknown...
-Dom
Re:Unknown
dws on 2006-10-02T15:19:34
Agreed here. Pronouncing NULL as "unknown" has kept me out of more than a bit of trouble.
Mind you, I can speak for certain with MySQL, but it may be similar.
Re:Always use count(*)
jk2addict on 2006-10-02T17:23:38
Actually, Jay Pipes of MySQL gave a talk this weekend about MySQL perf stuff. One of the slides was about not really using count at all if you don't have too, depending on things like MyISAM/InnoDB.
http://jpipes.com/presentations/mysql_perf_tuning.pdf
pg.16
Offtopic, but he did a good job of presenting things that I wouldn't normally think about as just a DBIC->mysql user, like how the two engines deal with indexes across multiple keys, etc. Probably stuff any good MySQL dbadmin knows and I never think about when I just spit out some queries.Re:Always use count(*)
djberg96 on 2006-10-02T19:17:47
Interesting, thanks for the link.That should have been "can't speak for certain", btw.
:)
Re:count()
jdavidb on 2006-10-02T20:30:53
The only two incantations of COUNT() I have used that worked the way I expected were COUNT(*) (to count rows) and COUNT(DISTINCT field) (to count distinct values of a field, and even then I'm not sure what it does with NULL). This is with Oracle; with other RDBMSes YMMV. I'm sure there's some magic I could do with COUNT(field) and COUNT(field1,
..., fieldN), but I don't know how it works. I may have done that in Oracle class or database class at the university, but I didn't retain any information on how to get it to work (or even what it's supposed to do). Re:count()
Ovid on 2006-10-02T21:43:21
Regrettably, this is something that much documentation does not make clear. I'm expecting count() to return the number of instances of a particular value. Given that NULL can be thought of as "unknown", I suppose one could argue that it makes sense that it doesn't count the number of values. However, it also seems reasonable for one to assume that count(some_field) will return how many unknown values there are.
Your explanation is perfectly correct. I'm just frustrated that an arguably "intuitive" answer turns out to be very wrong. I should have remembered that I absolutely must avoid NULLs whenever possible, but in this case, I threw together a quick query to examine a database problem and was mystified that my results were clearly contrary to my expectations.
I'll remember to use count(*) or count(1) in the future, when possible.
Re:count()
Abigail on 2006-10-02T22:16:34
I'm just frustrated that an arguably "intuitive" answer turns out to be very wrong.That's because your "intuitive" answer isn't the "intuitive" answer of one someone who breathes SQL. Just like what you find "intuitive" in Perl isn't "intuitive" for many people who also use Perl.
I do find the answer "intuitive", but only because I keep myself reminding that a NULL in a relational database is a very special thing, and far more undefined than 'undef' is in Perl. If you try to think of a database 'NULL' as an 'undef' in Perl, or a NULL pointer in C, things rapidly come "unintuitive".
Re:count()
Aristotle on 2006-10-03T03:51:58
Ditto Abigail about intuition. I think the way
COUNT
works in SQL w.r.t. NULLs is actually useful and helpful.Your stance about NULLs reminds me about the saying about GOTO and the apprentice, the journeyman and the master. (Not that I can claim to be a master, mind…)
Re:count()
jdavidb on 2006-10-03T13:09:30
I guess some of the strange logic is this: if you want to get a count of the number of unique values, you use COUNT(DISTINCT field). If you want to get a count of records, you use COUNT(*). Now what do you want when you ask for COUNT(field)? Do you want a count of distinct values? Use COUNT(DISTINCT field). Do you want a count of records? Use COUNT(*). If you grabbed field and then counted the number of values (not distinct), you'd get absolutely the same results as COUNT(*), right?
So I guess some of the tortured logic is, "COUNT(field) should do something completely different since the other two effects can already be achieved with different code. What else can we do? Oh, I know! Let's have it return a count of all the NON-NULL values! After all, it's got to have some reason for actually looking at field; otherwise the coder would've just said COUNT(*)."
Weird, but sensible, in the way that any language decision that comes out of a committee is "sensible."
Ignoring NULLs in COUNT (just as Abigail explained) is just consistent with other aggregate functions in SQL. For example, SUM(foo) sums up every column 'foo' except the null ones. So there's no magical transformation of null to 0 and no concern with exceptions raised because there are some nulls out there. Obviously this behavior is equivalent to converting NULLs to 0, but it is not so everywhere. MAX(foo) and MIN(foo) also work this way: ignore NULLs - which is good because they don't compare just numbers and converting NULLs to 0 could also distort the maxima and minima unexpectedly.
To count rows, you may use COUNT(1) as well.