I'm not sure I understand SQL...

cosimo on 2008-05-07T08:30:38

Can any good souls out there explain to me why this SQL query actually works? And no, it doesn't matter what it does... I already know that :)

    SELECT concept, count(*) AS cnt, c.rtype
      FROM concepts JOIN conceptbindings AS c
        ON (c.conceptid = concepts.id)
  GROUP BY concept
  ORDER BY cnt DESC
     LIMIT 0,100;

If there's some explanation which I missed entirely (and that's possible), good.
If there's not, can you guess the DBMS ? :)


Missing aggregate function?

dakkar on 2008-05-07T11:03:39

The weird part is that the join is a 1-m, right? That is, conceptbindings.conceptid is not UNIQUE. Thus, c.rtype should not be admissible in the SELECT clause. A proper DBMS would complain about the fact that you need an aggregate function (like COUNT, or MIN).

MySQL does not complain, right?

Re:Missing aggregate function?

cosimo on 2008-05-07T11:40:55

Exactly. c.rtype shouldn't be there, or an aggregate function should be used. Or, you could GROUP BY c.rtype too.

But as it is, it shouldn't work. At least within my current understanding of SQL...

And yes, it's MySQL 5.

Re:Missing aggregate function?

dakkar on 2008-05-07T12:35:02

I think MySQL takes the first value it sees for that file, within the group

The reasonableness of this is debatable, but at least make getting some result easier...

Re:Missing aggregate function?

emazep on 2008-05-08T10:39:03

Really, c.rtype should not be admissible there even if conceptbindings.conceptid was UNIQUE (that is, it was a 1-1 relation), since there would be anyway no guarantee that c.rtype had a unique value for each concept group (value).
Simply, the SQL standards for GROUP BY require that the columns in the select list must be in the GROUP BY expression or they must be arguments of aggregate functions (PostgreSQL would indeed complain, and so does even MS Access :-)



Cosimo, you're right about SQL, this behaviour just depends on the leading "My" ;-)
It's however legal in Transact-SQL, which explains why it works under Sybase (with the semantics reported by runrig).

Ciao a tutti,
Emanuele

Re:Missing aggregate function?

revdiablo on 2008-05-12T02:33:09

Simply, the SQL standards for GROUP BY require that the columns in the select list must be in the GROUP BY expression or they must be arguments of aggregate functions

That's not entirely true. The standards after 1992 state that the column must either be in the GROUP BY, part of an aggregate function, OR a "functional dependent" of something in the GROUP BY.

I'm not sure if we can demonstrate a presence or lack of functional dependence based solely on the original SQL posted, but the spec is a little more nuanced than your quote suggests.

I gleaned this information from the following URL:

http://dev.mysql.com/tech-resources/articles/debunking-group-by-myths.html

Re:Missing aggregate function?

emazep on 2008-05-12T11:01:31

That's not entirely true. The standards after 1992 state that the column must either be in the GROUP BY, part of an aggregate function, OR a "functional dependent" of something in the GROUP BY.

As far as I can see this was introduced in SQL:1999, while it's quite common to quote as SQL standards just SQL-92 (or even just SQL-89), since some think that later revisions are not to be considered real standards anymore:
http://www.tdan.com/view-articles/4923/
http://articles.techrepublic.com.com/5100-10878_11-1046268.html

Anyway you're formally correct ;-)

I'm not sure if we can demonstrate a presence or lack of functional dependence based solely on the original SQL posted, but the spec is a little more nuanced than your quote suggests.

Good point.

I implied that if we had had a way to be sure that c.rtype had a unique value for each concept value (which is what functional dependence mean), there would have been no ambiguity. But I haven't probably been clear enough.
(BTW, I see no way to demonstrate that c.rtype is functionally dependent on concept, given solely the original SQL posted, as I said in my post).

However for the record, even if we had concept, c.rtype, concepts.id and c.conceptid all UNIQUE, PostgreSQL (v. 8.3.1) would anyway complain with:

ERROR: column "c.rtype" must appear in the GROUP BY clause or be used in an aggregate function

I gleaned this information from the following URL:

http://dev.mysql.com/tech-resources/articles/debunking-group-by-myths.html

It also says:

In MySQL, one can write GROUP BY queries that reference non-aggregated columns in the SELECT list that are not included in the GROUP BY clause, even if these columns are not functionally dependent upon the GROUP BY clause. This behaviour conforms to none of the SQL standard's versions.
which is probably the answer Cosimo was looking for.

Thanks for sharing this resource.

Ciao!

Weird cartesian join

runrig on 2008-05-07T17:02:53

It surprised me when I saw that Sybase accepted a similar syntax. But the result seemed next to useless. Non-aggregate columns not in the "group by" clause result in a cartesian join, so, every row in the conceptbindings table (via the rtype column) will join with every combination of conceptid and count. I'm not sure if MySQL behaves the same or differently.

Re:Weird cartesian join

cosimo on 2008-05-07T20:27:09

MySQL returns "some value", maybe the first, for each row of grouped concept. But yes, I'd rather prefer a cartesian product than this unexpected behaviour, even if I can understand why it's like that...

Re:Weird cartesian join

btilly on 2008-05-08T02:18:17

Oh no you don't. Not after you've screwed up and accidentally done a cartesian product between 100,000 records and 100,000 records again.

I loathed that.

If they want to save me time, just make the GROUP BY clause optional. Because 99% of the time my GROUP BY clause is just going to be all of the non-aggregate functions in my SELECT clause. And I hate having to type in all of that duplicate information.