So I was debugging a MySQL problem and quickly hacked out the following query. It took a long time to run before finally giving me an 'OUT OF MEMORY' error:
SELECT distinct(ip) FROM foo WHERE ip NOT IN ( SELECT distinct(ip) FROM bar )
See a problem with that query? The problem I have is one you can't see. A coworker pointed out a typo. There's a bar.id column, but not a bar.ip column. MySQL didn't catch that. It was apparently trying to run the outer query, collect all results and then parsing and filtering on the subquery.
I'm curious what the point of the 'distinct' in the subquery is, btw. I think that would make the explain plan worse in Oracle. But I digress.
For stuff like this in Oracle, I generally opt for the simple minus query. Reasonable performance and easy to understand.
How do the following compare in MySQL?
Re:Oracle
Ovid on 2007-01-22T20:17:10
I couldn't really say how those others compare. I'm not a performance guru. I've traditionally gone more for the normalization aspect and not worried as much about the implementation, trusting the DBAs to be good at their jobs. Now I'm finding I need to focus on the implementation more and more.
As for the 'distinct' clause in the subquery, I really thought there might be duplicate ip addresses in that table and I had no idea if MySQL would optimize those away or force every value to be compared multiple times. I should look at the 'NOT IN' behavior more to see how it handles bags instead of sets.