Oracle: hash antijoins?

djberg96 on 2004-09-17T21:16:11

One of the few things I haven't been able to confirm from Guy Harrison's book is that a hash antijoin is more efficient than a minus when doing a diff. Let me provide a couple quick examples: -- minus select foo, bar, baz from table1 minus select foo, bar, baz from table2

-- hash antijoin select foo, bar, baz from table1 where (foo, bar, baz) not in ( select /*+ hash_aj */ foo, bar, baz from table2 )
The explain plan for the minus query looks reasonable, while the one for the hash antijoin looks awful and, indeed, performs badly.

However, I'm also querying against a view, which in turn is getting its data from a link, so that might be a major issue. The view itself seems to be reasonable.

Anyone out there have an opinion?


Is the hint being used?

runrig on 2004-09-17T22:34:11

As a side note, I've never had the opportunity to use a multi-column /(not )?in/ clause, I've wanted it for years and have never heard of it being a valid syntax until now.

I googled for hash_aj and found this in which one of the replies suggests that the hint is not being used, and that there should be a 'and field is not null' clause in both the query and subquery (the example there is a single column 'not in' clause, so I assume there should be a 'not null' condition for each column?).

Re:Is the hint being used?

runrig on 2004-09-23T15:55:25

I meant to say that though the reply suggests that the hint is not being used, adding a 'field is not null' expression to the where clauses (main and sub query) would maybe cause the hint to be used, and that printing the query plan should tell you for sure whether or not it is being used. I don't currently use Oracle, so I can't test any of this :)

Re:Is the hint being used?

djberg96 on 2004-09-24T20:42:11

Indeed, that link proved to be useful - thanks. Adding "is not null" conditionals forced the hint to be used. The only problem is that it can alter your result set when compared to a minus (i.e. what if I don't want to exclude the NULL's?).

Besides, in my 10 column compare the performance increase was negligible - about 1 second faster, and that was with a PARALLEL(4) hint tossed in to boot.

But, it's good to know and YMMV. :)

Re:Is the hint being used?

runrig on 2004-09-24T23:10:02

what if I don't want to exclude the NULL's?

Then I think you need to use the MINUS anyway (which is not portable -- doesn't work on my version of Informix, and SqlServer doesn't complain, but it doesn't do the right thing either). If there are any nulls in the sub-query table, then you shouldn't get any results at all unless you have 'is not null' for those columns in the subquery, and if you have nulls in the main query table, then those records shouldn't be showing up. It's that old "if it's null, then it's neither in nor not in" thing. I've tested this with a single column in/not in clause on Informix, I couldn't say how Oracle behaves.

Have you seen how a good old fashioned correlated sub-query fares?

select foo, bar, baz
from table1 t1
where not exists (
  select 1 from table2 t2
  where t1.foo = t2.foo
  and t1.bar = t2.bar
  and t1.baz = t3.baz
)

MINUS is probably optimized to beat the heck out of this, though.

Re:Is the hint being used?

djberg96 on 2004-09-27T15:04:27

Indeed, a correlated subquery can be very efficient. It can also suck rocks. It largely depends on how your indexes are set up. :)

I'll admit I had brushed this off, so now I'll go back and see how it compares. We don't have control over the remote databases we're pulling some data from so it may or may not be an option.

is it being used ?

tinman on 2004-09-18T11:35:00

I remember reading somewhere that the hint wasn't being used (I see runrig has found a link). minus is way cool, besides (who said tuning was a rational procedure, anyway ?)

Re:is it being used ?

djberg96 on 2004-09-19T13:42:44

minus is way cool, besides

Yeah, it does the job, although one potential drawback is that you can't PARALLEL a minus.

Re:is it being used ?

runrig on 2004-09-25T01:30:06

(who said tuning was a rational procedure, anyway ?)

And that's why they have External SQL Rewriters. (sorry, gotta buy the article if you don't have the hard copy - May 2002 DDJ). Automated brute force SQL rewriting to find the best SQL statement.

Re:is it being used ?

runrig on 2009-04-20T18:22:22

External SQL Rewriters

Hey, a link to the article that works! (at the time of this post anyway).