Thursday, February 12, 2009

IN vs EXISTS - They are all the same today.

When ever it comes to performance of a query, for a very long time there was a
difference noticed between the usage of IN and EXIST.

Back then, when i was using a 8i database an improper use of IN where you could have used EXISTS can
result in a very bad performance of your query.

The execution plan chosen by the optimizer is completely different for both the query.

Check this out.

ORACLE VERSION: Oracle8i Enterprise Edition Release 8.1.7.2.1 - Production

SQL> create table big_table
2 as
3 select *
4 from all_objects
5 /

Table created.

SQL> create table small_table
2 as
3 select distinct object_type
4 from all_objects
5 /

Table created.

SQL> create index big_table_idx on big_table(object_type)
2 /

Index created.

SQL> exec dbms_stats.gather_table_stats(user,'BIG_TABLE',cascade=>true)

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(user,'SMALL_TABLE',cascade=>true)

PL/SQL procedure successfully completed.



Now i want to pick up records from my SMALL_TABLE for which object_type exists in my
BIG_TABLE.

First iam using IN for this purpose.

SQL> select *
2 from small_table s
3 where object_type in (select object_type
4 from big_table)
5 /

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=182 Card=15 Bytes=285)
1 0 HASH JOIN (Cost=182 Card=15 Bytes=285)
2 1 VIEW OF 'VW_NSO_1' (Cost=180 Card=15 Bytes=165)
3 2 SORT (UNIQUE) (Cost=180 Card=15 Bytes=90)
4 3 TABLE ACCESS (FULL) OF 'BIG_TABLE' (Cost=55 Card=43794 Bytes=262764)
5 1 TABLE ACCESS (FULL) OF 'SMALL_TABLE' (Cost=1 Card=15 Byt es=120)


The optimizer goes for a full scan on the BIG_TABLE even though i have a index created on it. I can do it
in a better way using EXISTS. See this.

SQL> select *
2 from small_table s
3 where exists(select null
4 from big_table b
5 where b.object_type = s.object_type)
6 /

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=8)
1 0 FILTER
2 1 TABLE ACCESS (FULL) OF 'SMALL_TABLE' (Cost=1 Card=1 Bytes=8)
3 1 INDEX (RANGE SCAN) OF 'BIG_TABLE_IDX' (NON-UNIQUE) (Cost=7 Card=2920 Bytes=17520)


You can notice a nice RANGE SCAN is been done on the BIG_TABLE. So those days it made
a lot of difference. The optimizer could not understand what you are trying to mean. It just gives what you ask for.

But today, things are different. With the smart CBO on place it understands what you want. It rewrites your query for you.

Check this out.

ORACLE VERSION: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod


With both IN and EXIST you don't find any difference. Its both the same. CBO knows exactly what you want. It
knows so well that it even does only a SEMI join. It comes out of your big table once it finds a single match. It
does not go thought the entire table or the index.

See how it works.


SQL> select *
2 from small_table s
3 where exists(select null
4 from big_table b
5 where b.object_type = s.object_type)
6 /

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=34 Card=19 Bytes=323)
1 0 NESTED LOOPS (SEMI) (Cost=34 Card=19 Bytes=323)
2 1 TABLE ACCESS (FULL) OF 'SMALL_TABLE' (TABLE) (Cost=3 Card=31 Bytes=279)
3 1 INDEX (RANGE SCAN) OF 'BIG_TABLE_IDX' (INDEX) (Cost=1 Card=71414 Bytes=571312)

SQL> select *
2 from small_table s
3 where object_type in (select object_type
4 from big_table)
5 /

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=34 Card=19 Bytes=323)
1 0 NESTED LOOPS (SEMI) (Cost=34 Card=19 Bytes=323)
2 1 TABLE ACCESS (FULL) OF 'SMALL_TABLE' (TABLE) (Cost=3 Card=31 Bytes=279)
3 1 INDEX (RANGE SCAN) OF 'BIG_TABLE_IDX' (INDEX) (Cost=1 Card=71414 Bytes=571312)


Isn't that a beauty!!!