Thursday, February 12, 2009

IN vs EXISTS - They are all the same today.

When ever it comes to performance of a query, for a very long time there was a
difference noticed between the usage of IN and EXIST.

Back then, when i was using a 8i database an improper use of IN where you could have used EXISTS can
result in a very bad performance of your query.

The execution plan chosen by the optimizer is completely different for both the query.

Check this out.

ORACLE VERSION: Oracle8i Enterprise Edition Release 8.1.7.2.1 - Production

SQL> create table big_table
2 as
3 select *
4 from all_objects
5 /

Table created.

SQL> create table small_table
2 as
3 select distinct object_type
4 from all_objects
5 /

Table created.

SQL> create index big_table_idx on big_table(object_type)
2 /

Index created.

SQL> exec dbms_stats.gather_table_stats(user,'BIG_TABLE',cascade=>true)

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(user,'SMALL_TABLE',cascade=>true)

PL/SQL procedure successfully completed.



Now i want to pick up records from my SMALL_TABLE for which object_type exists in my
BIG_TABLE.

First iam using IN for this purpose.

SQL> select *
2 from small_table s
3 where object_type in (select object_type
4 from big_table)
5 /

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=182 Card=15 Bytes=285)
1 0 HASH JOIN (Cost=182 Card=15 Bytes=285)
2 1 VIEW OF 'VW_NSO_1' (Cost=180 Card=15 Bytes=165)
3 2 SORT (UNIQUE) (Cost=180 Card=15 Bytes=90)
4 3 TABLE ACCESS (FULL) OF 'BIG_TABLE' (Cost=55 Card=43794 Bytes=262764)
5 1 TABLE ACCESS (FULL) OF 'SMALL_TABLE' (Cost=1 Card=15 Byt es=120)


The optimizer goes for a full scan on the BIG_TABLE even though i have a index created on it. I can do it
in a better way using EXISTS. See this.

SQL> select *
2 from small_table s
3 where exists(select null
4 from big_table b
5 where b.object_type = s.object_type)
6 /

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=8)
1 0 FILTER
2 1 TABLE ACCESS (FULL) OF 'SMALL_TABLE' (Cost=1 Card=1 Bytes=8)
3 1 INDEX (RANGE SCAN) OF 'BIG_TABLE_IDX' (NON-UNIQUE) (Cost=7 Card=2920 Bytes=17520)


You can notice a nice RANGE SCAN is been done on the BIG_TABLE. So those days it made
a lot of difference. The optimizer could not understand what you are trying to mean. It just gives what you ask for.

But today, things are different. With the smart CBO on place it understands what you want. It rewrites your query for you.

Check this out.

ORACLE VERSION: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod


With both IN and EXIST you don't find any difference. Its both the same. CBO knows exactly what you want. It
knows so well that it even does only a SEMI join. It comes out of your big table once it finds a single match. It
does not go thought the entire table or the index.

See how it works.


SQL> select *
2 from small_table s
3 where exists(select null
4 from big_table b
5 where b.object_type = s.object_type)
6 /

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=34 Card=19 Bytes=323)
1 0 NESTED LOOPS (SEMI) (Cost=34 Card=19 Bytes=323)
2 1 TABLE ACCESS (FULL) OF 'SMALL_TABLE' (TABLE) (Cost=3 Card=31 Bytes=279)
3 1 INDEX (RANGE SCAN) OF 'BIG_TABLE_IDX' (INDEX) (Cost=1 Card=71414 Bytes=571312)

SQL> select *
2 from small_table s
3 where object_type in (select object_type
4 from big_table)
5 /

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=34 Card=19 Bytes=323)
1 0 NESTED LOOPS (SEMI) (Cost=34 Card=19 Bytes=323)
2 1 TABLE ACCESS (FULL) OF 'SMALL_TABLE' (TABLE) (Cost=3 Card=31 Bytes=279)
3 1 INDEX (RANGE SCAN) OF 'BIG_TABLE_IDX' (INDEX) (Cost=1 Card=71414 Bytes=571312)


Isn't that a beauty!!!

Thursday, January 15, 2009

INDEX SKIP SCAN

A new feature called INDEX SKIP SCAN was introduced in Oracle 9i. By which oracle has the ability to skip some column of the index and go with the rest.

For example if you have a index with column (NAME, EMPLID), before 9i when you specify a where condition which just have EMPLID then the index is not considered.

But from 9i the new feature enables the optimizer to skip the NAME field of the index a go for the EMPLID.

A simple test case can explain this. See below.

SQL> create table t
2 as
3 select 'MY_TEST_DATA' obj_desc, object_id, object_name, object_type
4 from all_objects
5 /

Table created.

SQL> create index t_idx on t(obj_desc, object_id)
2 /

Index created.

SQL> exec dbms_stats.gather_table_stats(user, 'T', cascade=>true)

PL/SQL procedure successfully completed.

SQL> select * from t where object_id = 40
2 /

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=4 Card=1 Bytes=45)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T' (TABLE) (Cost=4 Card=1 Bytes=45)
2 1 INDEX (SKIP SCAN) OF 'T_IDX' (INDEX) (Cost=3 Card=1)


SQL> select * from t where obj_desc = 'MY_TEST_DATA' and object_id = 40
2 /

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=4 Card=1 Bytes=45)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T' (TABLE) (Cost=4 Card=1 Bytes=45)
2 1 INDEX (RANGE SCAN) OF 'T_IDX' (INDEX) (Cost=3 Card=1)


But i am not sure if SKIP SCAN is as good as RANGE SCAN. But you can be certain that its better than FULL SCAN.

But again the selection of this option by the optimizer depends on the cardinality of the first field. In the above example we saw that OBJ_DESC had one value and hence the optimizer went for a SKIP SCAN.

Lets take a case where the first column of the index has a high cardinality. Here i am going to create OBJ_DESC field with unique values. lets see what happens.

SQL> drop table t
2 /

Table dropped.

SQL> create table t
2 as
3 select 'MY_TEST_DATA'||rownum obj_desc, object_id, object_name, object_type
4 from all_objects
5 /

Table created.

SQL> create index t_idx on t(obj_desc, object_id)
2 /

Index created.

SQL> exec dbms_stats.gather_table_stats(user, 'T', cascade=>true)

PL/SQL procedure successfully completed.

SQL> select * from t where object_id = 40
2 /

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=209 Card=1 Bytes=50)
1 0 TABLE ACCESS (FULL) OF 'T' (TABLE) (Cost=209 Card=1 Bytes=50)


SQL> select * from t where obj_desc = 'MY_TEST_DATA' and object_id = 40
2 /

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=4 Card=1 Bytes=50)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T' (TABLE) (Cost=4 Card=1 Bytes=50)
2 1 INDEX (RANGE SCAN) OF 'T_IDX' (INDEX) (Cost=3 Card=1)




Now you can very well see that the Optimizer decides to go for a FULL SCAN.