Tuesday, December 16, 2008

When the optimizer goes for a FULL SCAN

It is often asked why a SQL is going for a Full Table Scan. And also its widely considered that
Full Table Scan means BAD performance.

So i would try to explain the possible caused when a CBO decides to go for a full table
scan. And also the concept of FULL TABLE SCAN IS NOT EVIL.

To do that i have created a small test setup.

I have create a table FULL_TABLE_SCAN_TEST from all_objects.


SQL> create table full_table_scan_test
2 as
3 select *
4 from all_objects
5 /

SQL> select count(1) from full_table_scan_test
2 /

COUNT(1)
----------
930360



I have indexed the column OBJECT_TYPE.

SQL> create index full_table_scan_test_idx on full_table_scan_test(object_type)
2 /

Index created.

SQL> exec dbms_stats.gather_table_stats(user,'FULL_TABLE_SCAN_TEST',cascade=>TRUE)

PL/SQL procedure successfully completed.



CASE 1: The optimizer normally goes for a FULL TABLE SCAN when it has to retreive a
large portion of data from the table. This is because when accessing a large
portion of data, a scan through the index can be a overhead.

Below is a little demonstration of the same.


Now below query shows the count of records for object_type FUNCTION.

SQL> select count(1)
2 from full_table_scan_test
3 where object_type in ('FUNCTION')
4 /

COUNT(1)
----------
2408



This is a very small portion when compared to the total count which is 930360

So a condition like this must result in a INDEX RANGE SCAN. Lets try it.

SQL> explain plan
2 for
3 select *
4 from full_table_scan_test
5 where object_type in ('FUNCTION')
6 /

Explained.

SQL> select * from table(dbms_xplan.display)
2 /

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------
Plan hash value: 607499257

--------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1833 | 161K| 65 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| FULL_TABLE_SCAN_TEST | 1833 | 161K| 65 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | FULL_TABLE_SCAN_TEST_IDX | 1833 | | 7 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------
2 - access("OBJECT_TYPE"='FUNCTION')

14 rows selected.



As expected we got a INDEX RANGE SCAN.

But now a filter criteria on the same column which will return a large protion of record will go for
a FULL SCAN. This can easily be illustrated.

A filter for object_type TABLE, INDEX returns a vast protion for data as shown below.

SQL> select count(1)
2 from full_table_scan_test
3 where object_type IN ('TABLE','INDEX')
4 /

COUNT(1)
----------
428688



So when a filtering is done for this condition you can see the optimizer going for a FULL TABLE SCAN.

SQL> explain plan
2 for
3 select *
4 from full_table_scan_test
5 where object_type in ('TABLE','INDEX')
6 /

Explained.

SQL> select * from table(dbms_xplan.display)
2 /

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------------------------------
Plan hash value: 798910834

------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 437K| 37M| 2796 (3)| 00:00:34 |
|* 1 | TABLE ACCESS FULL| FULL_TABLE_SCAN_TEST | 437K| 37M| 2796 (3)| 00:00:34 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------
1 - filter("OBJECT_TYPE"='INDEX' OR "OBJECT_TYPE"='TABLE')

13 rows selected.



Most of the time people assume that this is bad and try to HINT the optimizer to use the index.

This could turn to be expensive. Lets hint the above query and see what happens.

SQL> delete from plan_table
2 /

2 rows deleted.

SQL> explain plan
2 for
3 select /*+ INDEX(full_table_scan_test full_table_scan_test_idx) */ *
4 from full_table_scan_test
5 where object_type in ('TABLE','INDEX')
6 /

Explained.

SQL> select * from table(dbms_xplan.display)
2 /

PLAN_TABLE_OUTPUT

---------------------------------------------------------------------------------------------------------
Plan hash value: 279860542
---------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 437K| 37M| 14723 (1)| 00:02:57 |
| 1 | INLIST ITERATOR | | | | | |
| 2 | TABLE ACCESS BY INDEX ROWID| FULL_TABLE_SCAN_TEST | 437K| 37M| 14723 (1)| 00:02:57 |
|* 3 | INDEX RANGE SCAN | FULL_TABLE_SCAN_TEST_IDX | 437K| | 1053 (2)| 00:00:13 |
---------------------------------------------------------------------------------------------------------

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):

3 - access("OBJECT_TYPE"='INDEX' OR "OBJECT_TYPE"='TABLE')

15 rows selected.


Here you can see there is a drastic increase in the CPU Cost.

Conclution: FULL TABLE SCAN is one of the optimal way when trying to access a large portion of data.

CASE 2: Use of NOT leads to a FULL TABLE SCAN.

Even though the optimizer is a very smart program it has its owne assumptions and limitations.
One such thing is, when a NOT operatior is used the optimizer assumes that its going to
fetch a large portion of data and directly goes for a FULL TABLE SCAN.

So the use of NOT operator leads to a full table scan even though the data retrived is
less. See the demonstration below.

The follwing condition shown in the below query results in a very small portion of the table

SQL> select count(*)
2 from full_table_scan_test
3 where object_type not in ('TABLE','INDEX','VIEW','SYNONYM',
4 'TYPE','JAVA CLASS','PACKAGE','PACKAGE BODY')
5 /

COUNT(*)
----------
21184



But you can see that the optimizer decides to go for a FULL TABLE SCAN. Check this out.

SQL> explain plan
2 for
3 select *
4 from full_table_scan_test
5 where object_type not in ('TABLE','INDEX','VIEW','SYNONYM',
6 'TYPE','JAVA CLASS','PACKAGE','PACKAGE BODY')
7 /

Explained.

SQL> select * from table(dbms_xplan.display)
2 /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------
Plan hash value: 798910834
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 313K| 26M| 2813 (4)| 00:00:34 |
|* 1 | TABLE ACCESS FULL| FULL_TABLE_SCAN_TEST | 313K| 26M| 2813 (4)| 00:00:34 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------
1 - filter("OBJECT_TYPE"<>'INDEX' AND "OBJECT_TYPE"<>'TABLE' AND
"OBJECT_TYPE"<>'SYNONYM' AND "OBJECT_TYPE"<>'VIEW' AND "OBJECT_TYPE"<>'JAVA
CLASS' AND "OBJECT_TYPE"<>'TYPE' AND "OBJECT_TYPE"<>'PACKAGE BODY' AND
"OBJECT_TYPE"<>'PACKAGE')

16 rows selected.


Even hinting will lead only to a INDEX FULL SCAN. A RANGE SCAN is never an option with NOT operator.

Lets hint and see.

SQL> explain plan
2 for
3 select /*+ INDEX(full_table_scan_test full_table_scan_test_idx) */ *
4 from full_table_scan_test
5 where object_type not in ('TABLE','INDEX','VIEW','SYNONYM',
6 'TYPE','JAVA CLASS','PACKAGE','PACKAGE BODY')
7 /

Explained.

SQL> select * from table(dbms_xplan.display)
2 /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------
Plan hash value: 2742810082
--------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 313K| 26M| 12196 (1)| 00:02:27 |
| 1 | TABLE ACCESS BY INDEX ROWID| FULL_TABLE_SCAN_TEST | 313K| 26M| 12196 (1)| 00:02:27 |
|* 2 | INDEX FULL SCAN | FULL_TABLE_SCAN_TEST_IDX | 313K| | 2412 (3)| 00:00:29 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------
2 - filter("OBJECT_TYPE"<>'INDEX' AND "OBJECT_TYPE"<>'TABLE' AND "OBJECT_TYPE"<>'SYNONYM'
AND "OBJECT_TYPE"<>'VIEW' AND "OBJECT_TYPE"<>'JAVA CLASS' AND "OBJECT_TYPE"<>'TYPE' AND
"OBJECT_TYPE"<>'PACKAGE BODY' AND "OBJECT_TYPE"<>'PACKAGE')

16 rows selected.



Conclution: Use NOT operation while retreiving a large portion of data. If you are going to retreive
a small portion of data try to be positive ;) and do not use NOT operator. One work around
could be use of > and < together.

CASE 3: A NULL column is not an option for INDEX FAST FULL SCAN.

Normally when we do a COUNT on a table that has a index on a column we expect it to use that
index. But that is possible only when the indexed column is NOT NULL.

This is simply because the NULL values of a column are not indexed. And hence the optimizer
cant scan throught the index to give you the exact count of a table.

A little demonstration about this is given below.

The column OBJECT_TYPE is a NULL'able column. Hence when you do a COUNT on the table it goes for a FULL SCAN.

Check this out.

SQL> explain plan
2 for
3 select count(*)
4 from full_table_scan_test
5 /

Explained.

SQL> select * from table(dbms_xplan.display)
2 /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------
Plan hash value: 3323906207
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2755 (2)| 00:00:34 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | TABLE ACCESS FULL| FULL_TABLE_SCAN_TEST | 930K| 2755 (2)| 00:00:34 |
-----------------------------------------------------------------------------------

9 rows selected.



So now lets modify the column OBJECT_TYPE as NOT NULL and see what happens.

SQL> alter table full_table_scan_test modify object_type not null
2 /

Table altered.

SQL> exec dbms_stats.gather_table_stats(user,'FULL_TABLE_SCAN_TEST',cascade=>TRUE)

PL/SQL procedure successfully completed.



Now i execute the same query.

SQL> explain plan
2 for
3 select count(*)
4 from full_table_scan_test
5 /

Explained.

SQL> select * from table(dbms_xplan.display)
2 /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------
Plan hash value: 2070121381
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 536 (5)| 00:00:07 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | INDEX FAST FULL SCAN| FULL_TABLE_SCAN_TEST_IDX | 937K| 536 (5)| 00:00:07 |
------------------------------------------------------------------------------------------

9 rows selected.



Conclution: Optimizer can't trust a INDEX on a NULL'able column to getting all the data of that column.