Tuesday, December 16, 2008

When the optimizer goes for a FULL SCAN

It is often asked why a SQL statement goes for a Full Table Scan. It is also widely believed that
a Full Table Scan means BAD performance.

So I will try to explain the possible causes for the CBO deciding to go for a full table
scan, and also demonstrate the idea that a FULL TABLE SCAN IS NOT EVIL.

To do that I have created a small test setup.

I have created a table FULL_TABLE_SCAN_TEST from all_objects.


SQL> create table full_table_scan_test
2 as
3 select *
4 from all_objects
5 /

SQL> select count(1) from full_table_scan_test
2 /

COUNT(1)
----------
930360



I have indexed the column OBJECT_TYPE.

SQL> create index full_table_scan_test_idx on full_table_scan_test(object_type)
2 /

Index created.

SQL> exec dbms_stats.gather_table_stats(user,'FULL_TABLE_SCAN_TEST',cascade=>TRUE)

PL/SQL procedure successfully completed.



CASE 1: The optimizer normally goes for a FULL TABLE SCAN when it has to retrieve a
large portion of the data in the table. This is because, when accessing a large
portion of the data, a scan through the index can be an overhead.

Below is a little demonstration of the same.

The query below shows the count of records for object_type FUNCTION.

SQL> select count(1)
2 from full_table_scan_test
3 where object_type in ('FUNCTION')
4 /

COUNT(1)
----------
2408



This is a very small portion compared to the total count, which is 930360.

So a condition like this should result in an INDEX RANGE SCAN. Let's try it.

SQL> explain plan
2 for
3 select *
4 from full_table_scan_test
5 where object_type in ('FUNCTION')
6 /

Explained.

SQL> select * from table(dbms_xplan.display)
2 /

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------
Plan hash value: 607499257

--------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1833 | 161K| 65 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| FULL_TABLE_SCAN_TEST | 1833 | 161K| 65 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | FULL_TABLE_SCAN_TEST_IDX | 1833 | | 7 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------
2 - access("OBJECT_TYPE"='FUNCTION')

14 rows selected.



As expected, we got an INDEX RANGE SCAN.

But a filter criterion on the same column that returns a large portion of the records will go for
a FULL SCAN. This can easily be illustrated.

A filter for object_type TABLE or INDEX returns a vast portion of the data, as shown below.

SQL> select count(1)
2 from full_table_scan_test
3 where object_type IN ('TABLE','INDEX')
4 /

COUNT(1)
----------
428688



So when filtering on this condition, you can see the optimizer going for a FULL TABLE SCAN.

SQL> explain plan
2 for
3 select *
4 from full_table_scan_test
5 where object_type in ('TABLE','INDEX')
6 /

Explained.

SQL> select * from table(dbms_xplan.display)
2 /

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------------------------------
Plan hash value: 798910834

------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 437K| 37M| 2796 (3)| 00:00:34 |
|* 1 | TABLE ACCESS FULL| FULL_TABLE_SCAN_TEST | 437K| 37M| 2796 (3)| 00:00:34 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------
1 - filter("OBJECT_TYPE"='INDEX' OR "OBJECT_TYPE"='TABLE')

13 rows selected.



Most of the time people assume that this is bad and try to HINT the optimizer into using the index.

This can turn out to be expensive. Let's hint the above query and see what happens.

SQL> delete from plan_table
2 /

2 rows deleted.

SQL> explain plan
2 for
3 select /*+ INDEX(full_table_scan_test full_table_scan_test_idx) */ *
4 from full_table_scan_test
5 where object_type in ('TABLE','INDEX')
6 /

Explained.

SQL> select * from table(dbms_xplan.display)
2 /

PLAN_TABLE_OUTPUT

---------------------------------------------------------------------------------------------------------
Plan hash value: 279860542
---------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 437K| 37M| 14723 (1)| 00:02:57 |
| 1 | INLIST ITERATOR | | | | | |
| 2 | TABLE ACCESS BY INDEX ROWID| FULL_TABLE_SCAN_TEST | 437K| 37M| 14723 (1)| 00:02:57 |
|* 3 | INDEX RANGE SCAN | FULL_TABLE_SCAN_TEST_IDX | 437K| | 1053 (2)| 00:00:13 |
---------------------------------------------------------------------------------------------------------

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):

3 - access("OBJECT_TYPE"='INDEX' OR "OBJECT_TYPE"='TABLE')

15 rows selected.


Here you can see a drastic increase in the cost (from 2796 to 14723).

Conclusion: a FULL TABLE SCAN is one of the optimal ways of accessing a large portion of data.

CASE 2: Use of NOT leads to a FULL TABLE SCAN.

Even though the optimizer is a very smart program, it has its own assumptions and limitations.
One such limitation is that when a NOT operator is used, the optimizer assumes it is going to
fetch a large portion of the data and goes straight for a FULL TABLE SCAN.

So the use of the NOT operator leads to a full table scan even when the data retrieved is
small. See the demonstration below.

The condition in the query below matches only a very small portion of the table.

SQL> select count(*)
2 from full_table_scan_test
3 where object_type not in ('TABLE','INDEX','VIEW','SYNONYM',
4 'TYPE','JAVA CLASS','PACKAGE','PACKAGE BODY')
5 /

COUNT(*)
----------
21184



But you can see that the optimizer decides to go for a FULL TABLE SCAN. Check this out.

SQL> explain plan
2 for
3 select *
4 from full_table_scan_test
5 where object_type not in ('TABLE','INDEX','VIEW','SYNONYM',
6 'TYPE','JAVA CLASS','PACKAGE','PACKAGE BODY')
7 /

Explained.

SQL> select * from table(dbms_xplan.display)
2 /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------
Plan hash value: 798910834
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 313K| 26M| 2813 (4)| 00:00:34 |
|* 1 | TABLE ACCESS FULL| FULL_TABLE_SCAN_TEST | 313K| 26M| 2813 (4)| 00:00:34 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------
1 - filter("OBJECT_TYPE"<>'INDEX' AND "OBJECT_TYPE"<>'TABLE' AND
"OBJECT_TYPE"<>'SYNONYM' AND "OBJECT_TYPE"<>'VIEW' AND "OBJECT_TYPE"<>'JAVA
CLASS' AND "OBJECT_TYPE"<>'TYPE' AND "OBJECT_TYPE"<>'PACKAGE BODY' AND
"OBJECT_TYPE"<>'PACKAGE')

16 rows selected.


Even hinting will lead only to an INDEX FULL SCAN. A RANGE SCAN is never an option with the NOT operator.

Let's hint and see.

SQL> explain plan
2 for
3 select /*+ INDEX(full_table_scan_test full_table_scan_test_idx) */ *
4 from full_table_scan_test
5 where object_type not in ('TABLE','INDEX','VIEW','SYNONYM',
6 'TYPE','JAVA CLASS','PACKAGE','PACKAGE BODY')
7 /

Explained.

SQL> select * from table(dbms_xplan.display)
2 /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------
Plan hash value: 2742810082
--------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 313K| 26M| 12196 (1)| 00:02:27 |
| 1 | TABLE ACCESS BY INDEX ROWID| FULL_TABLE_SCAN_TEST | 313K| 26M| 12196 (1)| 00:02:27 |
|* 2 | INDEX FULL SCAN | FULL_TABLE_SCAN_TEST_IDX | 313K| | 2412 (3)| 00:00:29 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------
2 - filter("OBJECT_TYPE"<>'INDEX' AND "OBJECT_TYPE"<>'TABLE' AND "OBJECT_TYPE"<>'SYNONYM'
AND "OBJECT_TYPE"<>'VIEW' AND "OBJECT_TYPE"<>'JAVA CLASS' AND "OBJECT_TYPE"<>'TYPE' AND
"OBJECT_TYPE"<>'PACKAGE BODY' AND "OBJECT_TYPE"<>'PACKAGE')

16 rows selected.



Conclusion: use the NOT operator when retrieving a large portion of data. If you are going to retrieve
a small portion of data, try to be positive ;) and avoid the NOT operator. One workaround
is to use > and < together.
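For a single excluded value, that workaround can be sketched like this on the test table above (whether the optimizer actually uses the index still depends on the selectivity of each range):

```sql
-- Instead of:  where object_type NOT IN ('TABLE')
-- split the predicate into two ranges an INDEX RANGE SCAN can use:
select *
from   full_table_scan_test
where  object_type < 'TABLE'
   or  object_type > 'TABLE';
```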

CASE 3: A nullable column is not an option for an INDEX FAST FULL SCAN.

Normally, when we do a COUNT on a table that has an index on a column, we expect it to use that
index. But that is possible only when the indexed column is NOT NULL.

This is simply because the NULL values of a column are not indexed, and hence the optimizer
can't scan through the index to give you the exact count of the table.

A little demonstration about this is given below.

The column OBJECT_TYPE is nullable. Hence, when you do a COUNT on the table, it goes for a FULL SCAN.

Check this out.

SQL> explain plan
2 for
3 select count(*)
4 from full_table_scan_test
5 /

Explained.

SQL> select * from table(dbms_xplan.display)
2 /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------
Plan hash value: 3323906207
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2755 (2)| 00:00:34 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | TABLE ACCESS FULL| FULL_TABLE_SCAN_TEST | 930K| 2755 (2)| 00:00:34 |
-----------------------------------------------------------------------------------

9 rows selected.



So now let's modify the column OBJECT_TYPE to NOT NULL and see what happens.

SQL> alter table full_table_scan_test modify object_type not null
2 /

Table altered.

SQL> exec dbms_stats.gather_table_stats(user,'FULL_TABLE_SCAN_TEST',cascade=>TRUE)

PL/SQL procedure successfully completed.



Now I execute the same query.

SQL> explain plan
2 for
3 select count(*)
4 from full_table_scan_test
5 /

Explained.

SQL> select * from table(dbms_xplan.display)
2 /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------
Plan hash value: 2070121381
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 536 (5)| 00:00:07 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | INDEX FAST FULL SCAN| FULL_TABLE_SCAN_TEST_IDX | 937K| 536 (5)| 00:00:07 |
------------------------------------------------------------------------------------------

9 rows selected.



Conclusion: the optimizer can't trust an index on a nullable column to return all the data of that column.
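If the column cannot be made NOT NULL, one common workaround is to add a constant as a trailing index key so that every row, including those with a NULL object_type, gets an index entry. A sketch (the index name here is invented):

```sql
-- Every key of this index is (object_type, 0); since the constant 0 is
-- never NULL, rows with a NULL object_type are still indexed, so a
-- COUNT(*) can again be answered with an INDEX FAST FULL SCAN.
create index full_table_scan_test_nn_idx
    on full_table_scan_test (object_type, 0);
```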

Wednesday, November 12, 2008

Index is not all GOD

Another case to prove that an index is not all GOD and a FULL TABLE SCAN is not evil.

Today I saw this question on the OTN forum.

Hi All,

I need to execute a query like this :
SELECT * FROM ORDERS WHERE APPROVE_DATE IS NULL

I read anywhere that this will cause unnecessary FTS so that I should create function based index.

I have tried one below , but not sure that this is correct approach :

CREATE INDEX idx_1
ON ORDERS (NVL(APPROVE_DATE, '01-JAN-1900'));

SELECT * FROM ORDERS WHERE NVL(APPROVE_DATE, '01-JAN-1900') = '01-JAN-1900'

Is this a correct approach ?

Thank you,
xtanto


This is what I gave in reply. I prepared a simple test case to show that an index can be USED as well as IGNORED.

A SQL_TRACE output will explain clearly what Justin has stated.

I have created a table T based on all_objects.

SQL> desc t
Name Null? Type
----------------------------------------- -------- -----------------------
OWNER NOT NULL VARCHAR2(30)
OBJECT_NAME NOT NULL VARCHAR2(30)
SUBOBJECT_NAME VARCHAR2(30)
OBJECT_ID NOT NULL NUMBER
DATA_OBJECT_ID NUMBER
OBJECT_TYPE VARCHAR2(19)
CREATED DATE
LAST_DDL_TIME NOT NULL DATE
TIMESTAMP VARCHAR2(19)
STATUS VARCHAR2(7)
TEMPORARY VARCHAR2(1)
GENERATED VARCHAR2(1)
SECONDARY VARCHAR2(1)


CASE I

SQL> select count(1) from t
2 /

COUNT(1)
----------
934320

SQL> select count(1) from t where created is null
2 /

COUNT(1)
----------
2376



The number of NULL values in the CREATED column is proportionately very small.

Now I execute the query without the function-based index.

select *
from t
where created is null

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.09 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 160 0.04 0.10 0 12662 0 2376
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 162 0.04 0.19 0 12662 0 2376

Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: ALL_ROWS
2376 TABLE ACCESS GOAL: ANALYZED (FULL) OF 'T' (TABLE)



And here is the query that uses the function based index

select *
from t
where nvl(created,to_date('01-01-1900','DD-MM-YYYY')) = to_date('01-01-1900','DD-MM-YYYY')

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 160 0.01 0.01 0 698 0 2376
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 162 0.03 0.01 0 698 0 2376

Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: ALL_ROWS
2376 TABLE ACCESS GOAL: ANALYZED (BY INDEX ROWID) OF 'T' (TABLE)
2376 INDEX GOAL: ANALYZED (RANGE SCAN) OF 'T_FN_IDX' (INDEX)



It is very obvious from the above output that the function-based index has increased the performance.

CASE II

SQL> select count(1) from t
2 /

COUNT(1)
----------
934320

SQL> select count(1) from t where created is null
2 /

COUNT(1)
----------
202168



Now the NULL values in the CREATED column are proportionately larger than in the first test case.

Now let's see without using the function-based index.

select *
from t
where created is null

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 13479 0.46 0.71 2 25832 0 202168
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 13481 0.46 0.71 2 25832 0 202168

Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: ALL_ROWS
202168 TABLE ACCESS GOAL: ANALYZED (FULL) OF 'T' (TABLE)



Now I am trying to use the function-based index.

select *
from t
where nvl(created,to_date('01-01-1900','DD-MM-YYYY')) = to_date('01-01-1900','DD-MM-YYYY')

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 13479 0.54 0.84 0 33826 0 202168
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 13481 0.54 0.84 0 33826 0 202168

Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: ALL_ROWS
202168 TABLE ACCESS GOAL: ANALYZED (FULL) OF 'T' (TABLE)



It is obvious from the result that Oracle decided to go for a FULL TABLE SCAN even when an index was available.

So just having a function-based index is not going to increase query performance. There are a lot of other factors to be considered, as stated above.

The original link to the OTN Thread is

http://forums.oracle.com/forums/thread.jspa?threadID=825995&tstart=0

Thanks,
Karthick.

Wednesday, July 2, 2008

Loading XML using an external table

A small example of loading XML using an external table.

SQL> CREATE DIRECTORY my_xml_dir AS 'E:\oracle\Log_files\UTL_AKIVATST'
2 /

Directory created.

SQL> DROP TABLE my_xml_et
2 /

Table dropped.

SQL> CREATE TABLE my_xml_et
2 ( EMPNO NUMBER,
3 EMPNAME VARCHAR2(10),
4 JOB VARCHAR2(10),
5 HIREDATE DATE,
6 SAL NUMBER
7 )
8 ORGANIZATION EXTERNAL
9 (
10 TYPE ORACLE_LOADER
11 DEFAULT DIRECTORY my_xml_dir
12 ACCESS PARAMETERS
13 (
14 records delimited by "</EMP>"
15 badfile my_xml_dir:'empxt%a_%p.bad'
16 logfile my_xml_dir:'empxt%a_%p.log'
17 FIELDS
18 (
19 filler char(2000) terminated by "<EMP>",
20 EMPNO char(2000) enclosed by "<EMPNO>" and "</EMPNO>",
21 EMPNAME char(2000) enclosed by "<ENAME>" and "</ENAME>",
22 JOB char(2000) enclosed by "<JOB>" and "</JOB>",
23 HIREDATE char(2000) enclosed by "<HIREDATE>" and "</HIREDATE>",
24 SAL char(2000) enclosed by "<SAL>" and "</SAL>"
25 )
26 )
27 LOCATION ('emp.xml')
28 )
29 PARALLEL
30 REJECT LIMIT UNLIMITED
31 /

Table created.

SQL> SELECT * FROM my_xml_et
2 /

EMPNO EMPNAME JOB HIREDATE SAL
---------- ---------- ---------- --------- ----------
7369 SMITH CLERK 17-DEC-80 800
7499 ALLEN SALESMAN 20-FEB-81 1600


This is the sample XML that I used.

emp.xml
----------------

<EMPLOYEES>
<EMP>
<EMPNO>7369</EMPNO>
<ENAME>SMITH</ENAME>
<JOB>CLERK</JOB>
<HIREDATE>17-DEC-80</HIREDATE>
<SAL>800</SAL>
</EMP>
<EMP>
<EMPNO>7499</EMPNO>
<ENAME>ALLEN</ENAME>
<JOB>SALESMAN</JOB>
<HIREDATE>20-FEB-81</HIREDATE>
<SAL>1600</SAL>
<COMM>300</COMM>
</EMP>
</EMPLOYEES>

Wednesday, June 18, 2008

Find out Blocking Session

I came across this piece of code on the Oracle forum which helps us find blocking sessions. I thought of sharing it here.


SQL> SET ECHO OFF
SQL> SET LINE 10000
SQL> COLUMN BLOCKER FORMAT A11;
SQL> COLUMN BLOCKEE FORMAT A10;
SQL> COLUMN SID FORMAT 99999;
SQL> COLUMN BLOCKER_MODULE FORMAT A15 TRUNC;
SQL> COLUMN BLOCKEE_MODULE FORMAT A15 TRUNC;
SQL>
SQL> ALTER SESSION SET OPTIMIZER_MODE=RULE;

Session altered.

SQL>
SQL> SELECT a.inst_id,
2 (SELECT username FROM gv$session s WHERE s.inst_id=a.inst_id AND s.sid=a.sid) blocker,
3 a.sid,
4 (SELECT module FROM gv$session s WHERE s.inst_id=a.inst_id AND s.sid=a.sid) blocker_module ,
5 ' is blocking ' "IS BLOCKING",
6 b.inst_id,
7 (SELECT username FROM gv$session s WHERE s.inst_id=b.inst_id AND s.sid=b.sid) blockee,
8 b.sid ,
9 (SELECT module FROM gv$session s WHERE s.inst_id=b.inst_id AND s.sid=b.sid) blockee_module
10 FROM gv$lock a, gv$lock b
11 where a.block <>0
12 AND b.request > 0
13 AND a.id1 = b.id1
14 AND a.id2 = b.id2
15 AND a.sid <> b.sid
16 order by 1, 2
17 /

INST_ID BLOCKER SID BLOCKER_MODULE IS BLOCKING INST_ID BLOCKEE SID BLOCKEE_MODULE
---------- ----------- ------ --------------- ------------- ---------- ---------- ------ ---------------
1 SYSADM 15 SQL*Plus is blocking 1 SYSADM 11 T.O.A.D.

SQL>
SQL> ALTER SESSION SET OPTIMIZER_MODE=CHOOSE;

Session altered.

SQL> BEGIN
2 dbms_output.enable(1000000);
3 FOR do_loop IN (SELECT session_id, a.object_id, xidsqn, oracle_username, b.owner owner,
4 b.object_name object_name, b.object_type object_type
5 FROM v$locked_object a, dba_objects b
6 WHERE xidsqn != 0
7 and b.object_id = a.object_id)
8 LOOP
9 dbms_output.put_line('.');
10 dbms_output.put_line('Blocking Session : '||do_loop.session_id);
11 dbms_output.put_line('Object (Owner/Name): '||do_loop.owner||'.'||do_loop.object_name);
12 dbms_output.put_line('Object Type : '||do_loop.object_type);
13 FOR next_loop IN (SELECT sid
14 FROM v$lock
15 WHERE id2 = do_loop.xidsqn
16 AND sid != do_loop.session_id)
17 LOOP
18 dbms_output.put_line('Sessions being blocked : '||next_loop.sid);
19 END LOOP;
20 END LOOP;
21 END;
22 /
.
Blocking Session : 15
Object (Owner/Name): SYSADM.T
Object Type : TABLE
Sessions being blocked : 11

PL/SQL procedure successfully completed.

SQL>

Monday, June 16, 2008

Lesson for the day

With the cost-based optimizer, if you don't collect statistics properly, you are screwed.

This is what I did.

Connected to:
Oracle9i Enterprise Edition Release 9.2.0.1.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.1.0 - Production

I created the following table.

SQL> create table hx_index_test_tbl(
2 no integer not null,
3 name varchar2(50) not null,
4 old_value varchar2(100),
5 new_value varchar2(100));

Table created.

Next I added these two indexes to the table.

SQL> create index hx_index_test_tbl_idx on hx_index_test_tbl(name,no);

Index created.

SQL> create index hx_index_test_tbl_idx_001 on hx_index_test_tbl(name);

Index created.

Then I populated some data.

SQL> insert into hx_index_test_tbl
2 select row_number() over(partition by name order by 1) no, name, rpad('*',100,'*') old_value, rpad('*',100,'*') new_value
3 from (select 'A'||mod(level,100) name from dual connect by level <=1000);

1000 rows created.

SQL> commit;

Commit complete.

Then I gathered statistics.

SQL> exec dbms_stats.gather_table_stats(user,'hx_index_test_tbl',method_opt=>'for all indexed columns');

PL/SQL procedure successfully completed.

SQL> set serveroutput on
SQL> set autotrace traceonly explain
SQL> select * from hx_index_test_tbl where name = 'A0';

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=5 Card=10 Bytes=1000)
1 0 TABLE ACCESS (FULL) OF 'HX_INDEX_TEST_TBL' (Cost=5 Card=10 Bytes=1000)

Oops, it's going for a full scan. It's not considering my index. I want it to pick the index HX_INDEX_TEST_TBL_IDX_001.

SQL> select * from hx_index_test_tbl where name = 'A0' and no = 1;

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=100)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'HX_INDEX_TEST_TBL' (Cost=2 Card=1 Bytes=100)
2 1 INDEX (RANGE SCAN) OF 'HX_INDEX_TEST_TBL_IDX' (NON-UNIQUE) (Cost=1 Card=1)

But when I add NO to the where clause it considers the index. Why is that?

The reason is very simple. See how I gathered statistics.

dbms_stats.gather_table_stats(user,'hx_index_test_tbl',method_opt=>'for all indexed columns');

I forgot to gather statistics for the indexes of this table (no cascade). Now let me gather for the indexes as well.

SQL> EXEC dbms_stats.gather_table_stats(user,'hx_index_test_tbl',method_opt=>'for all indexed columns', cascade=>TRUE);

PL/SQL procedure successfully completed.

Now see what happens...

SQL> select * from hx_index_test_tbl where name = 'A0';

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=10 Bytes=1000)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'HX_INDEX_TEST_TBL' (Cost=2 Card=10 Bytes=1000)
2 1 INDEX (RANGE SCAN) OF 'HX_INDEX_TEST_TBL_IDX_001' (NON-UNIQUE) (Cost=1 Card=10)


So if you don't gather stats properly with the cost-based optimizer, you will get weird execution plans.

Monday, June 9, 2008

String List

Being a PL/SQL developer, many times I have been in a situation where I get a list of values as input in a string and have to check if they exist in a particular table. The values are separated by a delimiter. Let's consider the example below.

String Input: ‘ORACLE, SQL SERVER, DB2, MYSQL, SYBASE’

I have to use this string to see if these values exist in the table.

One possible way is to build dynamic SQL: just pad single quotes before and after each value in the string and build a SQL statement with an IN clause. But I want to do it without dynamic SQL. How is this possible?
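For comparison, the dynamic SQL route would look roughly like this (the table name my_products is invented for illustration; note that string concatenation like this is fragile and open to SQL injection, which is one more reason to avoid it):

```sql
declare
    l_list  varchar2(100) := 'ORACLE, SQL SERVER, DB2, MYSQL, SYBASE';
    l_sql   varchar2(1000);
    l_count pls_integer;
begin
    -- Turn the list into 'ORACLE','SQL SERVER',... and embed it in an IN clause
    l_sql := 'select count(*) from my_products where name in ('''
             || replace(l_list, ', ', ''',''') || ''')';
    execute immediate l_sql into l_count;
    dbms_output.put_line(l_count);
end;
/
```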

I came up with this SQL which splits this string and gives me the values in the string as rows.

Here is the SQL…

SQL> var a varchar2(100)

SQL> exec :a := 'ORACLE, SQL SERVER, DB2, MYSQL, SYBASE'

PL/SQL procedure successfully completed.

SQL> SELECT TRIM(SUBSTR(:a, INSTR(','||:a,',',1,level), INSTR(:a||',',',',1,level) - INSTR(','||:a,',',1,level))) my_list
2 FROM dual, (SELECT length(replace(translate(lower(:a)||',','abcdefghijklmnopqrstuvwxyz1234567890',' '),' ','')) cnt
3 FROM dual)
4 CONNECT BY level <= cnt
5 /

MY_LIST
--------------------------------------------------------------------------------
ORACLE
SQL SERVER
DB2
MYSQL
SYBASE


So what have I done here? Let me split it into steps for your understanding.

1. I identify the possible letters that can appear in the values and translate each of them into a blank space. The following command does that for me.

SQL> SELECT translate(lower(:a)||',','abcdefghijklmnopqrstuvwxyz1234567890',' ') val
2 FROM dual;

VAL
--------------------------------------------------------------------------------
, , , , ,



Now you can see I don’t have anything left other than commas (the delimiter).

2. Now I am going to remove all the blank spaces from the output that I got in step 1.

SQL> SELECT replace(translate(lower(:a)||',','abcdefghijklmnopqrstuvwxyz1234567890',' '),' ','') val         
2 FROM dual;

VAL
--------------------------------------------------------------------------------
,,,,,



Now the length of the output gives me the total number of values in the list. Note that I have padded an extra comma onto the string, as the number of delimiters is always one less than the number of values.

3. So the third step is obviously to count the number of commas in the above output and determine the total number of values in the string list.

SQL> SELECT length(replace(translate(lower(:a)||',','abcdefghijklmnopqrstuvwxyz1234567890',' '),' ','')) cnt
2 FROM dual
3 /

CNT
----------
5



4. The next step is to generate the string list five times as rows using the CONNECT BY clause.

SQL> SELECT level, :a my_list
2 FROM dual, (SELECT length(replace(translate(lower(:a)||',','abcdefghijklmnopqrstuvwxyz1234567890',' '),' ','')) cnt
3 FROM dual)
4 CONNECT BY level <= cnt
5 /

LEVEL MY_LIST
---------- ---------------------------------------------------------------------------------------------------------------
1 ORACLE, SQL SERVER, DB2, MYSQL, SYBASE
2 ORACLE, SQL SERVER, DB2, MYSQL, SYBASE
3 ORACLE, SQL SERVER, DB2, MYSQL, SYBASE
4 ORACLE, SQL SERVER, DB2, MYSQL, SYBASE
5 ORACLE, SQL SERVER, DB2, MYSQL, SYBASE


5. Now it's simple: with the help of SUBSTR and INSTR I can split out each value, as shown below.

SQL> SELECT TRIM(SUBSTR(:a, INSTR(','||:a,',',1,level), INSTR(:a||',',',',1,level) - INSTR(','||:a,',',1,level))) my_list
2 FROM dual, (SELECT length(replace(translate(lower(:a)||',','abcdefghijklmnopqrstuvwxyz1234567890',' '),' ','')) cnt
3 FROM dual)
4 CONNECT BY level <= cnt
5 /

MY_LIST
--------------------------------------------------------------------------------
ORACLE
SQL SERVER
DB2
MYSQL
SYBASE


So now I can use this SQL to convert any value list into rows and use it in my SQL statements.

And if you can use regular expressions, things become much easier.

Check this out.

SQL>  SELECT trim(regexp_substr(:a,'[^,]+', 1, level)) my_list
2 FROM dual
3 CONNECT BY level <= NVL( LENGTH( REGEXP_REPLACE( :a, '[^,]+', NULL ) ), 0 ) + 1
4 /

MY_LIST
--------------------------------------------------------------------------------
ORACLE
SQL SERVER
DB2
MYSQL
SYBASE


NOTE: this may have to be customized based on the given requirement.

Monday, June 2, 2008

Code reusability in oracle pl/sql

Today, while checking out Tom's blog, I was led to this post on Tayler Muth's blog. This sounds like an interesting topic. So I thought for a while, and this is what came to my mind…

Code reusability in oracle pl/sql

To start with, PL/SQL is a procedural language, not an object-oriented one. That implies the language is not familiar with the concept of inheritance. I have been using PL/SQL for quite some time. It's a great programming language. It would be even better if it had the class concept in it.

The class concept would come in handy when we want to build a framework that can be utilized over and over.

Currently I am working on a masking utility. We have different masking algorithms say…

Algorithm 1, Algorithm 2, Algorithm 3 etc.

The general framework of this masking is something like this.

1. Get the field from the user which needs to be masked.
2. Get the master table for that field.
3. Load the distinct data for the field from the master table into a staging table.
4. Apply your masking logic on the staging table.
5. Find all the transaction tables that have this field in them.
6. Using the staging table, update all the transaction tables.

This is a very good candidate for the class concept: I could have a parent class holding the entire framework of the six steps given above, and one child class for each algorithm in which I just override step four to apply my masking algorithm. But unfortunately we don't have it. Hence I went for a package, used some CASE statements to hook in my algorithms, and made it as optimal as possible. I basically hate writing a lot of code. If Oracle introduced OOP (Object Oriented Programming) in PL/SQL it would be of great use and even more flexible.
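The package-plus-CASE approach described above can be sketched like this (all names are invented for illustration, and the step procedures are assumed to exist elsewhere in the package):

```sql
create or replace package body mask_util as

    procedure mask_field (p_field in varchar2, p_algorithm in varchar2) is
    begin
        load_staging(p_field);           -- steps 1-3: distinct master data into staging
        case p_algorithm                 -- step 4: the only step that varies
            when 'ALG1' then apply_algorithm_1;
            when 'ALG2' then apply_algorithm_2;
            when 'ALG3' then apply_algorithm_3;
        end case;
        update_transactions(p_field);    -- steps 5-6: push staging data to transaction tables
    end mask_field;

end mask_util;
/
```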