This excerpt is from Tom Kyte's new book Expert Oracle Database Architecture: 9i and 10g Programming Techniques and Solutions, published by Apress in September 2005. Tom Kyte is Vice President of the Core Technologies Group at Oracle and has been using Oracle since 1988. To see other frequently asked questions and myths about indexes, click here.
This seems like common sense. If you are going to create an index on the columns C1 and C2
in a table with 100,000 rows, and you find C1 has 100,000 distinct values and C2 has 25,000 distinct
values, you would want to create the index on T(C1,C2). This means that C1 should be
first, which is the "common sense" approach. The fact is, when comparing vectors of data
(consider C1, C2 to be a vector), it doesn't matter which you put first. Consider the following
example. We will create a table based on ALL_OBJECTS and an index on the OWNER, OBJECT_TYPE,
and OBJECT_NAME columns (least discriminating to most discriminating) and also on
OBJECT_NAME, OBJECT_TYPE, and OWNER:
ops$tkyte@ORA10GR1> create table t
2 as
3 select * from all_objects;
Table created.
ops$tkyte@ORA10GR1> create index t_idx_1 on t(owner,object_type,object_name);
Index created.
ops$tkyte@ORA10GR1> create index t_idx_2 on t(object_name,object_type,owner);
Index created.
ops$tkyte@ORA10GR1> select count(distinct owner), count(distinct object_type),
2 count(distinct object_name ), count(*)
3 from t;
DISTINCTOWNER DISTINCTOBJECT_TYPE DISTINCTOBJECT_NAME COUNT(*)
------------- ------------------- ------------------- --------
28 36 28537 48243
Now, to show that neither is more efficient space-wise, we'll measure their space
utilization:
ops$tkyte@ORA10GR1> analyze index t_idx_1 validate structure;
Index analyzed.
ops$tkyte@ORA10GR1> select btree_space, pct_used, opt_cmpr_count, opt_cmpr_pctsave
2 from index_stats;
BTREE_SPACE PCT OPT_CMPR_COUNT OPT_CMPR_PCTSAVE
----------- ------ -------------- ----------------
2702744 89.0 2 28
ops$tkyte@ORA10GR1> analyze index t_idx_2 validate structure;
Index analyzed.
ops$tkyte@ORA10GR1> select btree_space, pct_used, opt_cmpr_count, opt_cmpr_pctsave
2 from index_stats;
BTREE_SPACE PCT OPT_CMPR_COUNT OPT_CMPR_PCTSAVE
----------- ------ -------------- ----------------
2702744 89.0 1 13
They use exactly the same amount of space, down to the byte -- there are no differences
there. However, the first index is a lot more compressible if we use index key compression, as
evidenced by the OPT_CMP_PCTSAVE value. There is an argument for arranging the columns in
the index in order from the least discriminating to the most discriminating. Now let's see how
they perform, to determine if either index is generally more efficient than the other. To test
this, we'll use a PL/SQL block with hinted queries (so as to use one index or the other):
ops$tkyte@ORA10GR1> alter session set sql_trace=true;
Session altered.
ops$tkyte@ORA10GR1> declare
2 cnt int;
3 begin
4 for x in ( select /*+FULL(t)*/ owner, object_type, object_name from t )
5 loop
6 select /*+ INDEX( t t_idx_1 ) */ count(*) into cnt
7 from t
8 where object_name = x.object_name
9 and object_type = x.object_type
10 and owner = x.owner;
11
12 select /*+ INDEX( t t_idx_2 ) */ count(*) into cnt
13 from t
14 where object_name = x.object_name
15 and object_type = x.object_type
16 and owner = x.owner;
17 end loop;
18 end;
19 /
PL/SQL procedure successfully completed.
These queries read every single row in the table by means of the index. The TKPROF report
shows us the following:
SELECT /*+ INDEX( t t_idx_1 ) */ COUNT(*) FROM T
WHERE OBJECT_NAME = :B3 AND OBJECT_TYPE = :B2 AND OWNER = :B1
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 48243 10.63 10.78 0 0 0 0
Fetch 48243 1.90 1.77 0 145133 0 48243
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 96487 12.53 12.55 0 145133 0 48243
Rows Row Source Operation
------- ---------------------------------------------------
48243 SORT AGGREGATE (cr=145133 pr=0 pw=0 time=2334197 us)
57879 INDEX RANGE SCAN T_IDX_1 (cr=145133 pr=0 pw=0 time=1440672 us)(object...
********************************************************************************
SELECT /*+ INDEX( t t_idx_2 ) */ COUNT(*) FROM T
WHERE OBJECT_NAME = :B3 AND OBJECT_TYPE = :B2 AND OWNER = :B1
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 48243 11.00 10.78 0 0 0 0
Fetch 48243 1.87 2.10 0 145168 0 48243
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 96487 12.87 12.88 0 145168 0 48243
Rows Row Source Operation
------- ---------------------------------------------------
48243 SORT AGGREGATE (cr=145168 pr=0 pw=0 time=2251857 us)
57879 INDEX RANGE SCAN T_IDX_2 (cr=145168 pr=0 pw=0 time=1382547 us)(object...
They processed the same exact number of rows and very similar numbers of blocks
(minor variations coming from accidental ordering of rows in the table and consequential
optimizations made by Oracle), used equivalent amounts of CPU time, and ran in about the
same elapsed time (run this same test again and the CPU and ELAPSED numbers will be a little
different, but on average they will be the same). There are no inherent efficiencies to be gained
by placing the columns in order of how discriminating they are, and as stated previously, with
index key compression there is an argument for putting the least selective first. If you run the
preceding example with COMPRESS 2 on the indexes, you'll find that the first index will perform
about two-thirds the I/O of the second, given the nature of the query in this case.
However, the fact is that the decision to put column C1 before C2 must be driven by how
the index is used. If you have lots of queries like the following:
select * from t where c1 = :x and c2 = :y;
select * from t where c2 = :y;
it makes more sense to place the index on T(C2,C1). This single index could be used by either
of the queries. Additionally, using index key compression (which we looked at with regard to
IOTs and will examine further later), we can build a smaller index if C2 is first. This is because
each value of C2 repeats itself on average four times in the index. If C1 and C2 are both, on average,
10 bytes in length, the index entries for this index would nominally be 2,000,000 bytes
(100,000 x 20). Using index key compression on (C2, C1), we could shrink this index to
1,250,000 (100,000 x 12.5), since three out of four repetitions of C2 could be suppressed.
In Oracle 5 (yes, version 5!), there was an argument for placing the most selective columns
first in an index. It had to do with the way version 5 implemented index compression (not the
same as index key compression). This feature was removed in version 6 with the addition of
row-level locking. Since then, it is not true that putting the most discriminating entries first in
the index will make the index smaller or more efficient. It seems like it will, but it will not. With
index key compression, there is a compelling argument to go the other way since it can make
the index smaller. However, it should be driven by how you use the index, as previously stated.
To see other frequently asked questions and myths about indexes, click here.