Myth: Most discriminating elements should be first

Tom Kyte elaborates on the common myth that in indexes, the most discriminating elements should be first.

This excerpt is from Tom Kyte's new book Expert Oracle Database Architecture: 9i and 10g Programming Techniques

and Solutions, published by Apress in September 2005. Tom Kyte is Vice President of the Core Technologies Group at Oracle and has been using Oracle since 1988. To see other frequently asked questions and myths about indexes, click here.

This seems like common sense. If you are going to create an index on the columns C1 and C2 in a table with 100,000 rows, and you find C1 has 100,000 distinct values and C2 has 25,000 distinct values, you would want to create the index on T(C1,C2). This means that C1 should be first, which is the "common sense" approach. The fact is, when comparing vectors of data (consider C1, C2 to be a vector), it doesn't matter which you put first. Consider the following example. We will create a table based on ALL_OBJECTS and an index on the OWNER, OBJECT_TYPE, and OBJECT_NAME columns (least discriminating to most discriminating) and also on OBJECT_NAME, OBJECT_TYPE, and OWNER:

ops$tkyte@ORA10GR1> create table t
   2 as
   3 select * from all_objects;
Table created.

ops$tkyte@ORA10GR1> create index t_idx_1 on t(owner,object_type,object_name);
Index created.

ops$tkyte@ORA10GR1> create index t_idx_2 on t(object_name,object_type,owner);
Index created.

ops$tkyte@ORA10GR1> select count(distinct owner), count(distinct object_type),
   2 count(distinct object_name ), count(*)
   3 from t;

DISTINCTOWNER DISTINCTOBJECT_TYPE DISTINCTOBJECT_NAME COUNT(*)
------------- ------------------- ------------------- --------
           28             36            28537           48243

Now, to show that neither is more efficient space-wise, we'll measure their space utilization:

ops$tkyte@ORA10GR1> analyze index t_idx_1 validate structure;
Index analyzed.

ops$tkyte@ORA10GR1> select btree_space, pct_used, opt_cmpr_count, opt_cmpr_pctsave
2 from index_stats;

BTREE_SPACE PCT OPT_CMPR_COUNT OPT_CMPR_PCTSAVE
----------- ------ -------------- ----------------
 2702744     89.0             2         28

ops$tkyte@ORA10GR1> analyze index t_idx_2 validate structure;
Index analyzed.

ops$tkyte@ORA10GR1> select btree_space, pct_used, opt_cmpr_count, opt_cmpr_pctsave
2 from index_stats;

BTREE_SPACE PCT OPT_CMPR_COUNT OPT_CMPR_PCTSAVE
----------- ------ -------------- ----------------
 2702744      89.0            1         13

They use exactly the same amount of space, down to the byte -- there are no differences there. However, the first index is a lot more compressible if we use index key compression, as evidenced by the OPT_CMP_PCTSAVE value. There is an argument for arranging the columns in the index in order from the least discriminating to the most discriminating. Now let's see how they perform, to determine if either index is generally more efficient than the other. To test this, we'll use a PL/SQL block with hinted queries (so as to use one index or the other):

ops$tkyte@ORA10GR1> alter session set sql_trace=true;
Session altered.

ops$tkyte@ORA10GR1> declare
   2            cnt int;
   3    begin
   4        for x in ( select /*+FULL(t)*/ owner, object_type, object_name from t )
   5        loop
   6             select /*+ INDEX( t t_idx_1 ) */ count(*) into cnt
   7                 from t
   8               where object_name = x.object_name
   9                    and object_type = x.object_type
 10                   and owner = x.owner;
 11
 12              select /*+ INDEX( t t_idx_2 ) */ count(*) into cnt
 13                  from t
 14                where object_name = x.object_name
 15                    and object_type = x.object_type
 16                    and owner = x.owner;
 17       end loop;
 18   end;
 19   /
PL/SQL procedure successfully completed.

These queries read every single row in the table by means of the index. The TKPROF report shows us the following:

SELECT /*+ INDEX( t t_idx_1 ) */ COUNT(*) FROM T
WHERE OBJECT_NAME = :B3 AND OBJECT_TYPE = :B2 AND OWNER = :B1
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 48243 10.63 10.78 0 0 0 0
Fetch 48243 1.90 1.77 0 145133 0 48243
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 96487 12.53 12.55 0 145133 0 48243
Rows Row Source Operation
------- ---------------------------------------------------
48243 SORT AGGREGATE (cr=145133 pr=0 pw=0 time=2334197 us)
57879 INDEX RANGE SCAN T_IDX_1 (cr=145133 pr=0 pw=0 time=1440672 us)(object...
********************************************************************************
SELECT /*+ INDEX( t t_idx_2 ) */ COUNT(*) FROM T
WHERE OBJECT_NAME = :B3 AND OBJECT_TYPE = :B2 AND OWNER = :B1
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 48243 11.00 10.78 0 0 0 0
Fetch 48243 1.87 2.10 0 145168 0 48243
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 96487 12.87 12.88 0 145168 0 48243
Rows Row Source Operation
------- ---------------------------------------------------
48243 SORT AGGREGATE (cr=145168 pr=0 pw=0 time=2251857 us)
57879 INDEX RANGE SCAN T_IDX_2 (cr=145168 pr=0 pw=0 time=1382547 us)(object...

They processed the same exact number of rows and very similar numbers of blocks (minor variations coming from accidental ordering of rows in the table and consequential optimizations made by Oracle), used equivalent amounts of CPU time, and ran in about the same elapsed time (run this same test again and the CPU and ELAPSED numbers will be a little different, but on average they will be the same). There are no inherent efficiencies to be gained by placing the columns in order of how discriminating they are, and as stated previously, with index key compression there is an argument for putting the least selective first. If you run the preceding example with COMPRESS 2 on the indexes, you'll find that the first index will perform about two-thirds the I/O of the second, given the nature of the query in this case.

However, the fact is that the decision to put column C1 before C2 must be driven by how the index is used. If you have lots of queries like the following:

select * from t where c1 = :x and c2 = :y;
select * from t where c2 = :y;
it makes more sense to place the index on T(C2,C1). This single index could be used by either of the queries. Additionally, using index key compression (which we looked at with regard to IOTs and will examine further later), we can build a smaller index if C2 is first. This is because each value of C2 repeats itself on average four times in the index. If C1 and C2 are both, on average, 10 bytes in length, the index entries for this index would nominally be 2,000,000 bytes (100,000 x 20). Using index key compression on (C2, C1), we could shrink this index to 1,250,000 (100,000 x 12.5), since three out of four repetitions of C2 could be suppressed.

In Oracle 5 (yes, version 5!), there was an argument for placing the most selective columns first in an index. It had to do with the way version 5 implemented index compression (not the same as index key compression). This feature was removed in version 6 with the addition of row-level locking. Since then, it is not true that putting the most discriminating entries first in the index will make the index smaller or more efficient. It seems like it will, but it will not. With index key compression, there is a compelling argument to go the other way since it can make the index smaller. However, it should be driven by how you use the index, as previously stated.

To see other frequently asked questions and myths about indexes, click here.


This was first published in October 2005

Dig deeper on Oracle DBA jobs, training and certification

Pro+

Features

Enjoy the benefits of Pro+ membership, learn more and join.

0 comments

Oldest 

Forgot Password?

No problem! Submit your e-mail address below. We'll send you an email containing your password.

Your password has been sent to:

SearchDataManagement

SearchBusinessAnalytics

SearchSAP

SearchSQLServer

TheServerSide

SearchDataCenter

SearchContentManagement

SearchFinancialApplications

Close