The following is the final part of a 12-part series on Oracle10g CBO internals and SQL tuning optimization. Each tip is excerpted from the not-yet-released Rampant TechPress book, "Oracle SQL and index internals," by Kimberly Floss. Check back to the main series page for upcoming installments.
Histograms
In some cases, the distribution of values within a column of a table will affect the optimizer's decision to use an index vs. performing a full-table scan. This scenario occurs when the value with a where clause does not have an equivalent distribution of values, making a full-table scan cheaper than index access in some cases. The problem is the optimizer can't tell when it is using a value with a few rows, and when it is using a value with a large number of rows.
A column histogram should only be created when we have data skew exists or suspected. In the real world, that happens rarely, and one of the most common mistakes with the optimizer is the unnecessary introduction of histograms into optimizer statistics. The histograms signals the optimizer that the column is not linearly distributed, and the optimizer will peek into the literal value in the SQL where clause, and compare that value to the histogram buckets in the histogram statistics.
While they are used to make a yes-or-no decision about the use of an index to access the table, histograms are most commonly used to predict the size of the intermediate result set from a multi-way table join.
For example, assume that we have a five-way table join whose result set will be only 10 rows. Oracle will want to join the tables together in such a way as to make the result set (cardinality) of the first join as small as possible. By carrying less baggage in the intermediate result sets, the query will run faster. To minimize intermediate results, the optimizer attempts to estimate the cardinality of each result set during the parse phase of SQL execution.
To continue reading for free, register below or login
To read more you must become a member of SearchOracle.com
');
// -->

Having histograms on skewed columns will greatly aid the optimizer in making a proper decision. (Remember, you can create a histogram even if the column does not have an index and does not participate as a join key.)
Because a complex schema might have tens of thousands of columns, it is impractical to evaluate each column for skew and thus Oracle provides an automated method for building histograms as part of the dbms_stats utility. By using the method_opt=>'for all columns size skewonly' option of dbms_stats, you can direct Oracle to automatically create histograms for those columns whose values are heavily skewed.
As a general rule, histograms are used to predict the cardinality and the number of rows returned in the result set. For example, assume that we have a product_type index and 70% of the values are for the HARDWARE type. Whenever SQL with where product_type='HARDWARE' is specified, a full-table scan is the fastest execution plan, while a query with where product_type='SOFTWARE' would be fastest using index access.
Because histograms add additional overhead to the parsing phase of SQL, you should avoid them unless they are required for a faster optimizer execution plan. But, there are several conditions where creating histograms is advised:
So how do you find those columns that are appropriate for histograms? There is a feature in dbms_stats that provides for the ability to automatically look for columns that should have histograms, and create the histograms. Multi-bucket histograms add a huge parsing overhead to SQL statements, and histograms should only be used when the SQL will choose a different execution plan based upon the column value.
To aid in intelligent histogram generation, Oracle uses the method_opt parameter of dbms_stats. There are also important new options within the method_opt clause, namely skewonly and auto (and others).
The first is the "skewonly" option, which is very time-intensive because it examines the distribution of values for every column within every index. If dbms_stats discovers an index with columns that are unevenly distributed, it will create histograms for that index to aid the cost-based SQL optimizer in making a decision about index vs. full-table scan access. For example, if an index has one column that is in 50% of the rows, a full-table scan is faster than an index scan to retrieve these rows.
Histograms are also used with SQL that has bind variables and SQL with cursor_sharing enabled. In these cases, the optimizer determines if the column value could affect the execution plan, and if so, replace the bind variable with a literal and performs a hard parse.
The auto option is used when monitoring is implemented (alter table xxx monitoring) and creates histograms based upon data distribution and the manner in which the column is accessed by the application (e.g., the workload on the column as determined by monitoring). Using method_opt=>'auto' is similar to using the gather auto in the option parameter of dbms_stats:
Finding the poorly running SQL
While complex queries may have extremely complex execution plans, most Oracle professionals must tune SQL with the following problems:
The v$sql_plan view can help locate SQL tuning opportunities. When searching for tuning opportunities, start by interrogating the v$sql_plan view to find these large-table full-table scans.
This is when business knowledge comes in handy. Very frequently, and I'm talking from experience, queries are monstrous because the developer doesn't understand what he goal is and what the data actually means. The old saying that there is no substitute for experience is confirmed here again.
Go to the main series page.
About the author
Kimberly Floss is one of the most-respected Oracle database administrators in the U.S., and is president of the International Oracle Users Group (IOUG). With more than a decade of experience, Kimberly specializes in Oracle performance tuning and is a respected expert in SQL tuning techniques. She is an active member of the Chicago Oracle Users Group, and the Midwest Oracle Users Group, in addition to the IOUG. Kimberly Floss has over 15 years of experience in the information technology industry, with specific focus on relational database technology, including Oracle, DB2, Microsoft SQL Server and Sybase. She holds a bachelor's of science degree in computer information systems from Purdue University, specializing in systems analysis and design, and has an MBA with emphasis in management information systems from Loyola University.