Optimizing database performance, part 2: Denormalization and clustering

Author Craig Mullins continues his series about optimizing the performance of any database.

This tip is excerpted from Craig Mullins' new book Database administration: The complete guide to practices and procedures. The first and third parts of this series are also available.

Denormalization

Another way to optimize the performance of database access is to denormalize the tables. In brief, denormalization, the opposite of normalization, is the process of putting one fact in many places. This speeds data retrieval at the expense of data modification. Denormalizing tables can be a good decision when a completely normalized design does not perform optimally.

The only reason to ever denormalize a relational database design is to enhance performance. As discussed elsewhere in "Database administration," you should consider the following options:

  • Prejoined tables — when the cost of joining is prohibitive.
  • Report table — when specialized critical reports are too costly to generate.
  • Mirror table — when tables are required concurrently by two types of environments.
  • Split tables — when distinct groups use different parts of a table.
  • Combined tables — to consolidate one-to-one or one-to-many relationships into a single table.
  • Speed table — to support hierarchies like bill-of-materials or reporting structures.
  • Physical denormalization — to take advantage of specific DBMS characteristics.

You might also consider:

  • Storing redundant data in tables to reduce the number of table joins required.
  • Storing repeating groups in a row to reduce I/O and possibly disk space.
  • Storing derivable data to eliminate calculations and costly algorithms.
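For example, storing redundant data might look like the following sketch (the table and column names are purely illustrative, not from the book). The read becomes cheaper because the join is eliminated, but every change to the customer name must now be propagated to every matching order row, which is the data modification cost that denormalization always carries:

    -- Normalized design: CUST_NAME lives only in CUSTOMER, so any order
    -- inquiry that needs the name must join to CUSTOMER.
    CREATE TABLE customer
       (cust_id     INTEGER      NOT NULL PRIMARY KEY,
        cust_name   VARCHAR(40)  NOT NULL);

    -- Denormalized alternative: carry a redundant copy of the name on the
    -- order table to eliminate the join for read-heavy processing.
    CREATE TABLE order_hdr
       (order_id    INTEGER      NOT NULL PRIMARY KEY,
        cust_id     INTEGER      NOT NULL REFERENCES customer (cust_id),
        cust_name   VARCHAR(40)  NOT NULL,   -- redundant copy; avoids the join
        order_total DECIMAL(11,2));

    -- Retrieval is simpler and cheaper...
    SELECT order_id, cust_name, order_total
      FROM order_hdr
     WHERE order_id = 12345;

    -- ...but every name change must now be applied in two places.
    UPDATE order_hdr
       SET cust_name = 'New Name'
     WHERE cust_id = 100;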

Clustering

A clustered table will store its rows physically on disk in order by a specified column or columns. Clustering usually is enforced by the DBMS with a clustering index. The clustering index forces table rows to be stored in ascending order by the indexed columns. The left-to-right order of the columns, as defined in the index, defines the collating sequence for the clustered index. There can be only one clustering sequence per table (because physically the data can be stored in only one sequence).

Figure 11-2 demonstrates the difference between clustered and unclustered data and indexes; the clustered index is on top, the unclustered index is on the bottom. As you can see, the entries on the leaf pages of the top index are in sequential order; in other words, they are clustered. Clustering enhances the performance of queries that access data sequentially because fewer I/Os need to be issued to retrieve the same data.

Figure 11-2 Clustered and unclustered indexes
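How a clustering index is declared varies from DBMS to DBMS. The following is a minimal sketch using DB2-style syntax with hypothetical table and column names; SQL Server expresses the same intent with the CLUSTERED keyword:

    -- DB2-style: the CLUSTER keyword asks the DBMS to keep the rows of the
    -- (hypothetical) ORDERS table physically sequenced by ORDER_DATE.
    CREATE INDEX ord_date_ix
        ON orders (order_date ASC)
        CLUSTER;

    -- SQL Server equivalent (the clustered index itself stores the rows):
    -- CREATE CLUSTERED INDEX ord_date_ix ON orders (order_date);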

Depending on the DBMS, the data may not always be physically maintained in exact clustering sequence. When a clustering sequence has been defined for a table, the DBMS will act in one of two ways to enforce clustering:

  1. When new rows are inserted, the DBMS will physically maneuver data rows and pages to fit the new rows into the defined clustering sequence; or
  2. When new rows are inserted, the DBMS will try to place the data into the defined clustering sequence, but if space is not available on the required page the data may be placed elsewhere.

The DBA must learn how the DBMS maintains clustering. If the DBMS operates as in the second scenario, data may become unclustered over time and require reorganization. A detailed discussion of database reorganization appears later in this chapter. For now, though, back to our discussion of clustering.

Clustering tables that are accessed sequentially is good practice. In other words, clustered indexes are good for supporting range access, whereas unclustered indexes are better for supporting random access. Be sure to choose the clustering columns wisely. Use clustered indexes for the following situations:

  • Join columns, to optimize SQL joins where multiple rows match for one or both tables participating in the join
  • Foreign key columns because they are frequently involved in joins and the DBMS accesses foreign key values during declarative referential integrity checking
  • Predicates in a WHERE clause
  • Range columns
  • Columns that do not change often (reduces the need for physical reclustering)
  • Columns that are frequently grouped or sorted in SQL statements

In general, the clustering sequence that aids the performance of the most commonly accessed predicates should be used for clustering. When a table has multiple candidates for clustering, weigh the cost of sorting against the performance gained by clustering for each candidate key. As a rule of thumb, though, if the DBMS supports clustering, it is usually a good practice to define a clustering index for each table that is created (unless the table is very small).

Clustering is generally not recommended for primary key columns because the primary key is, by definition, unique. However, if ranges of rows frequently are selected and ordered by primary key value, a clustering index may be beneficial.
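As a quick illustration of range access, consider a query such as the following (again with hypothetical names). When the ORDERS table is clustered by ORDER_DATE, the qualifying rows reside on adjacent pages, so the DBMS can retrieve them with far fewer I/Os and can avoid a sort for the ORDER BY:

    -- Range predicate on the clustering column: the qualifying rows are
    -- stored contiguously, so they can be read sequentially.
    SELECT order_id, cust_id, order_total
      FROM orders
     WHERE order_date BETWEEN '2002-01-01' AND '2002-03-31'
     ORDER BY order_date;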

Page Splitting

When the DBMS has to accommodate inserts, and no space exists, it must create a new page within the database to store the new data. The process of creating new pages to store inserted data is called page splitting. A DBMS can perform two types of page splitting: normal page splits and monotonic page splits. Some DBMSs support both types of page splitting, while others support only one type. The DBA needs to know how the DBMS implements page splitting in order to optimize the database.

Figure 11-3 depicts a normal page split. To accomplish this, the DBMS performs the following tasks in sequence:

  1. Creates a new empty page in between the full page and the next page
  2. Takes half of the entries from the full page and moves them to the empty page
  3. Adjusts any internal pointers to both pages and inserts the row accordingly

Figure 11-3 Normal page splitting

A monotonic page split is a much simpler process, requiring only two steps. The DBMS

  • Creates a new page in between the full page and the next page
  • Inserts the new values into the fresh page

Monotonic page splits are useful when rows are being inserted in strictly ascending sequence. Typically, a DBMS that supports monotonic page splits will invoke it when a new row is added to the end of a page and the last addition was also to the end of the page.

When ascending rows are inserted and normal page splitting is used, a lot of space can be wasted because the DBMS will be creating half-full pages that never fill up. If the wrong type of page split is performed during database processing, wasted space will ensue, requiring the database object to be reorganized for performance.
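As an example of the kind of processing that benefits from monotonic page splits, consider a table clustered on an ever-ascending key, such as the hypothetical audit log sketched below. Every insert lands at the logical end of the table, so a DBMS that supports monotonic splits can keep its pages full instead of leaving a trail of half-empty pages:

    -- Rows arrive in strictly ascending LOG_TS sequence, so every insert
    -- is added to the end of the table (names are illustrative).
    CREATE TABLE audit_log
       (log_ts   TIMESTAMP    NOT NULL,
        user_id  INTEGER      NOT NULL,
        action   VARCHAR(100) NOT NULL);

    CREATE INDEX audit_ts_ix
        ON audit_log (log_ts ASC)
        CLUSTER;              -- DB2-style; CREATE CLUSTERED INDEX in SQL Server

    INSERT INTO audit_log VALUES (CURRENT_TIMESTAMP, 100, 'LOGIN');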

Interleaving Data

When data from two tables is frequently joined, it can make sense to physically interleave the data into the same physical storage structure. Interleaving can be viewed as a specialized form of clustering (and, in fact, Oracle uses the term cluster to define interleaved data). Interleaving data is covered in Chapter 4 and is mentioned here as a performance tuning technique to consider.
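As a sketch of what interleaving can look like in practice, the following uses Oracle-style index cluster syntax with hypothetical names. Customer and order rows that share a CUST_ID value are stored together in the same data blocks, so joining the two tables on CUST_ID requires little or no extra I/O:

    -- Oracle-style index cluster: rows from both tables that share a
    -- CUST_ID value are interleaved in the same blocks.
    CREATE CLUSTER cust_orders_clu (cust_id NUMBER(10)) SIZE 512;

    CREATE INDEX cust_orders_clu_ix ON CLUSTER cust_orders_clu;

    CREATE TABLE customer
       (cust_id     NUMBER(10)   PRIMARY KEY,
        cust_name   VARCHAR2(40))
       CLUSTER cust_orders_clu (cust_id);

    CREATE TABLE orders
       (order_id    NUMBER(10)   PRIMARY KEY,
        cust_id     NUMBER(10)   NOT NULL,
        order_total NUMBER(11,2))
       CLUSTER cust_orders_clu (cust_id);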


This was first published in October 2002
