While it is true that data storage prices have fallen dramatically over the last decade, Oracle data compression
has far more appealing benefits than simply saving on disk storage cost. Because data itself can be highly compressed, information can be fetched off of the disk devices with less physical I/O, which radically improves query performance under certain conditions.
Please note that there is common misconception that 11g table compression decompresses data while reading and holds it in the uncompressed form in the cache.
That is not quite correct since one of the salient features of the database table compression is that we do not have to uncompress the data before reading it and the data stays in the compressed form even in the cache.
As a result, Oracle table compression not only helps customers save disk space, it also helps to increase cache efficiency since more blocks can now fit in the memory.
According to the Oracle whitepaper on 11g data compression, the CPU overhead for the compress/decompress operations will be minimal. More importantly, Oracle11g data compression will be a godsend for shops that are constrained by federal regulation to archive their audit trails (e.g., HIPAA, SOX). But best of all, because Oracle 11g compression makes storage up to 3x cheaper, solid-state flash drives are far less expensive, which allows Oracle shops to forever eliminate the high costs of platter-disk I/O and enjoy data access speeds up to 300x faster.
Let's take a closer look at how one would implement Oracle 11g data compression in order to achieve optimal results. It is expected that Oracle data compression will eventually become the default for Oracle systems, much like the move from dictionary managed tablespaces to locally managed tablespaces. Eventually, this data compression may become ubiquitous, a general part of the Oracle database management engine, but its important for the Oracle database administrator to understand the ramifications of data compression and have the ability to turn-off compression at-will.
For example, super small servers (read PC's), may not possess enough horsepower to absorb the small (but measurable) overhead of the compress/decompress routines. Remember, there is always a tradeoff between these costs vs. the saving on disk storage and allowing for information to be retrieved with the minimum amount of physical disk I/O.
Oracle Compression Overview
Over the past decade, Oracle has introduced several types of compression so we must be careful to distinguish between the disparate tools. The data compression in 11g is threshold-based and allows Oracle to honor the freelist unlink threshold (PCTFREE) for the compress rows, thereby allowing more rows per data block.
- Simple index compression in Oracle 8i
- Table-level compression in Oracle9ir2
- LOB compression (utl_compress) in Oracle 10g
- Row-level compression in Oracle 11g (even for materialized views)
The historical external compression (blocks are compressed outbound and uncompressed before presenting to the database) are far simpler because all index objects are treated the same way, whereas with the 11g table compression, a data block may contain both compressed and uncompressed row data.
For the official details on Oracle 11g table data compression, see the Oracle 11g concepts documentation. Oracle says that their software will perform a native disk read on the data block, only decompressing the row data after the physical I/O has been completed.
Within the data buffers, the fully uncompressed version of the data remains, even though the information remains compressed on the disk data blocks themselves. This leads to a discrepancy between the size of information on the data blocks and the size of the information within the data buffers. Upon applying Oracle data compression, people will find that far more rows will fit on a data block of a given size, but there is still no impact on the data base management system from the point of view of the SGA (system global area).
Because the decompression routine calls upon block fetch, the Oracle data buffers remain largely unchanged while the data blocks themselves tend to have a lot more data on them. This Oracle 11g data compression whitepaper describes the data compression algorithm:
Compressed blocks contain a structure called a symbol table that maintains compression metadata. When a block is compressed, duplicate values are eliminated by first adding a single copy of the duplicate value to the symbol table. Each duplicate value is then replaced by a short reference to the appropriate entry in the symbol table.
Through this innovative design, compressed data is self-contained within the database block as the metadata used to translate compressed data into its original state is contained within the block.
Oracle 11g compression (Source: Oracle Corporation)
In today's Oracle database management systems, physical disk I/O remains one of the foremost bottlenecks. Even at relatively fast speeds of 10 milliseconds, many data-intensive Oracle applications can still choke on I/O by having disk enqueues, Oracle block read tasks waiting to pull information from the spinning platters. Data compression is certainly useful for reducing the amount of physical disk I/O but there are some caveats that need to be followed by the Oracle database administrator.
Costs and Benefits of 11g compression
One of the exciting new features of Oracle 11g is the new inline data compression utility that promises these benefits:
- Up to a 3x disk savings - Depending on the nature of your data, Oracle compression will result in huge savings on disk space.
- Cheaper solid-state disk - Because compressed tables reside on fewer disk blocks, shops that might not otherwise be able to afford solid-state flash disks can now enjoy I/O speeds up to 300x faster than platter disk.
- Faster full scan/range scan operations - Because tables will reside on less data blocks, full table scans and index range scans can retrieve the rows with less disk I/O.
- Reduced network traffic - Because the data blocks are compressed/decompressed only within Oracle, the external network packets will be significantly smaller.
The overhead of 11g compression?
Remember, physical disk I/O against disk platters has become the major system bottleneck as the speed of processors increase. Until the widespread adoption of RAM disk (solid state disk), we can see this type of data compression being widely used in order to reduce the amount of physical disk I/O against Oracle systems.
The internal machinations of Oracle have always been a closely-guarded secret, Oracle's internal software, their bread-and-butter "edge" that gives Oracle such a huge competitive advantage over their competition. Because Oracle withholds many internal details, we must we must discover the internals of 11g compression with real-world observations and conjecture. First, Oracle hires some of the brightest software engineers in the world (graduates of prestigious colleges like MIT), and it's likely that overhead will be minimized by doing the data compress/uncompress only once, at disk I/O time, and kept in decompressed form somewhere within the RAM data buffers.
It's clear that 11g data compression offers these huge benefits, but the exact overhead costs remain unknown. Oracle explains that there new 11g data compression algorithm: "The algorithm works by eliminating duplicate values within a database block, even across multiple columns. Compressed blocks contain a structure called a symbol table that maintains compression metadata.
When a block is compressed, duplicate values are eliminated by first adding a single copy of the duplicate value to the symbol table. Each duplicate value is then replaced by a short reference to the appropriate entry in the symbol table."
A 2005 benchmark test on Oracle table compression indicated that the 11g data compression is even faster than this 2005 version. Among the findings is the important suggestion that using Oracle table compression may actually improve the performance of your Oracle database:
"The reduction of disk space using Oracle table compression can be significantly higher than standard compression algorithms, because it is optimized for relational data. It has virtually no negative impact on the performance of queries against compressed data; in fact, it may have a significant positive impact on queries accessing large amounts of data, as well as on data management operations like backup and recovery."
This compression paper also suggestion that using a large blocksize may benefit Oracle databases where rows contain common redundant values:
"Table compression can significantly reduce disk and buffer cache requirements for database tables. Since the compression algorithm utilizes data redundancy to compress data at a block level, the higher the data redundancy is within one block, the larger the benefits of compression are."
The article also cites evidence that Oracle table compression can reduce the time required to perform large-table full-able scans by half:
"The fts of the non-compressed table takes about 12s while the fts of the compressed table takes only about 6s."
Some unknown issues (as of September 2007) with implementing Oracle11g data compression include the amount of overhead. The compress/decompress operations are computationally intensive but super small (probably measured in microseconds). This CPU overhead might be significantly measurable, but we can assume that the overhead will be the same (or smaller) than data compression in legacy databases (with the possible exception of PC-based Oracle databases). In a perfect implementation, incoming data would only be decompressed once (at read time) and the uncompressed copy of the disk block would reside in RAM, thereby minimizing changes to the Oracle kernel code. The overhead on DML must involve these operations:
- Overhead at DML time - Whenever a SQL update, insert of delete changes a data block in RAM, Oracle must determine if the data block should be unlinked from the freelist (this threshold is defined by the PCTFREE parameter).
- Compression on write - An outbound data block must be compressed to fit into it's tertiary block size (as defined by db_block_size and the tablespace blocksize keyword). For example, an uncompressed block in RAM might occupy up to 96k in RAM and be compressed into it's tertiary disk blocksize of 32k upon a physical disk write.
- Decompress on read - At physical read time, incoming disk blocks must be expanded once and stored in the RAM data buffer. The exact mechanism for this expansion is not published in the Oracle11g documentation, but it's most likely a block versioning scheme similar to the one used for maintaining read consistency.
- Increased likelihood of disk contention - Because the data is tightly compressed on the data blocks, more rows can be stored, thus increasing the possibility of "hot" blocks on disk. Of course, using large data buffers and/or solid-state disk (RAM-SAN) will alleviate this issue.
Oracle's multi-state compression
While the "alter table" and "alter tablespace" clauses support changing the compression options, we would expect that Oracle would feel obligated to change all objects to match their new compression attributes. That is not the case, and the 11g compression docs note that a table may have multi-state rows, some compressed and others expanded:
You can alter the compression attribute for a table (or a partition or tablespace), and the change only applies to new data going into that table. As a result, a single table or partition may contain some compressed blocks and some regular blocks. This guarantees that data size will not increase as a result of compression; in cases where compression could increase the size of a block, it is not applied to that block.
Many shops do not understand that the disk savings are only a trivial benefit of data compression and the real benefits are a reduction in disk I/O and faster scan operations. Also, as the industry moves toward solid-state flash disks as replacements for platter disk, Oracle compression will make solid state Oracle more affordable.
Oracle 11g multi-state blocks (Source: Oracle Corporation)
Without implementing this revolutionary "partial" row compression, making a table-wide or tablespace-wide compression change would require a massive update to blocks within the target tablespace. The 11g compression docs note that when changing to/from global compression features, the risk averse DBA would choose to rebuild the table or tablespace from scratch:
Existing data in the database can also be compressed by moving it into compressed form through ALTER TABLE and MOVE statements. This operation takes an exclusive lock on the table, and therefore prevents any updates and loads until it completes. If this is not acceptable, the Oracle Database online redefinition utility (the DBMS_REDEFINITION PL/SQL package) can be used.