It is important that your production Oracle server recovers in the minimum amount of time after an unexpected shutdown. The Oracle architecture is robust and will prevent loss of committed transactions barring a media failure. If there is no media failure, instance recovery is assured. However, as a database administrator, what you really want is a fast recovery. The faster the database is back online, the sooner users will stop calling, emailing and dropping by to ask when the database will be fixed. That's where tuning Oracle instance recovery comes in.
Instance and crash recovery is the automatic application of redo log records to data blocks after a crash or system failure. Whenever a clean shutdown of the database is performed, any changes in the instance memory that have not yet made it to disk will be written out during the system checkpoint. Some of these changes in memory may include committed transactions that have not yet been written out to the data files. Oracle does write every transaction synchronously to the redo log buffer, which is then written to the online redo logs. The contents of the redo logs will include both committed and uncommitted transactions. When a database is shut down with the abort option, a single instance database crashes or if all instances of an Oracle Real Application Clusters configuration crash, then no system checkpoint will be issued. Because the data files may have some uncommitted transactions in place and some committed transactions not in place, crash recovery must be performed on the next startup to make the database consistent.
Oracle instance recovery happens as a two-step process: cache recovery followed by transaction recovery.
During cache recovery (also known as "rolling forward"), Oracle will apply all committed and uncommitted changes in the redo log files to the affected data blocks. The amount of redo data that must be applied to the data files is proportional to the rate of change for the database and the time between checkpoints.
Transaction recovery happens after all of the redo log changes have been applied to the data files. Any transactions that were not committed at the time of the crash must be rolled back. For this operation, the database uses undo information to back out the uncommitted changes.
In terms of tuning Oracle instance recovery, the cache recovery operation is crucial. Once cache recovery has been completed, the database can be opened. The transaction recovery operation can take place without impacting system availability. Anything that makes cache recovery faster will increase database availability.
Oracle instance recovery using checkpoints
When Oracle performs a checkpoint, it records the highest system change number for which all data blocks less than or equal to that SCN are known to be written out to the data files. In the event of an instance failure, the most recent checkpoint SCN is the starting point for cache recovery. Only redo records with changes greater than that SCN must be applied. How long it will take to perform instance recovery is determined by the count of data blocks with changes higher than that SCN and how many redo log blocks must be read to locate those changes. If the database performs checkpoints more frequently, then Oracle will write dirty buffers to the data files more often. If a cache recovery is required due to an instance failure, fewer redo blocks will have to be applied to the data files. That said, frequent checkpointing in a system that is heavily updated can result in reduced database performance. Degrading system performance during normal operations to reduce recovery time in the event of an exception is a poor tradeoff.
More on Oracle backup and recovery
Find out how to copy an Oracle database
Backup and recovery concepts in Oracle Database 11g
Using RMAN to improve recovery
The Oracle Fast-Start Fault Recovery feature is designed to reduce the time required for cache recovery and make the time more predictable without incurring a significant performance hit. It works to limit the number of dirty buffers and the number of redo records generated between the most recent redo record and the last checkpoint.
Conventional checkpointing in Oracle is event-driven and performs bulk writes when a checkpoint occurs. When using the Fast-Start architecture, checkpointing occurs incrementally. Each Database Writer process periodically writes buffers to disk to advance the checkpoint position. This results in smaller checkpoints that eliminate the I/O spikes common with conventional checkpointing. The feature is enabled by setting the FAST_START_MTTR_TARGET initialization parameter to a non-zero value. This sets a target for the expected mean time for Oracle instance recovery. The parameter should be set to the number of seconds (0 to 3600) it should take to start up the instance and perform cache recovery. Once the parameter is set, Oracle manages incremental checkpoints in an attempt to meet that target.
This was first published in April 2013