This is an excerpt from Chapter 20 of Oracle Database 10g OCP Certification All-In-One Exam Guide by Damir Bersinic and John Watson, copyright 2006 from Oracle Press, a division of McGraw-Hill. Click here to read the full chapter.
In this chapter you will learn how to
- Recover from loss of a controlfile
- Recover from loss of a redo log file
- Recover from loss of a system-critical datafile
- Recover from loss of a nonsystem-critical datafile
It is impossible to corrupt an Oracle database. The mechanism of instance recovery,
where redo and undo are used to return the database to a consistent state after an
instance failure, guarantees that. It is, however, possible to lose data following media
failure—if the DBA has not taken appropriate precautions. The precautions are simple:
to run the database in archivelog mode; to multiplex the controlfile, the online logfiles,
and the archive log files; and to back up datafiles and archive log files. Following
media failure, the backups and the archive logs can be used to recover the database up
to the point of the failure, without loss of even one committed row of data. But
whereas instance recovery is automatic—indeed, unavoidable—media recovery is a
manual process. This chapter will go through elementary recovery techniques. More
advanced techniques, applicable to more complex problems, will be covered in later
chapters.
Recovery structures and processes
Following media failure, there are different techniques for recovery, depending on
which files were damaged. The database consists of three file types: the controlfile,
the online redo log files, and the datafiles. Recovery from damage to the controlfile
or the online redo log files is a trivial exercise, provided that they were multiplexed.
Recovery from damage to one or more datafiles is more complex, but still
straightforward.
A...
To continue reading for free, register below or login
To read more you must become a member of SearchOracle.com
');
// -->

damaged controlfile can be replaced with a multiplexed copy or re-created with
a CREATE CONTROLFILE command. In extreme circumstances, it can be restored
from a backup, but this should never be necessary following media failure, if you have
followed a suitable multiplexing strategy.
A damaged online redo log file can be regenerated. Oracle provides an ALTER
DATABASE CLEAR LOGFILE GROUP # command (where # is the number of the group
with a damaged member), which will delete and re-create the members of a logfile group.
If the database is running in archivelog mode (and it should be), the logfile group must
have been archived before Oracle will permit execution of the clear logfile command.
This is because clearing an unarchived log file group would mean that the archive log
stream would be missing one logfile, and therefore that recovery would not be possible.
There is a variation on the command, ALTER DATABASE CLEAR UNARCHIVED
LOGFILE GROUP #, which will delete and re-create a logfile even if it has not been
successfully archived, but after executing this command it is absolutely vital to perform
a whole database backup.
A damaged datafile requires use of backups and archive logs. Following media
failure resulting in damage to a datafile, there are two options for recovery: complete
recovery, meaning no loss of data, and incomplete recovery, where you deliberately
lose work by stopping the recovery process before it has completed. Incomplete
recovery is an advanced procedure dealt with in Chapter 27. Complete recovery is
a two-stage process. First, the damaged file must be restored from a backup. Second,
the restored file must be recovered, by using redo information in the archive logs to
bring it forward in time until it is synchronized with the rest of the database.
EXAM TIP In the Oracle environment, "restore" means to replace a damaged
or missing file with a backup; "recover" means to synchronize the file with the
rest of the database by use of archive logs.
Since online redo logs are never backed up by RMAN, RMAN cannot be used to
recover from damage to them; repairing online logfiles damaged by media failure
can by done only with SQL*Plus, or through Database Control. The controlfile and
datafiles can be restored and recovered by RMAN; indeed, if you backed them up into
backup sets, RMAN is your only option.
To open a database, all the controlfile copies, at least one member of each online
logfile group, and all the online datafiles must be present and synchronized. If,
during a startup, SMON finds that this is not the case, the startup will not complete.
If a controlfile copy is damaged or missing, the startup will stop in NOMOUNT mode.
A message is written out to the alert log detailing which copy (or copies) of the
controlfile is damaged. Assuming that the controlfiles are fine, SMON proceeds to
open the database. During this phase, it checks the headers of all the online datafiles.
If any are missing or damaged, appropriate error messages are written out to the alert
log, and the database remains in mount mode. If all the online files are present and
not damaged, but one or more of them are not synchronized, SMON will attempt to
synchronize them by using the online redo logs. This is the process of instance recovery,
detailed in Chapter 18, and will happen automatically. If the online logs required are
not available, then the database cannot be opened. If one or more datafiles have been
restored from a backup, they will almost certainly be so far out-of-date that the online
redo logs will not go far enough back in time to recover them: this is when you must
use archive log files for the recovery, which is a procedure that must be initiated
manually—from SQL*Plus if you are backing up with operating system commands,
or with RMAN if (as Oracle strongly advises) you have committed to using RMAN for
your backups.
If the media damage occurs while the database is open, the effect will depend on
which files were affected. Damage to any controlfile copy will result in the instance
terminating immediately. Damage to a datafile that is part of the SYSTEM tablespace
or the active undo tablespace will have the same effect. But damage to an online log
will not terminate the instance, as long as there is a surviving member of the logfile
group. In fact, the instance will continue to function, and your end users will not even
notice. But error messages will be written out to the alert log, and the situation should
be corrected without delay; such corrections can and should be done online, while
people continue to work. Damage to a datafile that is part of a tablespace other than
SYSTEM or the active undo tablespace will also not result in an instance failure, but
clearly the end users may have problems, because a part of the database will be missing.
How your application will react to this is unpredictable—it will depend completely
on how the application is structured. The restore and recovery of damaged datafiles
can be done online, provided that they are not datafiles belonging to SYSTEM or
the undo tablespace. Finally, damage to the tempfiles that make up your temporary
tablespaces may not be noticed by the end users at all. Oracle does not validate the
existence of tempfiles until they are needed, and a well-tuned database may never
need them. This means that tempfiles can be missing for some time before there is
any noticeable effect. It also means that a damaged tempfile can be dropped and
re-created at any time, unless it happens to be in use at that moment.
As with backups, a restore can be done with RMAN or with operating system
utilities. But if your RMAN backups were to backup sets, rather than as image copies,
the restore can be done only with RMAN: there is no other way to extract datafiles
from a backup set. Recovery after a restore can be carried out with SQL*Plus commands
or with RMAN, but the same restriction applies: only RMAN can extract archive logs
from a backup set.
Recovery from media failure
Restore and recovery following media failure is covered in much greater detail in later
chapters and the second OCP examination, but it is necessary to know the rudiments
of recovery from simple problems for the first examination too. These simple problems
are loss of one copy of a multiplexed controlfile and an online redo log file, and
complete recovery following loss of critical and noncritical datafiles.
Recovery from loss of a multiplexed controlfile
As long as a surviving multiplexed copy of the controlfile exists, recovery from loss of
a controlfile is simple. Just replace it with a surviving copy of the controlfile. To restore
the damaged or missing controlfile copy from a backup would be useless in these
circumstances, because all copies of the controlfile must be identical; clearly, a restored
copy would not be synchronized with the surviving copies, nor with the rest of the
database.
Virtually the moment the damage occurs, the instance will terminate. As ever, the
DBA's first reaction to a crashed instance should be to attempt a startup. This will fail,
in NOMOUNT mode, with an appropriate error message. The alert log will state which
controlfile copy is missing, and also—in the section listing the nondefault initialization
parameters—how many controlfiles there actually are, and where they are. At this
point, you have three options. First, you could edit the parameter file to remove the
reference to the missing or damaged controlfile.
This is fine, but your database will now be running on one fewer multiplexed
copies, which will presumably be in breach of your security guidelines. A better option
is therefore to replace the damaged file with a copy made from a surviving copy or
indeed to change the CONTROL_FILES initialization parameter to replace the
reference to the damaged file with a reference to a brand new file, and copy the
surviving controlfile to that.
EXAM TIP Recovering from loss of a controlfile will entail downtime. It
cannot be done online.
Exercise 20-1: Recovering from loss of a controlfile
In this exercise, you will simulate the loss of a multiplexed controlfile and replace
it with a copy.
- Connect to your database with SQL*Plus, and ensure that your controlfile
is multiplexed with this query:
This query must return at least two rows. If it does not, multiplex your
controlfile.
- Simulate damage to a controlfile by crashing the database and renaming
one of your controlfiles. Note that on Windows you may have to stop the
Windows service before Windows will let you rename the file, and start it
again afterward.
- Issue a startup command. The startup will stop in nomount mode, with an
"ORA-00205: error in identifying controlfile, check alert log for more info"
error message.
- Copy your surviving controlfile to the name and location of the file you
renamed.
- Issue another startup command, which will be successful.
Sample questions
Loss of which of these files will cause an open database to crash? (Choose
three answers.)
- A multiplexed controlfile
- A multiplexed online logfile
- A multiplexed archive log file
- An active undo tablespace datafile
- An active temporary tablespace tempfile
- A datafile from the SYSAUX tablespace
- A datafile from the SYSTEM tablespace
- A datafile containing critical user data
A copy of a multiplexed controlfile is damaged. What should you do?
(Choose the best answer.)
- Replace it with a surviving copy.
- Restore it with RMAN.
- Restore it with operating system commands.
- Re-create it with the CREATE CONTROLFILE command.
Click here to read the answers and the rest of this chapter.
About the authors
Damir Bersinic, OCP, MCSE, MCSDBA, MCT, is an infrastructure consultant with Trecata Corporation, a system integration consultancy in Toronto, Canada. He has more than 20 years of industry experience and has authored a number of titles on Oracle, SQL Server, Active Directory and Windows. His credits include multiple exam preparation titles for Oracle and Microsoft certification programs.
John Watson, OCP DBA 8.0, 8.i, and 9i, Internet Application Server Administrator, and certified on Managing Oracle on Unix, is a senior consultant with BLP Management Consultants in South Africa. He has taught for Oracle University in South Africa for four years. John taught the DB10g New Features course, the DB10g Manageability and Performance seminars, validated hundreds of OCP 10g New Features exam questions and contributed to the course material for all the 10g courses. John is the author of numerous articles on technology.