High availability is a critical requirement for businesses around the world. No organization wants to face downtime caused by an unresponsive database or application. As database administrators, our top priority should be to reduce downtime by managing databases proactively and using Oracle high availability tools to keep them available for use.
There are two types of downtime: planned and unplanned. Planned downtime, such as a scheduled maintenance window, can't always be avoided. Unplanned downtime, by contrast, occurs when users are unable to connect to a database at a time when it should be up. To help DBAs keep such incidents to a minimum, Oracle Database offers a variety of high availability features.
Unplanned downtime can be caused by several different factors. Let's look one by one at the causes and the Oracle high availability tools that can be used to counteract them.
Data corruption issues
Data corruption is a relatively rare event, but one that can have a very negative impact on the business. Aside from user errors (covered in the next section), the most common data corruption issues are caused by problems with the database server's operating system, a network interface failure, or issues with software or the system hardware. These can result in applications writing corrupt data to the database -- and then getting that data back when executing reads. The need to fix the data errors can mean unplanned downtime for databases.
Oracle includes Data Guard, Recovery Manager (RMAN) and other tools with its databases to help DBAs recover data after it gets corrupted or another type of disaster occurs. For example, corrupted data blocks -- ones that aren't formatted in a recognized manner or lack internal consistency -- can be detected with tools such as Data Guard and Oracle Enterprise Manager. RMAN can then be used to repair the corrupted blocks from a good backup; if your environment is configured to use a real-time standby database, the blocks can also be repaired automatically from the good copies held on the standby.
In addition, Oracle Database provides protection against data corruption with the help of Data Recovery Advisor, a tool that automatically diagnoses problems with data and suggests ways to repair them. DBAs can apply the repair options identified by Data Recovery Advisor themselves or have the tool execute the fixes for them.
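The detect-then-repair flow described above can be sketched as an RMAN command file. The Python helper below only assembles the text of such a file; the function name and the choice to default to a repair preview are assumptions made for illustration. Check the commands against the documentation for your release, since Data Recovery Advisor in particular may not be available in every version or configuration.

```python
def corruption_check_script(repair: bool = False) -> str:
    """Build an RMAN command file that checks for corrupt blocks.

    Hypothetical helper: it returns text only and never connects to a database.
    """
    commands = [
        # Scan the datafiles; corrupt blocks found are recorded in
        # V$DATABASE_BLOCK_CORRUPTION.
        "VALIDATE DATABASE;",
        # Data Recovery Advisor: list diagnosed failures, then ask for advice.
        "LIST FAILURE;",
        "ADVISE FAILURE;",
    ]
    if repair:
        # Block media recovery for every block in the corruption list.
        commands.append("RECOVER CORRUPTION LIST;")
    else:
        # Show what the advisor would do without changing anything.
        commands.append("REPAIR FAILURE PREVIEW;")
    return "\n".join(commands) + "\n"


print(corruption_check_script())
```

Defaulting to the preview keeps the generated file read-only unless a DBA explicitly asks for a repair.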
Errors caused by database users
One of the most common causes of data corruption is human error that unintentionally leads to unstable or incorrect data in a database. Some common errors by database users are accidental deletion of files from the Oracle Database File System, unauthorized data changes and inadvertent removal of database objects. One way these errors can be reduced is by restricting user access to data and database services. Using Oracle Enterprise Manager or SQL statements via the command-line interface, DBAs should grant users only the privileges they need to perform their job responsibilities effectively.
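As a rough illustration of granting only what a job requires, the sketch below maps job roles to object privileges and renders the matching GRANT statements. The role names, the `sales.orders` table and the user `jsmith` are all hypothetical; a real deployment would more likely define database roles and grant those instead.

```python
from typing import Dict, List

# Hypothetical job roles mapped to the object privileges each one needs.
ROLE_PRIVILEGES: Dict[str, Dict[str, List[str]]] = {
    "report_reader": {"sales.orders": ["SELECT"]},
    "order_clerk": {"sales.orders": ["SELECT", "INSERT", "UPDATE"]},
}


def grant_statements(user: str, job_role: str) -> List[str]:
    """Return GRANT statements giving `user` only what `job_role` requires."""
    grants = ROLE_PRIVILEGES[job_role]  # KeyError for an unknown role is deliberate
    return [
        f"GRANT {', '.join(privileges)} ON {obj} TO {user};"
        for obj, privileges in sorted(grants.items())
    ]


for statement in grant_statements("jsmith", "report_reader"):
    print(statement)
```

Keeping the mapping in one place makes it easy to review who can change data and who can only read it.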
LogMiner is an Oracle high availability tool that can help in identifying data problems caused by users -- it lets DBAs query and analyze redo log files for purposes such as auditing changes to data and pinpointing where in a database architecture a particular case of data corruption occurred. Oracle Flashback Technology, a set of tools for repairing corrupted data while a database stays online, is another option. It provides a SQL interface that DBAs can use to analyze and fix user-generated errors at the row, transaction, table, tablespace and database levels.
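To make the Flashback SQL interface more concrete, here is a minimal sketch that renders two kinds of statements: a Flashback Query that only reads a table as it stood at an earlier time, and a FLASHBACK TABLE operation that rewinds the table itself. The helper names and the `sales.orders` table are assumptions for illustration, and how far back you can go depends on undo retention in your environment.

```python
from datetime import datetime
from typing import List

ORACLE_MASK = "YYYY-MM-DD HH24:MI:SS"  # Oracle format mask matching strftime below


def oracle_timestamp(moment: datetime) -> str:
    """Render a Python datetime as an Oracle TO_TIMESTAMP expression."""
    stamp = moment.strftime("%Y-%m-%d %H:%M:%S")
    return f"TO_TIMESTAMP('{stamp}', '{ORACLE_MASK}')"


def flashback_query(table: str, moment: datetime) -> str:
    """Read-only view of the table as it stood at `moment`."""
    return f"SELECT * FROM {table} AS OF TIMESTAMP {oracle_timestamp(moment)};"


def flashback_table(table: str, moment: datetime) -> List[str]:
    """Statements that rewind the table itself to `moment`."""
    return [
        # Row movement must be enabled before FLASHBACK TABLE can run.
        f"ALTER TABLE {table} ENABLE ROW MOVEMENT;",
        f"FLASHBACK TABLE {table} TO TIMESTAMP {oracle_timestamp(moment)};",
    ]


before_error = datetime(2024, 3, 1, 9, 30, 0)  # hypothetical time just before the mistake
print(flashback_query("sales.orders", before_error))
for statement in flashback_table("sales.orders", before_error):
    print(statement)
```

Running the query first lets a DBA confirm that the earlier data looks right before rewinding the table.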
Data storage failures
Common storage issues that can lead to unplanned downtime for Oracle databases include disk drive, storage array and disk controller failures. To guard against them, it's very important to have appropriate database backup and recovery plans in place.
In addition to Data Guard and Oracle Automatic Storage Management, the vendor's volume manager tool, you can use RMAN to back up data and restore it from existing backups. The RMAN utility lets you recover data up to the time just before a storage failure occurred.
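The backup and point-in-time recovery steps can be sketched as two RMAN command files. The Python helpers below only build the text; their names are hypothetical, and the restore script assumes the database has already been started in mount mode and that the required backups and archived logs are available.

```python
from datetime import datetime


def backup_script() -> str:
    """RMAN command file for a full backup that includes archived redo logs."""
    return "BACKUP DATABASE PLUS ARCHIVELOG;\n"


def point_in_time_restore_script(until: datetime) -> str:
    """RMAN command file that recovers the database to just before `until`.

    Assumes the database is mounted (not open) when the file is run.
    """
    stamp = until.strftime("%Y-%m-%d %H:%M:%S")
    until_clause = "TO_DATE('" + stamp + "', 'YYYY-MM-DD HH24:MI:SS')"
    lines = [
        "RUN {",
        '  SET UNTIL TIME "' + until_clause + '";',
        "  RESTORE DATABASE;",
        "  RECOVER DATABASE;",
        "}",
        # Incomplete recovery requires opening with RESETLOGS.
        "ALTER DATABASE OPEN RESETLOGS;",
    ]
    return "\n".join(lines) + "\n"


print(backup_script())
print(point_in_time_restore_script(datetime(2024, 3, 1, 9, 30, 0)))
```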
Database server failures
Server-level failures can also lead to database downtime. Incidents of this sort can be caused by an accidental reboot of a database server, an Oracle instance failure or a system crash. You can overcome these failures using Oracle high availability features like Data Guard, Oracle Restart and Fast-Start Fault Recovery.
In addition to the capabilities mentioned above, Data Guard enables DBAs to maintain one or more standby copies of a production database in the same data center or at a different location. If the primary database becomes unavailable due to an outage, Data Guard can also be used to switch a standby database to the primary role.
Oracle Restart ensures that different database components are automatically restarted according to their configured dependencies after an unexpected component failure or a restart of the database host. The Fast-Start Fault Recovery function speeds up database restarts to open up data access to applications without having to wait for data rollback processes to be completed.
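For environments managed through the Data Guard broker, the role change comes down to one of two DGMGRL commands: a switchover when the primary is still reachable (a planned role swap) or a failover when it is not. The sketch below picks between them; the function name and the standby name `boston_stby` are hypothetical, and it assumes a broker configuration already exists.

```python
def role_transition_command(standby: str, primary_reachable: bool) -> str:
    """Pick the DGMGRL command that moves the primary role to `standby`."""
    if primary_reachable:
        # Planned role swap: the old primary becomes a standby.
        return f"SWITCHOVER TO {standby};"
    # The primary is down: promote the standby.
    return f"FAILOVER TO {standby};"


print(role_transition_command("boston_stby", primary_reachable=False))
```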
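One setting commonly associated with faster instance recovery is the FAST_START_MTTR_TARGET initialization parameter, which sets a target for crash recovery time in seconds. The sketch below renders the statement that sets it and rejects values outside the documented 0 to 3600 range; the function name is hypothetical, and `SCOPE = BOTH` assumes the instance uses a server parameter file.

```python
def mttr_target_statement(seconds: int) -> str:
    """Render the ALTER SYSTEM statement that sets a crash recovery time target."""
    if not 0 <= seconds <= 3600:
        raise ValueError("FAST_START_MTTR_TARGET must be between 0 and 3600 seconds")
    # SCOPE = BOTH changes the running instance and the server parameter file.
    return f"ALTER SYSTEM SET FAST_START_MTTR_TARGET = {seconds} SCOPE = BOTH;"


print(mttr_target_statement(60))
```

A lower target generally means more frequent checkpoint writes during normal operation, so the value is a trade-off, not a setting to minimize blindly.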
Data center failures
Organizations should always prepare a disaster recovery plan to handle data center failures, which can lead to a complete outage of their database applications; without such a plan, the result could be long periods of unplanned downtime. One step to take in preparing for such failures is using RMAN to periodically transfer database backups to an off-site location.
These backups will help you restore databases on another host system in the off-site facility if the original data center becomes unavailable, thereby helping with business continuity. It's also highly recommended to maintain a standby database in a Data Guard environment so you can put it into use if the production environment fails due to a data center outage.