Chapter 8, High Availability: RAC, ASM, and Data Guard
This chapter from Oracle Database 11g: A Beginner's Guide explains how to work with high availability features in Oracle 11g such as Real Application Clusters (RAC), Automatic Storage Management (ASM), and Data Guard. In this section, learn how to define high availability and what is needed to build a high availability environment.
Define high availability
Define, Install and Test Real Application Clusters (RAC)
ASM Instance and ASM Disk Groups
How to manage an ASM instance with ASMCMD and ASMLIB
Implementing Oracle 11g Data Guard and Data Guard protection modes
Creating an Oracle physical standby server
High availability and the reduction of planned (and even unplanned) downtime are goals of most database systems, especially in environments that must be accessible 24x7. Databases that go down for maintenance or because of hardware failures can cause significant losses to the business. Oracle 11g addresses this with high availability features such as Real Application Clusters (RAC), Automatic Storage Management (ASM), and Data Guard. When architecting database environments, combining RAC and Data Guard provides instance failover and even disaster recovery to an offsite standby server. In planning the configurations and combinations, you must look for cost-effective ways to provide the business with the availability it requires. Examining the features and how to implement them will help you plan a reliable, scalable, and stable environment that can survive the loss of a piece of hardware or be recovered after an unplanned event. Maintenance of the databases also matters in Oracle 11g. Rolling patches are applied to one node of a cluster at a time, so at least one node stays available during patching, and even the planned maintenance window becomes smaller.
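The rolling-patch idea can be illustrated with a small simulation. This is only a sketch of the ordering logic, not Oracle's patching tooling: the function name rolling_patch, the patch callback, and the node names are all hypothetical and stand in for whatever the real procedure does on each node.

```python
def rolling_patch(nodes, patch):
    """Patch cluster nodes one at a time (illustrative simulation only).

    `nodes` is a list of hypothetical node names and `patch` is any
    callable that is invoked once per node. Returns the smallest number
    of nodes that were online at any point during the run.
    """
    if len(nodes) < 2:
        # With a single node there is nothing left to serve requests
        # while it is being patched, so a rolling approach cannot help.
        raise ValueError("rolling patching needs at least two nodes")

    online = set(nodes)
    min_online = len(online)
    for node in nodes:
        online.discard(node)                      # take this node out of service
        min_online = min(min_online, len(online))
        patch(node)                               # apply the patch while it is offline
        online.add(node)                          # bring it back before moving on
    return min_online


if __name__ == "__main__":
    patched = []
    lowest = rolling_patch(["node1", "node2", "node3"], patched.append)
    print("patched in order:", patched)
    print("fewest nodes online during patching:", lowest)
```

Because only one node is ever removed at a time, the number of nodes still serving requests never drops below the cluster size minus one, which is the property the paragraph above describes.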
CRITICAL SKILL 8.1
Define High Availability
What does high availability mean to the business? What is the level of risk tolerance? How much data loss is acceptable? Are there current issues with backups or reporting? These are all questions that need to be asked to start mapping out the components that are needed. You may decide that absolutely no data loss can be tolerated or, alternatively, that it is fine if the application is down for a day or two.
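When discussing these questions with the business, it can help to translate a tolerated amount of downtime into an availability percentage, and back again. The following sketch shows that arithmetic, assuming a 365-day year; the function names are made up for illustration and the example figures are not targets taken from this chapter.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year


def availability_percent(downtime_minutes_per_year):
    """Convert total yearly downtime (in minutes) into an availability percentage."""
    return 100.0 * (1 - downtime_minutes_per_year / MINUTES_PER_YEAR)


def allowed_downtime_minutes(availability_pct):
    """Convert an availability percentage into the yearly downtime it permits."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100.0)


if __name__ == "__main__":
    two_days = 2 * 24 * 60  # an application that may be down two days per year
    print(f"{availability_percent(two_days):.2f}% available")
    print(f"{allowed_downtime_minutes(99.9):.1f} minutes of downtime allowed at 99.9%")
```

For example, an application that can be down for two days per year is roughly 99.45 percent available, while a 99.9 percent target leaves a little under nine hours of total downtime per year, planned and unplanned combined.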
Also, it helps to look at what kind of outages can happen and then build in fault tolerance for these situations. Examples of unplanned outages are hardware failures, such as disk or server failures; human error, such as dropping a data file or making a bad change; and network and site failures. Then, add on to these examples the planned outages needed for applying patches, database changes and migrations, and application changes that might include table and database object changes and upgrades. Look for the areas in the system with single points of failure and then match up solutions to start to eliminate those areas.
This chapter touches on only a few of the areas necessary for building a highly available environment: Real Application Clusters, Automatic Storage Management, and Data Guard. Understanding these components, plus researching other Oracle options such as Flashback Query, Flashback Transaction, Flashback Database, the Flash Recovery Area, Data Recovery Advisor, and Oracle Secure Backup, will help you align the environment with the availability needs of the business.
When you look at the application and the business needs, if planned maintenance outages already allow downtime for patching the environment, rolling patches might not be as much of a concern. Instead, the priority might be a way to test application changes as well as patches, which is possible via Flashback technologies or on a production-like server. If the business doesn't allow for downtime or a regular maintenance window, and you know each minute down will cost the company a serious amount of money, you can combine components for the solution: rolling patches, protection against outages from hardware failures, failover servers through clusters, and Data Guard.
Architecting a solution that meets budget restrictions and business needs will take discussion and planning with the business teams, along with some understanding of the different options available. The rest of this chapter will give you an understanding of some of these areas and what it takes to implement them.