Oracle RAC design: The effects of component failure


Mike Ault and Madhu Tumma
08.21.2006


The proper design of a Real Application Clusters (RAC) environment is critical for all Oracle professionals. This book excerpt shows expert tips and techniques used by real-world RAC professionals for designing a robust and scalable RAC system.

This is an excerpt from the top RAC book Oracle 10g Grid & Real Application Clusters, by Mike Ault and Madhu Tumma.

Introduction to grid design

This chapter focuses on the issues that must be considered when designing for RAC. The reasons for utilizing RAC must be well understood before a proper implementation can be achieved.

Essentially, there are three main reasons to use RAC: it spreads the load across multiple servers, it provides high availability, and it allows a larger total SGA across the cluster than a single Oracle10g instance can accommodate, for example on Windows 2000 or Linux implementations.

The most stringent design requirements come from the implementation of high availability. A high-availability (HA) RAC design must have no single point of failure, must provide transparent application failover, and must remain reliable even in the face of disaster at the primary site. An HA design requires attention to equipment, software and the network. This three-tier approach can be quite daunting to design.

What are the effects of component failure?

This section will provide a quick look at the effects of component failure.

Failure of the Internet or intranet

Although the DBA usually has no control over it, a failure of the Internet connection, usually caused by an outage at the provider, means no one outside the company can access the application. Failure of the intranet or internal networks means no one inside the company can access the application. These networks, which are usually made up of multiple components, should also have built-in redundancy.

Failure of the firewall

The firewall acts as the gatekeeper between the company's assets and the rest of the world. If the database is strictly internal with no connection to the Web, a firewall is not needed. If there is only one firewall, a failure will prevent anyone outside the firewall, such as the rest of the universe, from contacting the database. Internal users, those inside the firewall, may still have access and some limited processing can occur.

Failure of the application server

The application server usually serves the Web pages, reports, forms or other interfaces to the users of the system. If there is only a single application server and it goes down, no one can use the database, even if the database and all other components are fully functional, because there is no application to run against it. This also applies to single Web cache servers or OC4J servers.

Failure of the database server

The failure of a database server is the one failure that a normal RAC configuration handles by design. Failure of a single database server leads to failover of the connections to a surviving node. While not a critical failure that will result in loss of the system, a single server failure means a reduction in performance and capacity. Of course, a catastrophic failure of all servers in the cluster, both servers in a two-node configuration, will result in total loss of service.
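The reduction in capacity after a node failure can be estimated with simple arithmetic. The sketch below is only an illustration: it assumes the load is spread evenly across nodes and redistributes evenly over the survivors, and the function name and the 60% starting utilization are hypothetical values chosen for the example, not figures from the book.

```python
def surviving_utilization(nodes: int, utilization: float, failed: int = 1) -> float:
    """Per-node utilization after `failed` nodes drop out.

    Assumes the total workload is unchanged and is spread evenly
    over the surviving nodes (an idealized model).
    """
    survivors = nodes - failed
    if survivors <= 0:
        raise ValueError("no surviving nodes: total loss of service")
    return utilization * nodes / survivors


# Hypothetical clusters, each node running at 60% before the failure.
for nodes in (2, 3, 4):
    after = surviving_utilization(nodes, 0.60)
    status = "overloaded" if after > 1.0 else "ok"
    print(f"{nodes} nodes at 60% -> {after:.0%} per survivor ({status})")
```

Under these assumptions, a two-node cluster only absorbs a node failure without overload if each node normally runs below 50% utilization; larger clusters leave more headroom.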

The servers will have disk controllers or interfaces that connect through the switches to the SAN arrays. These controllers or interfaces should also be made redundant and have multiple channels per controller or interface. In addition, the network interface cards (NICs) should be redundant, with at least one spare that can take the place of either the network connection card or the cluster interconnect card should a failure occur.

Failure of the fabric switch

The fabric switch allows multiple hosts to access the SAN array. Failure of the fabric switch or other interconnect equipment can result in loss of performance or total loss of the application. If the SAN cannot be accessed, the database will crash and no one will be able to access it, even if all other equipment is functional.

SAN failure

SAN failure can come in many forms. Catastrophic failure will, of course, result in total loss of the database. In a RAID5 set, failure of a single drive when there is no hot spare, or when the hot spare has already been used, will result in severe performance degradation, with operations slowing by as much as 400% to 1000%, because the data on the failed drive has to be rebuilt on the fly from the data and parity information stored on the other drives in the set. Even if there is an available hot spare, it still takes time to rebuild the failed drive's contents onto the hot spare from the data and parity on the other drives. During this rebuild, performance will suffer.
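To see why a degraded RAID5 set is slow, it helps to look at how a missing block is recovered. The sketch below is a simplified illustration of XOR parity, not a model of any particular array's firmware: the block size, the block contents and the helper name xor_blocks are all assumptions made for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Seven hypothetical 4-byte data blocks, one per data drive in a stripe.
data = [bytes([i] * 4) for i in range(1, 8)]
parity = xor_blocks(data)  # the parity block stored for this stripe

# Simulate losing the drive that held block 3, then rebuild that block
# from the six surviving data blocks plus the parity block.
lost_index = 3
survivors = data[:lost_index] + data[lost_index + 1:]
rebuilt = xor_blocks(survivors + [parity])

assert rebuilt == data[lost_index]
print(f"rebuilt block {lost_index} by reading {len(survivors) + 1} other blocks")
```

Every read that targets the failed drive turns into reads on all the other drives in the stripe plus an XOR pass, which is where the extra I/O during degraded operation and rebuild comes from.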

Usually, SANs are configured with disk trays or bricks of a specific number of drives, typically eight active drives and a single spare in each tray or brick. A single tray becomes an array. In the case of a RAID0+1 setup, the array is striped across the eight drives and mirrored to another tray. Failure of a RAID0+1 drive has little effect on performance, as its mirror drive takes over while the hot spare is rebuilt on an "on available" basis. In a RAID5 array, the eight drives are usually set up in a 7+1 configuration, meaning each stripe holds seven data blocks and one parity block (RAID5 rotates the parity block across the drives rather than dedicating one drive to it).

When a drive fails, a replacement drive must be immediately on hand, even if a hot spare takes over. If the hot spare has already been activated and a second drive is lost, the entire array is in jeopardy. Most of these arrays use hot-pluggable drives, meaning a failed drive can be replaced while the system is running.

NICs and HBAs

Every component requires a connection to the others. This connection is usually via a network interface card (NIC) or host bus adapter (HBA). These interfaces should be the fastest possible, especially for the cluster interconnect and the disk connection. A failed NIC results in the loss of that component unless traffic fails over immediately to a second NIC. A failure of the HBA results in loss of connection to the disk array. At a minimum, a spare NIC and HBA must be available for each and every component. Wherever possible, use interchangeable NIC and HBA interfaces.
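The component-by-component walk above can be turned into a simple inventory check. The following sketch is a hypothetical example: the tier names mirror the components discussed in this section, but the unit counts and the function name are invented for illustration and do not describe any real installation.

```python
# Hypothetical inventory: component tier -> number of independent units deployed.
inventory = {
    "internet link": 2,
    "firewall": 1,
    "application server": 2,
    "database server": 2,
    "cluster interconnect NIC": 1,
    "HBA": 2,
    "fabric switch": 1,
    "SAN array": 2,
}


def single_points_of_failure(tiers):
    """Return the tiers that have no redundant unit, in inventory order."""
    return [name for name, count in tiers.items() if count < 2]


for name in single_points_of_failure(inventory):
    print(f"single point of failure: {name}")
```

A count of two says only that a second unit exists; whether failover to it is automatic still has to be verified for each tier.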

Provide redundancy at each level

It is easy to see that redundancy at the hardware level is vital. At each level of the hardware layout an alternate access path must be available. Duplicating all equipment and configuring the automatic failover capabilities of the hardware reduce the chances of failure to virtually nil. It is also critical to have spares on hand for non-redundant equipment such as NIC and HBA cards and interface cables.
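The effect of duplicating equipment can be roughly quantified. The sketch below uses the standard series and parallel availability formulas and assumes that component failures are independent; the 99% per-unit availability and the count of six tiers are hypothetical numbers picked for illustration, not measurements.

```python
from math import prod


def redundant(availability: float, copies: int) -> float:
    """Availability of `copies` independent units where any one suffices."""
    return 1 - (1 - availability) ** copies


def chain(availabilities) -> float:
    """Availability of tiers in series: every tier must be up."""
    return prod(availabilities)


TIERS = 6    # e.g. network, firewall, app server, db server, switch, SAN
UNIT = 0.99  # hypothetical per-unit availability, not a measured figure

single = chain([UNIT] * TIERS)
doubled = chain([redundant(UNIT, 2)] * TIERS)

print(f"no redundancy: {single:.4%}")
print(f"every tier x2: {doubled:.4%}")
```

With these assumed numbers the unduplicated chain comes out at roughly 94% availability, while duplicating every tier raises it to roughly 99.94%. Real failures are often correlated (shared power, shared site), so figures like these are best treated as an upper bound.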

By providing the required levels of redundancy, the system becomes highly available. Once there is an HA configuration, it is up to the manager to plan any software or application upgrades to further reduce application downtime. In Oracle Database 10g using Grid Control, rolling upgrades are supported, further increasing reliability. At the SAN level, appropriate duplication software such as Veritas must be used to ensure the SAN arrays are kept synchronized. Oracle Database 10g also allows for use of Oracle Automatic Storage Management (ASM). ASM provides automated striping and mirroring of database files, and it can hold the flash recovery area used for backups and database flashback.
