How different is the required skill set for an Oracle DBA who is transitioning from traditional individual database servers to implementing and supporting an Oracle RAC environment in a grid configuration? If you have applications that do not natively support transparent failover, is it worth the added complexity of implementing and running RAC?
It's a good question, so I put it to Alex Gorbachev, CTO at Pythian, a global industry leader in remote database administration services. Because Alex is an Oracle ACE Director (an elite group; fewer than 100 DBAs in the world hold this title), I felt he was much better qualified to answer it. His answer follows:
Oracle Real Application Clusters (RAC) is a more complex technology. It requires broader knowledge of IT infrastructure, including the storage and networking domains and the operating system, as well as a solid understanding of clustering concepts. DBAs with broader IT knowledge tend to be more comfortable with Oracle RAC and Grid deployments. In any case, nothing beats hands-on experience with Oracle RAC, and having an experienced mentor on the team is the safest way to adopt it.
Transparent failover without application support is, unfortunately, a myth. Unless an application serves purely read-only traffic, it must be able to handle transaction failures, either by resubmitting a failed transaction or by rejecting it with an error. Oracle provides two features that help re-establish failed connections to an Oracle RAC database, Transparent Application Failover (TAF) and Fast Connection Failover (FCF), but the application itself must be designed to survive a session disconnect, and that requirement has nothing to do with Oracle RAC. If an application can already recover from temporary database failures when working with a single-instance database, failover with Oracle RAC can usually be configured without any application changes. The reality is that many applications are not written that way.
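To make the "resubmit or reject" requirement concrete, here is a minimal Python sketch of application-side handling. `ConnectionLostError`, `TransactionRejected` and `run_transaction` are hypothetical names, not part of any Oracle driver; a real application would catch whatever error its driver raises when the session is lost. The sketch assumes each unit of work opens its own connection, performs the whole transaction and commits, and that resubmitting it is safe (for example, because it is idempotent or the failure is known to have happened before the commit).

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ConnectionLostError(Exception):
    """Hypothetical stand-in for the driver error raised when the session dies
    (for example, because the RAC instance serving it failed)."""


class TransactionRejected(Exception):
    """Raised to the caller when the transaction could not be completed."""


def run_transaction(
    work: Callable[[], T],
    max_attempts: int = 3,
    delay_seconds: float = 0.5,
) -> T:
    """Run `work` as one unit; if the session is lost, resubmit it from the start.

    Assumption: `work` connects, runs the whole transaction and commits, so
    calling it again after a lost session is safe.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return work()
        except ConnectionLostError as exc:
            last_error = exc
            if attempt < max_attempts:
                # Give the surviving instances time to take over the service.
                time.sleep(delay_seconds)
    # Out of attempts: reject the transaction with an error instead of losing it.
    raise TransactionRejected(f"gave up after {max_attempts} attempts") from last_error


# Usage: the first attempt hits a node failure, the resubmission succeeds.
attempts = {"n": 0}


def transfer_funds() -> str:
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise ConnectionLostError("instance went down mid-transaction")
    return "committed"


print(run_transaction(transfer_funds, delay_seconds=0))
```

Retrying a bounded number of times and then raising an error to the caller covers both options described above: the transaction is either resubmitted or rejected, never silently dropped.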
We have customers with legacy applications and no way of updating the application code to handle database failures. Even then, there is a way forward: we use other Oracle database features, such as Fast Application Notification (FAN) and server-side callouts, to automate application administrative actions that would otherwise have to be done manually. For example, we can deploy a special trigger that restarts the application mid-tiers one by one, or instructs them to re-establish their database connections, when a single node fails. Keep in mind that TAF and FCF can shorten the outage or brownout to seconds, whereas legacy applications would generally take minutes to recover.
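As an illustration of the server-side callout approach, the following Python sketch parses a FAN event passed on the command line and lists the mid-tiers to reconnect, one at a time. It assumes the event arrives as an event type followed by `key=value` tokens (such as `status=down` and `instance=...`); check the exact payload format for your Oracle version. The mid-tier host names and the "reconnect" action are hypothetical placeholders, and the script only prints what it would do.

```python
import sys
from typing import Dict, List

# Hypothetical list of application mid-tiers that connect to the cluster.
MID_TIERS = ["app1.example.com", "app2.example.com"]


def parse_fan_event(args: List[str]) -> Dict[str, str]:
    """Turn ['INSTANCE', 'VERSION=1.0', 'status=down', ...] into a dict.

    Tokens without '=' (e.g. the time part of a timestamp that was split on
    whitespace) are appended to the previous value.
    """
    event = {"event_type": args[0] if args else ""}
    last_key = None
    for token in args[1:]:
        if "=" in token:
            key, value = token.split("=", 1)
            last_key = key.lower()
            event[last_key] = value
        elif last_key is not None:
            event[last_key] += " " + token
    return event


def plan_actions(event: Dict[str, str], mid_tiers: List[str]) -> List[str]:
    """Return one action per mid-tier, in order, for a 'down' event; else none."""
    if event.get("status", "").lower() not in ("down", "nodedown"):
        return []
    lost = event.get("instance") or event.get("host", "unknown")
    return [f"reconnect {tier} (lost {lost})" for tier in mid_tiers]


def main(argv: List[str]) -> None:
    event = parse_fan_event(argv[1:])
    for action in plan_actions(event, MID_TIERS):
        # A real callout would restart or signal the mid-tier here
        # (e.g. over ssh or an admin endpoint); this sketch only logs it.
        print(action)


if __name__ == "__main__":
    main(sys.argv)
```

Processing the mid-tiers sequentially mirrors the "one by one" restart described above, so the application never loses all of its tiers at once.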
If Oracle RAC is deployed for high availability, the best results are achieved through appropriate application integration. However, Oracle RAC carries other potential benefits, and many customers deploy it for scalability and consolidation, so handling node failures transparently is not the only factor to weigh when evaluating a RAC deployment.
Have a question for Scott Rosenberg? Send an e-mail to email@example.com
This was first published in April 2010