Performance tuning RMAN backup and recovery operations

If you have used RMAN before Oracle Database 10g, you will find some changes to the performance-related commands, which make the tuning process easier to understand. This chapter from Oracle Database 10g RMAN Backup & Recovery looks at what you need to tune before tuning RMAN itself and then provides some tuning options for RMAN.

This is an excerpt from Chapter 16 of Oracle Database 10g RMAN Backup & Recovery by Matthew Hart and Robert Freeman, copyright 2007 from Oracle Press, a division of McGraw-Hill.

RMAN works pretty well right out of the box, and you will generally find that it requires very little tuning. However, a number of other pieces fit into the RMAN architectural puzzle. When all those pieces come together, you sometimes need to tweak a setting here or there to get the best performance you can out of your backup processes. The RMAN tuning you end up doing usually involves dealing with inefficiencies in the logical or physical database design, tuning the Media Management Library (MML), or tuning RMAN and the MML layer to coexist better with the physical device that you are backing up to.

If you have used RMAN before Oracle Database 10g, you will find some changes to the performance-related commands. In our estimation, these changes make the tuning process easier to understand overall. In this chapter, we look at what you need to tune before you begin to tune RMAN itself. We then provide some tuning options for RMAN.

Before you tune RMAN

If your RMAN backups take hours and hours to run, it's probably not RMAN's fault. More than likely, it's some issue with your database or with your MML. The last time you drove in rush hour traffic, did you think the slow movement was a problem with your car? Of course not. The problem was one of too many cars trying to move on a highway that lacked enough lanes. This is an example of a bandwidth problem, or a bottleneck. Cities attempt to solve their rush hour problem by expanding the highway system or perhaps by adding a subway, buses or light rail.

The same kind of problem exists when it comes to tuning RMAN and your backup and recovery process. More than likely, the problem is insufficient bandwidth of the system as a whole, or some component in the infrastructure that is not configured correctly. RMAN often gets the initial blame, but in the end it is just a victim.

Once you have the architecture working correctly, much of RMAN tuning really turns out to be an exercise in tuning your Oracle database. The better your database performs, the better your RMAN backups will perform. Very large books have already been written on the subject of tuning your Oracle database, so we will just give a quick look at these issues. If you need more detailed information on Oracle database performance tuning, we suggest another title from Oracle Press: Oracle Wait Interface: A Practical Guide to Performance Tuning & Diagnostics by Richmond Shee, Kirtikumar Deshpande and K. Gopalakrishnan (2004).

NOTE: We make some tuning recommendations in this chapter and in other places in this book. Make sure you test our recommendations on your system before you decide to "fire and forget" (meaning to make a change without checking that the change was positive). While certain configurations may work for us in our environments, you may find that they do not work as well for you.

RMAN performance: What can be achieved?

So, what is the level of RMAN performance that can be achieved with the currently available technology? Oracle Corp., in its white paper "Oracle Recovery Manager: Performance Testing at Sun Customer Benchmark Center" (October 2001), available at OTN, found that a backup or recovery rate of 1 terabyte (TB) per hour to tape was possible. As tape backup technology continues to improve, rates exceeding this will likely be possible.

Have the right hardware in place

If you want high backup performance, then the first thing to look at is the backup hardware at your disposal. This consists of items such as tape drives, the associated infrastructure such as cabling, robotic tape interfaces, and any MML-layer software that you might choose to employ.

Backup media hardware reads and writes at a given speed. Of course, the faster the device writes, the faster your backups. Also, the more devices you can back up to at the same time, the shorter your backup times will be. This was clearly pointed out in Oracle's RMAN performance white paper mentioned in the preceding section: doubling the number of drives that RMAN could write to produced an almost linear improvement in the performance of both backup and restore operations. The ability to parallelize your backups across multiple channels (or backup devices) is critical to quickly backing up a large Oracle database.
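
As a rough sketch of what parallelizing across devices looks like in practice, the following RMAN commands open several tape channels for a single backup. The parallelism value of 4 is an assumption chosen for illustration, not a figure from the white paper; match it to the number of tape drives you actually have available.

```
# Persistent setting: open four sbt (tape) channels for each backup.
# The value 4 is hypothetical; use the number of drives you can write to.
CONFIGURE DEVICE TYPE sbt PARALLELISM 4;

# RMAN spreads the datafiles across the four channels.
BACKUP DEVICE TYPE sbt DATABASE;
```

Testing the same backup at a few different parallelism values is a reasonable way to see where adding drives stops paying off on your own hardware.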

RMAN will benefit from parallel CPU resources, but the return diminishes much more quickly with the addition of CPUs than with the addition of physical backup devices. The bottom line, then, is that in most cases having multiple backup devices will have a much greater positive impact on your backup and restore windows than adding CPUs will.

You will find that most backup devices are asynchronous rather than synchronous. An asynchronous device allows the backup server processes to issue I/O instructions without waiting for the I/O to complete. An asynchronous operation, for example, allows the server process to issue a tape write instruction and, while that instruction is being performed, proceed to fill memory buffers in preparation for the next write operation. With a synchronous device, on the other hand, the server process has to wait for each I/O operation to complete before it can perform any other work. In our example, the synchronous process will have to wait for the tape I/O to complete before it can start filling memory buffers for the next operation. Thus, an asynchronous device is more efficient than a synchronous one.

Because asynchronous operations are preferred, you may want to know about a few of their parameters. First, the parameter BACKUP_TAPE_IO_SLAVES (which defaults to FALSE), when set to TRUE, causes all tape I/O to be asynchronous in nature. We suggest you set this parameter to TRUE to enable asynchronous I/O to your tape devices (if that setting is supported). Once this parameter is established, you can define the size of the memory buffers that are used by using the parms parameter of the allocate channel command or configure channel command.
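
A minimal sketch of enabling the parameter from the RMAN prompt follows. It assumes you are connected to the target database with sufficient privileges, and it assumes the DEFERRED keyword is required for this parameter in your release, so that the change applies only to sessions started afterward. Check the Oracle Database Reference for your version before relying on it.

```
# Issued through RMAN's SQL command while connected to the target database.
# DEFERRED (assumed to be required here) means only new sessions see the change.
SQL 'ALTER SYSTEM SET BACKUP_TAPE_IO_SLAVES = TRUE DEFERRED';
```

Start a new RMAN session after making the change so that the backup channels pick up the new value.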

The tape buffer size is established when the channel is configured. The default value is OS specific, but is generally 64KB. You can configure this value to be higher or lower by using the allocate channel command. For the best performance, we suggest that you configure this value to 256KB or higher, as shown here:

allocate channel c1 device type
sbt parms="blksize=262144, ENV=(NB_ORA_CLASS=RMAN_db01)";
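
For context, allocate channel is only valid inside a run block, and the same parms string can also be stored persistently with configure channel. The sketch below shows both forms under the same assumptions as the command above: a 256KB block size and the NetBackup-style NB_ORA_CLASS setting, which you would replace with whatever your own media manager expects.

```
# One-off form: the channel exists only for the duration of the RUN block.
RUN {
  ALLOCATE CHANNEL c1 DEVICE TYPE sbt
    PARMS 'BLKSIZE=262144, ENV=(NB_ORA_CLASS=RMAN_db01)';
  BACKUP DATABASE;
}

# Persistent form: every automatically allocated sbt channel uses these settings.
CONFIGURE CHANNEL DEVICE TYPE sbt
  PARMS 'BLKSIZE=262144, ENV=(NB_ORA_CLASS=RMAN_db01)';
```

The persistent form is convenient when every tape backup should use the same buffer size; the run block form is easier to vary while you are still testing different values.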

If you are backing up to disk, then you need to determine whether or not your OS supports asynchronous I/O (most do these days). If it does, then Oracle automatically uses that feature. If it does not, then Oracle provides the parameter DBWR_IO_SLAVES, which, when set to a nonzero value, causes Oracle to simulate asynchronous I/O to disks by starting I/O slave processes.
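
If your platform does lack native asynchronous I/O, setting the parameter might look like the sketch below. The value 4 is an assumption for illustration, the command assumes the instance uses a server parameter file, and because DBWR_IO_SLAVES is a static parameter the change takes effect only after the instance is restarted.

```
# Hypothetical value; any nonzero setting turns on I/O slaves.
# SCOPE=SPFILE records the change for the next instance startup.
SQL 'ALTER SYSTEM SET DBWR_IO_SLAVES = 4 SCOPE=SPFILE';
```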

When either DBWR_IO_SLAVES or BACKUP_TAPE_IO_SLAVES is configured, you may also want to create a large pool. This will help eliminate the shared pool contention and memory allocation errors that can accompany shared pool use when I/O slaves are enabled. If you are using Automatic Shared Memory Management (ASMM) in 10g, Oracle will manage the memory allocation of the large pool for you. If you want to manually set the large pool, the total size of disk buffers is limited to 16MB per channel. The formula for setting the LARGE_POOL_SIZE parameter for backup is as follows:

LARGE_POOL_SIZE = (number of allocated channels) * 
   (16MB + (size of tape buffer))
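
To make the formula concrete, here is a worked example using assumed values: four allocated channels and a 256KB tape buffer. The resulting figure is only as good as those assumptions, so recompute it for your own channel count and buffer size.

```
# Worked example of the formula above, with assumed inputs:
#   channels    = 4
#   tape buffer = 256KB (blksize=262144)
#   4 * (16MB + 256KB) = 4 * 16640KB = 66560KB = 65MB
SQL 'ALTER SYSTEM SET LARGE_POOL_SIZE = 65M';
```

Treat the computed value as a starting point for the backup workload alone; other features that use the large pool need their own allowance on top of it.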

NOTE: If neither DBWR_IO_SLAVES nor BACKUP_TAPE_IO_SLAVES is configured, then RMAN will not use the large pool. Generally, you do not need to configure these parameter settings to get good performance from RMAN unless your OS does not natively support asynchronous I/O.

Tune the database

A badly tuned database can have a significant negative impact on your backup times. Certain database tuning issues can also have significant impact on your restore times. In this section, we briefly look at what some of these tuning issues are, including I/O tuning, memory tuning and SQL tuning.

About the authors

Matthew Hart is the co-author of three books from Oracle Press. He has worked with high availability technologies in Oracle since version 7.3, and has worked with RMAN since its inception. He has spent considerable time perfecting backup and recovery strategies for Oracle customers. Matthew currently works and lives in Kansas City, Missouri.

Oracle DBA Robert Freeman has also authored the best-selling titles Oracle Database 10g New Features, Oracle9i RMAN Backup & Recovery and Oracle9i New Features, as well as the Oracle Press book Portable DBA: Oracle. In his spare time Robert flies airplanes and works out at the local ATA Karate Center.
