If you're using tape as your backup medium, you're likely to need to fine-tune your system to get the best performance on backups. Here are some tips on speeding up your tape backup system.
What tape wants
Because tape, unlike disk, is a sequential medium, with blocks of information recorded or read in order, tape wants an even flow of data at the speed the tape is running. While buffers on the system, in the tape drive and even in the storage area network's host bus adapters can help to even out temporary mismatches, they will fill up (or empty, if the data stream isn't keeping up with the drive) if there is a chronic mismatch.
In the worst case, the tape drive will have to stop and back the tape up to let the rest of the system catch up. This repeated stop, rewind and restart motion is known as shoeshining. It not only slows down backups, it also causes increased wear on the tape and the drive.
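To get a feel for how quickly a buffer stops helping, here is a rough back-of-the-envelope sketch. The function name and the numbers in the example are purely illustrative assumptions, and real drives have more complicated buffering behavior than this simple model.

```python
def seconds_until_buffer_empties(buffer_mb, drive_mbps, feed_mbps):
    """Return how long a full buffer can hide a feed-rate shortfall.

    Returns None when the feed keeps up with the drive, because the
    buffer never drains in that case.
    """
    shortfall = drive_mbps - feed_mbps
    if shortfall <= 0:
        return None
    return buffer_mb / shortfall


# Illustrative numbers only: a 256 MB buffer, a drive streaming at
# 120 MB/s and a host that can supply only 80 MB/s.
print(seconds_until_buffer_empties(256, 120, 80))  # prints 6.4
```

In this made-up case the buffer covers the gap for only a few seconds, which is why a chronic mismatch has to be fixed at the source rather than with more buffering.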
While you can slow down your tape drive to match the speed of your system, it makes more sense to go through the system and see if and where you can speed up the data transfer.
Make sure the rest of the system matches your tape speed
The rest of your tape storage system needs to feed data to the tape drive at the proper speed, generally the fastest speed that the tape drive can handle.
If a mismatch shows up only occasionally, you can often fix the problem by increasing the number of buffers on the server or the disk system. You can determine the optimal number of buffers by trial and error.
Check for backup bottlenecks
You can get the actual speed of the backup by using the time (Linux, SunOS, etc.) or timeit (Windows) utilities to see how long it takes the system to back up a large file. Dividing the size of the file by the time it takes to process will give you your average speed. By comparing this against the rated speed in the drive documentation, you can see how much of a performance hit you are taking.
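The arithmetic is simple enough to do by hand, but a small sketch makes it repeatable. The function names are hypothetical, the 40-second elapsed time is an invented example, and the 120 MB/s rated speed is the uncompressed LTO-4 figure.

```python
def average_mb_per_second(size_bytes, elapsed_seconds):
    """Average transfer rate in MB/s (1 MB = 1,000,000 bytes)."""
    return size_bytes / 1_000_000 / elapsed_seconds


def percent_of_rated(measured_mbps, rated_mbps):
    """Measured speed as a percentage of the drive's rated speed."""
    return 100.0 * measured_mbps / rated_mbps


# Hypothetical measurement: a 2 GB test file that took 40 seconds.
speed = average_mb_per_second(2_000_000_000, 40.0)
print(f"{speed:.1f} MB/s, {percent_of_rated(speed, 120.0):.0f}% of rated speed")
# prints: 50.0 MB/s, 42% of rated speed
```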
One common problem is that the tape drive settings, such as block size, are not optimized for modern technology. Vendors generally set the default settings for the lowest common denominator to make sure the drive works out of the box. However, these settings are seldom optimal. If you're using a new LTO-4 drive (120 MBps uncompressed), the settings on your server, network and other components may very well not be set to match the performance of the drive. These settings can be found on the respective components (e.g., the server, network cards, software). Check your drive's documentation for optimum settings and reset as appropriate.
Set the block size properly
Because the tape is reading information in sequential blocks, and because there is a certain amount of overhead associated with reading a block -- regardless of size -- it usually pays to set the block size larger than it is set for disk drives.
Disk drives get their best performance when their block size is balanced against the nature of the data being read or written. On most systems, this is a fairly small block -- perhaps 16K. Tape, by contrast, is simply trying to move as much data as possible in a steady stream, so small chunks only add overhead.
This is something that needs to be determined by trial and error. Experiment with various block sizes on a large (2 GB) data file and time the transfers to tape. You may find that your system is currently set to feed data to the tape drive in blocks as small as 1K when your best speed requires 2048K blocks. In that case, the improvement in backup times will be substantial.
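One possible way to run that experiment is sketched below. It is only an illustration: the function names are hypothetical, the demo copies a small temporary file so that it runs anywhere, and for a real test you would substitute a large file and your own tape device path (for example, /dev/nst0 on many Linux systems). Your backup software may also impose its own block size, so treat the results as a guide rather than a final answer.

```python
import os
import tempfile
import time


def time_copy(source_path, target_path, block_size):
    """Copy source to target in block_size writes and return MB/s."""
    total = 0
    start = time.perf_counter()
    # buffering=0 sends each write() to the OS at the chosen block size.
    with open(source_path, "rb") as src, \
            open(target_path, "wb", buffering=0) as dst:
        while True:
            block = src.read(block_size)
            if not block:
                break
            dst.write(block)
            total += len(block)
    # Guard against a zero interval on very small test files.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total / 1_000_000 / elapsed


def run_trials(source_path, target_path, block_sizes):
    """Return {block_size: MB/s} for each block size tried."""
    return {size: time_copy(source_path, target_path, size)
            for size in block_sizes}


# Demo on a 4 MB temporary file. Results on disk say little about tape;
# this only shows the mechanics of the trial.
with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "sample.bin")
    target = os.path.join(workdir, "target.bin")
    with open(source, "wb") as f:
        f.write(os.urandom(4 * 1024 * 1024))
    sizes = [1024, 16 * 1024, 2048 * 1024]
    for size, rate in run_trials(source, target, sizes).items():
        print(f"{size // 1024:>5}K blocks: {rate:8.1f} MB/s")
```

Run each block size more than once and compare the averages, since caching and other activity on the server can skew a single run.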
Use compression
Compression increases throughput and decreases the backup time. How much depends on the nature and mix of the files being backed up, but generally you can expect compression of 1.5:1 to 2:1, with a corresponding speedup in backups.
Note, however, that the compression ratio is not a hard-and-fast number. It varies from backup to backup depending on the mix of files, and you shouldn't count on it exceeding the vendor's published number. The only way to know what you're getting is to examine your results over several backups and see how much compression you're actually achieving.
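To see what those ratios mean for the backup window, here is a rough sketch. It assumes the rest of the system can feed the drive fast enough to keep it streaming at the higher effective rate, which is the whole point of the tuning described above. The function name and the 1,000 GB data set are illustrative assumptions.

```python
def backup_hours(data_gb, native_mbps, compression_ratio=1.0):
    """Estimate backup time in hours for a given compression ratio.

    Assumes the drive streams continuously and that the host can
    supply data at native_mbps * compression_ratio.
    """
    effective_mbps = native_mbps * compression_ratio
    seconds = data_gb * 1000 / effective_mbps
    return seconds / 3600


# Hypothetical 1,000 GB backup to a drive rated at 120 MB/s uncompressed.
for ratio in (1.0, 1.5, 2.0):
    print(f"{ratio}:1 -> {backup_hours(1000, 120, ratio):.2f} hours")
```

Note that at 2:1 the host has to deliver 240 MB/s to keep the drive streaming, so compression raises the bar for the rest of the system.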
Use data deduplication
Data deduplication strips redundant data out of the stream before it gets to the tape drive. Because the majority of the data in the typical backup is redundant -- 90% or more according to vendors -- data deduplication can save an enormous amount of time and storage space. Note that the time savings will not be as great as the dedupe percentage. If you get a 90% reduction in the size of your files, you won't get a 90% reduction in backup time, because it takes time to examine the data and remove the redundant blocks. However, the savings on the backup window will be substantial. Deduplication can be done anywhere in the backup chain from the server to the tape.
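The basic idea can be illustrated with a toy sketch that splits data into fixed-size chunks and keeps only the chunks it has not seen before. This is an assumption-laden simplification: the function name, the 4K chunk size and the use of SHA-256 are choices made for the example, and commercial products typically use more sophisticated techniques such as variable-size chunking.

```python
import hashlib


def dedupe_stats(data, chunk_size=4096):
    """Return (total_bytes, unique_bytes) using fixed-size chunk hashing."""
    seen = set()
    unique_bytes = 0
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).digest()
        # Only the first copy of each distinct chunk would be stored.
        if digest not in seen:
            seen.add(digest)
            unique_bytes += len(chunk)
    return len(data), unique_bytes


# Made-up sample: nine identical 4K chunks followed by one different chunk.
sample = (b"A" * 4096) * 9 + b"B" * 4096
total, unique = dedupe_stats(sample)
print(f"{total} bytes in, {unique} bytes out, "
      f"{100 * (1 - unique / total):.0f}% reduction")
# prints: 40960 bytes in, 8192 bytes out, 80% reduction
```

Every chunk still has to be read and hashed even when it turns out to be a duplicate, which is one reason the time savings lag behind the size reduction.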
Even if your backup doesn't fill your backup window, you may still need to pay attention to your tape drive. If the drive is shoeshining, you're wearing your tapes and your drive at a greatly accelerated rate. A little time spent checking your parameters can pay big dividends.
About this author: Rick Cook specializes in writing about issues related to data storage and data storage management.
This was first published in August 2009