Published: 03 Jun 2002
Tape and recovery are synonymous. But it's the automation that libraries add to the base technology that makes tape a viable road forward for storage managers.
The comparatively low cost of tape for backup and recovery ensures its continued use to keep up with the explosion of data. New tape technologies need to keep pace with the increasing time and cost of backup/recovery. Even though redundant and/or remote disk mirroring has become less expensive and, given its higher speed, more popular, tape is still the best solution for many businesses.
Best practices for recovery take a data snapshot from disk and write it to removable tape cartridges. Then the cartridges are sent to a safe, off-site archive. Tape overcomes single points of disk failure like software corruption cascading to the mirror or disasters, including sabotage, affecting both locations. For example, several companies mirrored from one World Trade Center tower to the other before Sept. 11. Others mirrored from their WTC tower to buildings across the street, which were also destroyed.
When using tape, a library is the most efficient method for fast, reliable backup and restoration of large amounts of data. Library automation provides economies of scale that individual drives can't by consolidating data, lowering total cost of ownership (TCO), reducing human intervention and error, simplifying backup and recovery, providing unattended lights-out backup and allowing a scalable growth path. Libraries' reduced administrative costs have a big impact on TCO. That effect is amplified by using a tape library in conjunction with a storage area network (SAN). But not all tape libraries are created equal, and choosing the right one requires extensive analysis (see "How to choose a tape library").
Tape library configuration
Before you can calculate a tape library's TCO, you need to choose the right combination of tape formats (see "A guide to tape formats") - which determines the capacity and transfer rate of individual drives - and the number of tape slots, which determines the total capacity of your library. The range available to you is quite broad: from autoloaders costing a couple of thousand dollars, to enterprise libraries in the hundreds of thousands of dollars (see "Scoping out tape library vendors").
The first step in automation is an autoloader. An autoloader, by definition, has one drive that typically serves seven or eight cartridges. An autoloader is good for unattended backup for a week or more for smaller companies. The newest versions are rack-mountable in a 2U format and have two drives - in case one fails. A library, in contrast, has more than one drive. Libraries range in size from small footprints up to room-size models composed of chained-together libraries serving an entire enterprise. Libraries usually have express cartridge access to load an individual tape or groups of tapes. Larger libraries have bar code readers to inventory and keep track of tapes, magazine-style express slots and a host of other features.
Library vendors keep increasing system densities to decrease footprint. Another trend in library design is the use of auxiliary memory on cartridges for faster operation and more intelligent diagnostics. Libraries typically have a life of five to seven years and the tape drive technology may be upgraded during this time. All major computing platform environments are supported.
Libraries allow two or more drives to access a large number of cartridges. The transfer rate for the library is essentially equal to the transfer rate for a single drive multiplied by the number of drives, minus a little overhead, which all adds up to transfer rates of over a terabyte of data per hour. New formats and migration roadmaps associated with tape formats like LTO, S-AIT and SDLT, along with faster robotics, are raising the bar on capacities and transfer rates. Libraries are denser, smaller and faster than anyone thought possible a few years ago.
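The arithmetic behind that claim can be sketched in a few lines. This is only an illustration: the function name, the 10% overhead default and the drive figures in the usage note are assumptions, not vendor specifications.

```python
def library_transfer_rate_gb_per_hour(
    drive_mb_per_s: float,
    num_drives: int,
    overhead_fraction: float = 0.1,  # assumed overhead; measure your own
) -> float:
    """Aggregate native throughput of a library in GB per hour.

    Single-drive rate times number of drives, less a fractional overhead.
    """
    if num_drives < 1:
        raise ValueError("a library needs at least one drive")
    if not 0 <= overhead_fraction < 1:
        raise ValueError("overhead_fraction must be in [0, 1)")
    aggregate_mb_per_s = drive_mb_per_s * num_drives * (1 - overhead_fraction)
    # 3600 seconds per hour, 1000 MB per GB (decimal units)
    return aggregate_mb_per_s * 3600 / 1000
```

With these hypothetical inputs, 20 drives at 15 MB/s native and 10% overhead come to roughly 972 GB per hour, so a few more drives or a faster format pushes a library past the terabyte-per-hour mark.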
When sizing automation to an application, consider the ratio of drives to cartridges. For capacity, fewer drives are used. For transfer speed, more drives are used. Make sure that you have the actual transfer rate (less overhead) needed to meet your business recovery time objective.
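As a rough sizing aid, the two sides of that ratio can be estimated separately: drives from the recovery time objective, slots from the capacity to be held. The sketch below uses hypothetical function names and an assumed overhead figure; real sizing should use measured transfer rates.

```python
import math


def drives_for_rto(
    data_gb: float,
    rto_hours: float,
    drive_mb_per_s: float,
    overhead_fraction: float = 0.1,  # assumed; use a measured value
) -> int:
    """Minimum number of drives needed to restore data_gb within rto_hours."""
    effective_gb_per_hour = drive_mb_per_s * (1 - overhead_fraction) * 3600 / 1000
    return math.ceil(data_gb / (effective_gb_per_hour * rto_hours))


def slots_for_capacity(
    data_gb: float,
    cartridge_native_gb: float,
    retained_copies: int = 1,
) -> int:
    """Minimum number of slots to hold retained_copies full copies of the data."""
    return math.ceil(data_gb * retained_copies / cartridge_native_gb)
```

For example, restoring 5,000 GB in eight hours with hypothetical 15 MB/s drives needs 13 drives, while keeping four full copies on 100 GB cartridges needs 200 slots. The first number is driven by speed, the second by capacity.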
Use TCO analysis to evaluate libraries
Vendors traditionally sold libraries via relationships, strong product features and functions. Users looked at TCO as the cost of hardware, software, maintenance contracts, power use and personnel to install, operate, manage and maintain an IT system - the technology TCO. As technology costs continue to drop and personnel and business costs continue to rise, business issues have assumed more significance and can no longer be left out of the TCO assessment.
Libraries should be selected based on how well they support the business TCO. The business TCO addresses the potential dollar cost/benefit to the user and business, showing how well the proposed library improves availability, performance and recovery cost impacts. Business costs have the greatest impact on a company's recovery from an outage and must be part of the TCO evaluation (see http://www.evaluatorgroup.com for a free white paper detailing TCO calculation).
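The difference between the two views of TCO can be made concrete with a small calculation. The sketch below is not the white paper's method: the field names are hypothetical, and modeling business cost as expected outage hours times an hourly outage cost is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class LibraryCosts:
    # Technology costs, as listed in the text above
    hardware: float
    software: float
    annual_maintenance: float
    annual_power: float
    annual_personnel: float
    # Business costs (assumed model: outage hours x cost per hour)
    expected_outage_hours_per_year: float
    outage_cost_per_hour: float


def technology_tco(costs: LibraryCosts, years: int) -> float:
    """One-time purchases plus recurring technology costs over the period."""
    one_time = costs.hardware + costs.software
    recurring = (
        costs.annual_maintenance + costs.annual_power + costs.annual_personnel
    )
    return one_time + recurring * years


def business_tco(costs: LibraryCosts, years: int) -> float:
    """Technology TCO plus the expected cost of outages over the period."""
    outage = costs.expected_outage_hours_per_year * costs.outage_cost_per_hour
    return technology_tco(costs, years) + outage * years
```

Running both functions for two candidate libraries over the same period shows whether a more expensive model pays for itself by cutting expected outage hours.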
You won't want to send a library back once installed, so include an on-site service contract in your calculations. Make sure you get the service level you need. It's a good idea to have a spare tape drive on hand: Will vendors provide a spare drive? Media cost can be the largest expense in large systems: Is there a special media purchase price bundle with a new library? Tapes can be reused, but need to be retired when they have been used for the specified number of passes, or when they begin to show a rise in soft errors reported by the backup software. Include that in your TCO.
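The retirement rule and its cost can be sketched as follows. The thresholds are illustrative assumptions: "a rise in soft errors" is interpreted here as the recent average exceeding a multiple of the earlier average, and the rated pass count should come from the media vendor.

```python
from typing import Sequence


def should_retire(
    passes_used: int,
    rated_passes: int,
    soft_errors_per_job: Sequence[int],
    rise_factor: float = 2.0,  # assumed threshold for "a rise"
) -> bool:
    """Retire a cartridge at its rated pass count or when soft errors climb."""
    if passes_used >= rated_passes:
        return True
    if len(soft_errors_per_job) < 4:
        return False  # too little history to call a trend
    half = len(soft_errors_per_job) // 2
    earlier = soft_errors_per_job[:half]
    recent = soft_errors_per_job[half:]
    earlier_avg = sum(earlier) / len(earlier)
    recent_avg = sum(recent) / len(recent)
    # Floor the baseline at 1 so a jump from 0 to 1 error is not flagged
    return recent_avg > rise_factor * max(earlier_avg, 1.0)


def annual_media_replacement_cost(
    cartridges: int,
    passes_per_year: float,
    rated_passes: int,
    price_per_cartridge: float,
) -> float:
    """Yearly spend on replacing cartridges that reach their rated passes."""
    return cartridges * passes_per_year / rated_passes * price_per_cartridge
```

The second function gives a yearly media line item that can be added to the TCO alongside the service contract.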
Choosing a tape format
There are more choices today for tape drives than ever before. There is no perfect format for all of your applications - you should consider the best choices for work groups, midrange systems, enterprise systems and specialized applications on their own merits. Unfortunately, the interchange capability between different tape technologies is nonexistent or extremely limited. There is, however, interchange between vendors of the same type of device. For practical reasons and safety, make sure the interchange is demonstrated and certified.
Comparing the capacity of different formats can be tricky. Most vendors offer data compression for tape systems built into the hardware. Independent software vendors (ISVs) also provide compression. A typical lossless data compression ratio to use for estimation is two to three times, although the algorithms vary and the compressibility of data varies widely. Generally, business data compresses well, scientific data compresses poorly and systems data varies. To compare different tape technologies, always use native, uncompressed specifications. But you should also consider how cartridge construction, recording technology, software support and other tape characteristics meet your needs.
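A small sketch shows why native figures are the safer basis for comparison. The function names and the format labels in the usage note are hypothetical, and the default ratios are simply the two-to-three-times estimate quoted above.

```python
from typing import List, Tuple


def estimated_capacity_range_gb(
    native_gb: float,
    low_ratio: float = 2.0,
    high_ratio: float = 3.0,
) -> Tuple[float, float]:
    """Plausible compressed capacity range for one cartridge."""
    return (native_gb * low_ratio, native_gb * high_ratio)


def rank_by_native_capacity(
    formats: List[Tuple[str, float]],
) -> List[Tuple[str, float]]:
    """Order (name, native_gb) pairs from largest to smallest native capacity."""
    return sorted(formats, key=lambda item: item[1], reverse=True)
```

For a hypothetical 100 GB native cartridge the estimate spans 200 GB to 300 GB, a spread wide enough that two vendors quoting "compressed capacity" at different ratios cannot be compared directly. Ranking on the native number, for example `rank_by_native_capacity([("Format A", 100), ("Format B", 160)])`, removes that ambiguity.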
Cartridges. Tape cartridges come in two flavors, single reel and dual reel. Your choice will affect both the capacity and load time characteristics of the system.
The single reel tape cartridge has all the tape on one reel - the take-up reel is inside the tape drive unit. That saves space and cost in the cartridge, providing better alignment of the head assembly. The drive is larger - since the take-up reel is there - but the cartridge has more capacity per cubic inch because there is no empty space in the cartridge.
On the other hand, dual reel has a faster load time, which is the time from when a cartridge is inserted until the tape is ready for read or write of data. Single reel takes about one minute. Dual reel midpoint load is faster and can load in four seconds. In midpoint load, the cartridge has the take-up reel included and is always half empty. Put simply, choose single reel for capacity and dual reel for performance.
Recording technology. There are many ways that data is actually written onto the tape. The different formats have different characteristics in density, speed and reliability. Linear parallel recording writes multiple tracks simultaneously. Linear serpentine recording writes data in one channel along the length of the tape, then reverses and writes in the other direction on another channel. Serpentine recording is less expensive than linear parallel because it doesn't require an expensive parallel tape head. A third technology - helical scan recording - is similar to that used in consumer VCRs. Data is recorded in stripes at an angle across the tape by the rotating head.
Software support. Tape manufacturers maintain a close relationship with backup/recovery ISVs to ensure compatibility with their new products. A new tape product or library is not ready for use in the market until the ISVs port compatibility with their software and the popular operating systems.
Footprint. Tape drives follow the computer industry's demand for a smaller footprint to reduce the real estate required for the device. Small form factor is key in lower end systems. Tape drives for midrange have been mostly in the 5.25-inch format, while enterprise class drives are larger. The 3.5-inch format is becoming popular in new midrange models to increase the number of drives in a library or to fit in an internal system bay.
Interconnects. There are many ways to connect tape drives to the system platform or host library. SCSI was the popular standard, but native Fibre Channel (FC) connections are increasing in importance, due to the popularity of network-attached storage (NAS) and SANs. Enterprise systems still use ESCON and the newer, faster FICON.
Auxiliary memory. Newer generation tape cartridges have non-contact, non-volatile memory chips that can be read by an RF reader. Metadata describing where and what is stored, along with key diagnostics on these chips, can provide vital information prior to loading the tape, increasing the degree of automation libraries can perform. Unfortunately, there is no auxiliary memory standard, which will frustrate automation providers and users until vendors get together and do the right thing.
The new generation of tape drives or "super drives" such as LTO, Super DLT, AIT3 and Super AIT will have a dramatic effect on library automation (see "How new generation tapes change libraries" below). The increased cartridge capacity and transfer rate performance of super drives will increase the density and reduce the size of libraries, which will need fewer slots and less media. That will help maintain tape's 60-fold cost advantage over disk, while providing faster data access to enhance applications.
On the downside, super drives are not designed for the high duty cycles of the enterprise market. Enterprise class drives like StorageTek's 9840/9940 and IBM's 3480/90 should be used where duty cycles exceed 50%.
Tape libraries are great in SANs
Currently, the killer application for SANs is backup/recovery. Today, most SANs are connected by FC. The FC option on a tape drive costs more than SCSI. FC tape drives are needed in high-end library applications. Until the cost of the FC option comes down, it's less expensive to use routers and bridges to support multiple SCSI drives in a library. However, it is better to use native support than to convert. Also, care must be taken to address the FC topology. For example, special ports or switches are needed that support arbitrated loop.
When using storage resource management software, make sure it's compatible with your tape library and HBAs, switches, and operating systems. Keep in mind that testing should go down to the firmware level. Don't mix and match - use tested, validated configurations. Once you arrive at a stable configuration, your tape library can aid in:
Serverless backup. Most backup software ISVs allow the tape drive to be the initiator in the SAN. Backup is much faster than going through a host and doesn't degrade the network.
Disaster recovery. SANs can be bridged or linked over large distances by iFCP, FCIP and soon, iSCSI. Even if you have redundant RAID, snapshots to a tape library are required as there are too many single points of failure such as software corruption, viruses and sabotage. Using the right backup software, a SAN can also split or duplex the write to a remote location.
SAN management and virtualization. These are used to reduce the complexity and proprietary nature of SAN solutions. Tape, disk and memory strengths are blended and used proportionately to reduce cost. Tape is represented as an image on disk and then tapes are completely filled using volume stacking. StorageTek and IBM are the leaders in this market. SAN virtualization will become important.
Tape striping or redundant array of inexpensive tapes (RAIT). This approach works for high bandwidth applications such as telemetry and seismic data. Currently, RAIT applications don't share data well in a SAN, due to end-of-tape (EOT) and other specific differences. Specialized software will be an important part of your tape library selection for this application.
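To make the striping idea concrete, here is a minimal in-memory sketch. It assumes one possible RAIT layout, fixed-size blocks dealt round-robin across the data tapes with a single XOR parity tape, and it stands in byte strings for tapes. Real RAIT software also has to handle end-of-tape conditions, which this sketch ignores.

```python
from functools import reduce
from typing import List


def _xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_with_parity(
    data: bytes, num_data_tapes: int, block_size: int
) -> List[bytes]:
    """Return num_data_tapes data streams plus one XOR parity stream (last)."""
    stripe_bytes = num_data_tapes * block_size
    # Zero-pad so the data fills a whole number of stripes
    padded = data + b"\x00" * (-len(data) % stripe_bytes)
    tapes = [bytearray() for _ in range(num_data_tapes + 1)]
    for offset in range(0, len(padded), stripe_bytes):
        blocks = [
            padded[offset + i * block_size : offset + (i + 1) * block_size]
            for i in range(num_data_tapes)
        ]
        for tape, block in zip(tapes, blocks + [_xor_blocks(blocks)]):
            tape.extend(block)
    return [bytes(tape) for tape in tapes]


def rebuild_tape(tapes: List[bytes], lost_index: int) -> bytes:
    """Reconstruct one lost stream by XOR-ing all surviving streams."""
    survivors = [t for i, t in enumerate(tapes) if i != lost_index]
    return _xor_blocks(survivors)
```

Because all data tapes are written in parallel, bandwidth scales with the number of drives, and in this assumed layout the parity tape lets any single lost cartridge be rebuilt from the others.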