Faster tape drives can speed a conventional backup, but disks are filling a wide range of data backup roles, locally as well as across a wide area network (WAN). Another wrinkle in finding the best backup strategy is that an organization must maintain security and adhere to government regulations.
Now that you're familiar with the basics of backup and data protection, this guide focuses on the advanced features and latest developments in tape, disk, and remote and strategic storage concepts.
On the surface, it's easy to select tapes: just buy cartridges that fit the tape drives in your enterprise. But the choice is more complicated than that. Tapes are expensive. High-end LTO-3 tapes can cost more than $40 for a 400- to 800-GB cartridge; a Quantum digital linear tape (DLT) 0.8- to 1.6-TB tape can start at $100. Backup plans can also be affected by product shortages. For example, in late 2003, Imation Corp. became the sole distributor for the VXA and Mammoth tapes from Exabyte Corp. A limited manufacturing and distribution base can restrict stock and inflate tape costs.
Today, groups within an organization are sharing fewer (but larger) tape libraries, so there's a push to pay only for the library capacity being used, while maintaining management control over the tapes in that portion of the library. Consequently, tape libraries such as the Scalar i500 from Quantum Corp. are including partitioning and chargeback, scaling from one to 18 LTO-3 or LTO-4 drives, and supporting 36 to 402 tape slots in a single frame. Other tape libraries offering partitioning include IBM's TS3310, Qualstar Corp.'s TLS series and Sony Electronics Inc.'s CSM200. Note: Tape library features such as multiple tape media support, high availability and large numbers of tape drives are not as popular as once thought, but media management features like barcoding and RFID have become essential for tracking and retrieving media.
Backup software is moving beyond the role of scheduling and reporting. Users are turning to backup software to reduce the sheer size of their data backup. Compression, once used to fit more data onto a given tape, is being pushed aside by data deduplication, also called intelligent compression or commonality factoring. Deduplication works by saving a single instance of a file or block and storing only pointers to that instance wherever the same data appears again. In other words, instead of saving 10 copies of a 10 MB sales presentation, only one copy is actually saved to tape, and the other nine are recorded as pointers. [See our special report on data deduplication.]
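The pointer mechanism described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular backup product works: the function names `dedup_store` and `restore` are hypothetical, SHA-256 is an assumed choice of fingerprint, and each "block" here is simply a whole file held in memory.

```python
import hashlib

def dedup_store(blocks):
    """Keep one copy of each unique block; return (store, pointers)."""
    store = {}      # fingerprint -> block bytes, saved only once
    pointers = []   # one fingerprint per input block, in original order
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:
            store[digest] = block      # first time this data is seen
        pointers.append(digest)        # duplicates cost only a pointer
    return store, pointers

def restore(store, pointers):
    """Rebuild the original sequence of blocks from the pointers."""
    return [store[digest] for digest in pointers]

# Ten copies of the same presentation plus one unrelated file.
presentation = b"sales deck" * 1000
other = b"quarterly report"
blocks = [presentation] * 10 + [other]

store, pointers = dedup_store(blocks)
logical = sum(len(b) for b in blocks)            # bytes before deduplication
physical = sum(len(b) for b in store.values())   # bytes actually stored
print(f"{len(pointers)} blocks, {len(store)} stored, "
      f"{logical} bytes reduced to {physical}")
```

Restoring walks the pointer list and looks up each fingerprint, so the ten duplicate entries all resolve to the same stored copy.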
Low cost and relatively high performance have made hard disks the preferred avenue for many data backup tasks. Regardless of the actual storage platform, there are several important trends. Data deduplication is appearing on data backup platforms such as virtual tape libraries (VTL) and content addressed storage (CAS) archives. Deduplication reduces the number of disks needed for storage or fits more data into available space -- by factors approaching 50-to-1 -- substantially lowering the disk investment.
Users must also consider the effects of power, cooling and reliability in large data backup disk installations. Arrays with hundreds of disks can consume thousands of watts of power and generate heat that is difficult to dissipate, and cumulative disk vibration can cause premature disk failures, especially among SATA drives. Hard drive capacities are always increasing, while additional power conservation features are reducing their individual energy demands. Array manufacturers such as Copan Systems are developing massive array of idle disks (MAID) systems where 80% of the disks are idle. The idle disks are powered on and tested periodically. Data is migrated between disks so that all disks accumulate roughly the same amount of running time, which reduces power consumption and improves mean time between failures (MTBF).
The ongoing challenge with remote data backups is the cost of bandwidth. A company must budget for connectivity that supports an appropriate backup volume within an acceptable backup window. Too much bandwidth wastes money; too little bandwidth wastes time. Other technologies like compression, data deduplication and selective or differential backups reduce the amount of data while lowering bandwidth needs.
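The trade-off between bandwidth and backup window comes down to simple arithmetic, sketched below. The function names and the `efficiency` parameter are illustrative assumptions: the 0.8 default is a placeholder for protocol overhead and link sharing, not a measured figure, and `reduction_ratio` stands in for whatever compression or deduplication achieves on a given data set.

```python
def backup_hours(data_gb, link_mbps, reduction_ratio=1.0, efficiency=0.8):
    """Estimate the hours needed to send a backup over a WAN link.

    data_gb         -- backup size before reduction, in decimal gigabytes
    link_mbps       -- nominal link speed in megabits per second
    reduction_ratio -- 4.0 means the data shrinks to one quarter (assumed)
    efficiency      -- usable fraction of nominal bandwidth (assumed)
    """
    if link_mbps <= 0 or reduction_ratio <= 0 or not 0 < efficiency <= 1:
        raise ValueError("inputs must be positive and efficiency within (0, 1]")
    megabits = data_gb * 8000 / reduction_ratio   # 1 GB = 8,000 megabits
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600

def required_mbps(data_gb, window_hours, reduction_ratio=1.0, efficiency=0.8):
    """Estimate the nominal link speed needed to finish within the window."""
    if window_hours <= 0 or reduction_ratio <= 0 or not 0 < efficiency <= 1:
        raise ValueError("inputs must be positive and efficiency within (0, 1]")
    megabits = data_gb * 8000 / reduction_ratio
    return megabits / (window_hours * 3600 * efficiency)

# 500 GB over a 100 Mbps link: about 13.9 hours with no data reduction,
# about 3.5 hours if the data is reduced 4-to-1 before it is sent.
print(round(backup_hours(500, 100), 1))
print(round(backup_hours(500, 100, reduction_ratio=4.0), 1))
# Link speed needed to fit the unreduced 500 GB into an 8-hour window.
print(round(required_mbps(500, 8), 1))
```

Running the numbers in both directions shows whether it is cheaper to buy a faster link or to shrink the data before it leaves the site.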
The choice between synchronous and asynchronous replication can have a huge impact on data backups. Synchronous replication offers the lowest recovery point objective (RPO) and recovery time objective (RTO), but the latency of long geographic distances can render this impractical. Asynchronous replication is easier, can work across longer distances and is tolerant of WAN outages. But asynchronous RPOs can range into hours because remote writes can lag significantly behind local writes.
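A rough calculation shows why distance matters. The sketch below assumes that a synchronous write is not complete until the remote site acknowledges it, so every write pays at least one network round trip, and it uses the common approximation that light covers about 200 km per millisecond in optical fiber. Real links add switching and protocol delay on top of this floor. The function names are hypothetical.

```python
FIBER_KM_PER_MS = 200.0  # assumed propagation speed in optical fiber

def sync_write_penalty_ms(distance_km):
    """Minimum extra latency per synchronous write: one round trip."""
    if distance_km < 0:
        raise ValueError("distance must not be negative")
    return 2 * distance_km / FIBER_KM_PER_MS

def async_backlog_mb(write_mb_per_s, link_mb_per_s, seconds):
    """Data written locally but not yet replicated after a write burst.

    With asynchronous replication this backlog is the data that would be
    lost if the local site failed at the end of the burst.
    """
    return max(0.0, (write_mb_per_s - link_mb_per_s) * seconds)

for km in (10, 100, 1000):
    print(f"{km} km: at least {sync_write_penalty_ms(km)} ms per write")

# A 10-minute burst at 50 MB/s over a link that carries 20 MB/s.
print(async_backlog_mb(50, 20, 600), "MB not yet at the remote site")
```

The first function is the price of synchronous replication, paid on every write. The second is the price of asynchronous replication, paid only if a failure occurs while the backlog exists.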
WAN reliability is a consideration that is often overlooked. A failed WAN link can disable the data backup process, leaving critical data at risk. Organizations should investigate an alternative that can protect data during a WAN interruption. For example, users may implement a backup to local disk and then pass the disk backup to an off-site VTL or other disk system. If the WAN fails, there is already a local backup, and the remote backup can be retried or completed once the WAN is available.
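The local-first approach described above might look like the following sketch. Everything here is illustrative: `staged_backup`, `write_local` and `send_remote` are hypothetical names, the local disk and remote VTL are simulated with Python lists, and a WAN outage is represented by a `ConnectionError`.

```python
import time

def staged_backup(data, write_local, send_remote, retries=3, wait_seconds=0.0):
    """Back up to local disk first, then try to copy the result off-site."""
    local_copy = write_local(data)   # completes regardless of WAN state
    for attempt in range(1, retries + 1):
        try:
            send_remote(local_copy)
            return {"local": True, "remote": True, "attempts": attempt}
        except ConnectionError:
            if attempt < retries:
                time.sleep(wait_seconds)   # wait before retrying the WAN
    # WAN never recovered: the data is still protected locally.
    return {"local": True, "remote": False, "attempts": retries}

# Simulated targets and a WAN link that fails twice before recovering.
local_disk, remote_vtl = [], []
link_down = iter([True, True, False])

def write_local(data):
    local_disk.append(data)
    return data

def send_remote(copy):
    if next(link_down):
        raise ConnectionError("WAN link down")
    remote_vtl.append(copy)

result = staged_backup(b"nightly backup", write_local, send_remote)
print(result)
```

Because the local write happens before any WAN traffic, a failed link delays the off-site copy but never leaves the data unprotected.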
Today's large and frequent remote office backups are sped up by a variety of WAN acceleration technologies that include larger TCP/IP packet sizes and handshake simplifications to reduce the latency of "chatty" tasks. [See our WAN All-In-One Research Guide.]
Other backup concepts
Traditionally, data backups were implemented to suit the needs of the organization, ensuring that important data could be recovered in an emergency. Mirroring, replication, snapshots and continuous data protection (CDP) technologies are still commonly employed for that purpose, and many disk storage platforms include applications to support these features. For example, EMC Corp.'s Clariion CX3 Model 80 includes SnapView software for local replication and MirrorView software for remote replication.
Today's data backups are increasingly influenced by compliance and corporate governance issues that require data to remain intact, accessible and retained for a prescribed length of time. Backup administrators need to know which data should be backed up; how the data should be backed up and protected; and how the data is accessed in the face of legal discovery or disaster. Backup planning for compliance should involve business units across the enterprise, not just IT. CAS systems are often used to meet compliance obligations since CAS platforms offer data deduplication, security and data management/search tools that are suited to long-term data retention and retrieval.
Security is an increasingly important issue, and backup administrators must protect sensitive or personal information against loss. Symantec Corp.'s NetBackup software offers encryption as an option, allowing tape or disk data to be encrypted during data backup and decrypted for recovery. Still, there is debate about where encryption should take place. Software encryption is effective, but it reduces performance and locks an organization into the backup software product.
Encryption can also be performed at the tape drive itself (as in Sun's T10000 or any LTO-4 compliant tape drive), or through a dedicated appliance, such as the CryptoStor family from NeoScale Systems Inc., or the DataFort family from Network Appliance Inc. In the disk-to-disk (D2D) storage realm, VTL products are embracing encryption, and FalconStor Software Inc. includes a Secure Tape Transport Service module with its VTLs. [See our Storage Security Buying Guide.]