Most storage administrators don't have trouble getting data onto disk or tape. The real challenge is keeping that data safe in the face of daily operations. Whether you're dealing with a hard disk failure or chasing a misplaced tape, data loss is a fact of life.
But data loss isn't just an inconvenience. It can result in costly business interruptions, and the increasing weight of government regulations and consumer expectations can impose severe penalties for lost data. IT professionals must take steps to back up and protect corporate data. This chapter covers the areas of disk, tape and remote data backup, and explains how to create a successful data backup strategy.
Tape is the quintessential data backup medium. Tape technology is mature and relatively inexpensive (per gigabyte) but it is too slow to serve as a primary storage platform. The appeal of "cheap, plentiful and slow" storage has made tape a traditional complement to disk storage systems.
Tape storage is a removable media technology, so a tape cartridge can easily be moved between compatible drive mechanisms. Cartridges are designed for specific tape drive architectures, however, and are not interchangeable across formats. The "tape" is simply a length of flexible plastic ribbon coated with magnetic media and wrapped around a set of spindles. The spindles are mounted inside a plastic cartridge enclosure that protects the tape media. Tape cartridges have a relatively short working life because the tape media physically contacts the tape drive's read/write heads. It is recommended that tapes be replaced after about 2,000 passes, although exact replacement recommendations vary by tape style.
A tape drive is the electromechanical device that reads and writes to the tape cartridge, and exchanges that data with the rest of the computer. Most tape drives use either helical scan or linear tape head technology to access the tape. Helical scan drives use a rotating head positioned at an angle, reading and writing data as diagonal stripes across the tape's width. Linear drives use a stationary head, reading and writing data in tracks that run along the tape's length.
Numerous tape formats leverage these two approaches, including advanced intelligent tape (AIT), digital data storage (DDS), digital linear tape (DLT), Linear Tape-Open (LTO) and Travan. Factors that go into choosing a tape drive include capacity need, performance speed, media cost and technological longevity. LTO-4 is perhaps the most popular technology, offering speed, capacity and drive-based encryption.
Backup software is a management tool that interfaces backup hardware (the tape drives and libraries) with corporate data servers, allowing administrators to decide when and where to back up selected files, folders, drives, servers or even entire data centers. Backup software also supports automation so backups can be performed and verified during off-hours without direct human intervention. EMC NetWorker and Symantec Veritas Backup Exec 11d are two well-known backup tools.
While hard disks are the primary storage medium for all types of computer systems, disks are increasingly being used for data backup tasks (secondary storage). This is partly due to the falling costs of high-volume storage devices such as SATA and SAS drives, but also because backup needs are changing. Many organizations work in a global 24/7 marketplace and cannot afford to go offline for nightly tape backups. Should trouble strike, an organization must restore its operations in hours -- not days. Disks offer the cost-effective speed and storage capacity to make disk-based backup effective [see Chapter one for more information about disk storage: http://searchstorage.techtarget.com/originalContent/0,289142,sid5_gci1164054,00.html].
In some cases, disk and tape technologies are combined in a disk-to-disk-to-tape platform, dubbed D2D2T. Primary disk storage is first backed up to secondary disks -- lost data can be quickly restored from the backup disk. Tape is then added on as a form of long-term archival storage. A benefit of D2D2T is that tapes can be written from the secondary disk storage so the main storage system is not taken offline in the tape writing process. The resulting tapes can then be sent offsite to protect the primary and secondary disk storage systems against disaster.
Some companies with existing investments in tape libraries may have trouble justifying the shift to disk-based backup systems. One way to ease the transition anxiety from tape to disk is through a virtual tape library (VTL). A VTL is a disk storage system designed to mimic the behaviors of a tape library. By emulating a tape system, a VTL can utilize disk speed to accelerate backups and restorations while leveraging an organization's existing backup software, policies, infrastructure and in-house technical expertise. Select a VTL that will most closely match your current tape library system, capacity needs and backup software. For example, the Pathlight VX650 VTL from Quantum Corp. can store up to 4.2 TB while emulating Quantum's Scalar 24, Scalar 100 and Scalar i2000 tape libraries.
The proliferation of remote offices has caused a backup problem. Business data can be just as important on servers in the Boise, Idaho, sales office as in the Seattle headquarters. But since many remote offices do not staff IT personnel, they rely on non-IT workers to rotate backup tapes and ship them to a data center.
To address this problem, a growing number of organizations are eliminating tapes in favor of WAN-based backups that transfer crucial information to the data center across broadband WAN connections. One product intended for remote WAN backups is LiveVault Corp.'s InControl. WAN links are also being used to transfer data directly to an off-site archive service, such as Iron Mountain Inc., instead of creating physical tape backups and rotating them to an off-site storage facility [see the SearchStorage.com article on remote office backup].
Bandwidth is the main issue with any WAN-based backup scheme. Fast bandwidth is expensive, so with WAN backups the focus is on using techniques such as data deduplication (aka single-instance storage or commonality factoring) and conventional compression to optimize the use of available bandwidth. Backups can also be shortened by transferring only data that has changed since the last backup process, a technique known as "delta differencing." Another technique is to avoid complete backups over WAN and just transfer the most important business files between locations.
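As a rough illustration of how deduplication saves bandwidth, the Python sketch below splits data into fixed-size chunks, hashes each chunk and sends only the chunks the data center has not already stored. The function name, the 4 KB chunk size and the use of SHA-256 are assumptions made for this example; commercial products use their own (often variable-size) chunking schemes.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size, in bytes


def chunks_to_send(data, known_hashes, chunk_size=CHUNK_SIZE):
    """Return (hash, chunk) pairs that the remote site has not stored yet.

    known_hashes is the set of chunk hashes already held at the data center;
    it is updated in place so repeats within the same run are skipped too.
    """
    to_send = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in known_hashes:
            known_hashes.add(digest)
            to_send.append((digest, chunk))
    return to_send


# Four chunks of data, but only two distinct chunk contents.
data = b"A" * (CHUNK_SIZE * 3) + b"B" * CHUNK_SIZE
known = set()
first_run = chunks_to_send(data, known)
second_run = chunks_to_send(data, known)  # nothing changed, nothing to send
```

In this sketch the first run transfers two chunks instead of four, and a repeat run over unchanged data transfers nothing.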
Some organizations are eliminating the difficulties of remote IT by consolidating remote IT into a single data center. Remote access then uses WAN links with application-accelerating technologies, such as wide area file services (WAFS), to serve applications and files to remote offices just as if the data were local. WAFS usually involves appliances installed at both ends of the WAN link, which cache needed files at each remote office for quick access. Any changes to a file can then be saved back to the data center as time and bandwidth allow [see the SearchStorage.com article on WAFS].
Other backup concepts
Backups fall into three categories: full, incremental and differential. A full backup is a complete copy of all files. A full backup on a server with 528 GB of data will transfer all that data to the backup target (e.g., disk or tape). Full backups take the longest to make, but they are easiest and fastest to restore.
An incremental backup captures only the changes made since the last backup event. If you perform a full 200 GB backup on a server Monday, and 2 GB of new data are added on Tuesday, an incremental backup will only capture the new 2 GB. If another 1 GB changes on Wednesday, only the new 1 GB is captured. Once a full backup is performed, incremental backups can be very fast. However, a restore requires the last full backup first, followed by every incremental backup made since that full backup, applied in order.
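The incremental cycle can be sketched in a few lines of Python. Here each backup is modeled as a dictionary that maps a file name to a version label; the function names and sample data are hypothetical, and file deletions are ignored to keep the example short.

```python
def take_incremental(current, previous):
    """Capture only files that are new or changed since the previous backup."""
    return {name: ver for name, ver in current.items()
            if previous.get(name) != ver}


def restore(full, incrementals):
    """Rebuild the latest state: the full backup first, then every
    incremental in the order it was taken (oldest first)."""
    state = dict(full)
    for inc in incrementals:
        state.update(inc)
    return state


monday = {"a.doc": "v1", "b.xls": "v1"}                     # full backup
tuesday = {"a.doc": "v1", "b.xls": "v2", "c.ppt": "v1"}
wednesday = {"a.doc": "v2", "b.xls": "v2", "c.ppt": "v1"}

inc_tue = take_incremental(tuesday, monday)      # b.xls changed, c.ppt is new
inc_wed = take_incremental(wednesday, tuesday)   # only a.doc changed
restored = restore(monday, [inc_tue, inc_wed])
```

Leaving out either incremental, or applying them out of order, can produce a state that never existed on the server, which is why the whole chain is needed.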
By comparison, a differential backup captures the total changes made since the last full backup. For example, if 3 GB changes on Monday, 2 GB on Tuesday and 7 GB on Wednesday, each day's differential backup will capture 3 GB, 5 GB and 12 GB respectively. Differential backups take longer than incremental backups, but are easier to restore. With a differential backup, only the full backup and last differential backup must be restored.
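The figures in this example can be checked with a short Python sketch. It assumes, as the example does, that each day's changes touch different data, so a differential backup is simply the running total since the last full backup. The helper function is hypothetical and only counts how many backup sets a restore would need under each scheme.

```python
from itertools import accumulate

daily_changes_gb = [3, 2, 7]  # Monday, Tuesday, Wednesday

# An incremental backup holds only that day's changes.
incremental_sizes = list(daily_changes_gb)

# A differential backup holds everything changed since the last full backup.
differential_sizes = list(accumulate(daily_changes_gb))


def backup_sets_to_restore(strategy, days_since_full):
    """Count the backup sets needed to restore, including the full backup."""
    if strategy == "incremental":
        return 1 + days_since_full           # full + every incremental
    if strategy == "differential":
        return 1 + min(days_since_full, 1)   # full + latest differential only
    raise ValueError("unknown strategy: " + strategy)
```

On Wednesday, a restore from incrementals needs four backup sets (the full backup plus three incrementals), while a restore from differentials needs two.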
Mirroring and replication are essentially the same thing -- both create copies of data -- but there are subtle differences. Replication is an offline copy of the data that isn't necessarily intended for use. Mirroring creates a data copy that can be used directly. For example, data is frequently replicated to CD or DVD for long-term archival storage but data may be mirrored to disk for RAID.
Snapshot and continuous data protection (CDP) technologies are appearing in disk-based backup systems. Snapshots capture the state of a storage system at a given point in time, saving detailed reference information about available data and its location, similar to a detailed table of contents. When trouble strikes, data can be restored based on the latest snapshot. Snapshots can be taken as frequently as a storage administrator deems necessary.
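The "table of contents" idea can be illustrated with a toy Python model. It is a simplified sketch, not how any particular storage product implements snapshots: the class and method names are invented, and data blocks are assumed never to be overwritten in place, so a snapshot only needs to save the table that maps names to blocks.

```python
class Volume:
    """Toy storage volume whose snapshots save only the name-to-block table."""

    def __init__(self):
        self.blocks = {}     # block id -> data, never overwritten in place
        self.table = {}      # file name -> block id (the live table of contents)
        self.snapshots = []  # saved copies of the table
        self._next_id = 0

    def write(self, name, data):
        # New data always goes to a fresh block, so old blocks stay intact.
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = data
        self.table[name] = block_id

    def read(self, name):
        return self.blocks[self.table[name]]

    def snapshot(self):
        # Copy the table only; no file data is duplicated.
        self.snapshots.append(dict(self.table))
        return len(self.snapshots) - 1

    def restore(self, snap_id):
        self.table = dict(self.snapshots[snap_id])


vol = Volume()
vol.write("report", b"good data")
snap = vol.snapshot()
vol.write("report", b"corrupted")
vol.restore(snap)
```

After the restore, reading "report" returns the contents as they were when the snapshot was taken.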
CDP provides even more granular detail, recording each storage transaction to a journal in real-time. If data loss occurs, the storage system can be "wound back" to the last good transaction, which could be minutes, even seconds, ago [see the SearchStorage.com article on CDP].
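The journal concept can be sketched in Python as an append-only list of writes that is replayed up to a chosen point in time. The class name, the integer timestamps and the key/value view of storage are assumptions made for illustration; the sketch also assumes entries are recorded in time order.

```python
class Journal:
    """Append-only record of every write, replayable to any point in time."""

    def __init__(self):
        self.entries = []  # (timestamp, key, value), in the order recorded

    def record(self, timestamp, key, value):
        self.entries.append((timestamp, key, value))

    def state_at(self, timestamp):
        """Replay writes up to and including the given timestamp."""
        state = {}
        for ts, key, value in self.entries:
            if ts > timestamp:
                break  # entries are in time order, so later ones can be skipped
            state[key] = value
        return state


journal = Journal()
journal.record(1, "balance", 100)
journal.record(2, "balance", 80)
journal.record(3, "balance", -999)   # the bad transaction

last_good = journal.state_at(2)      # wind back to just before the bad write
```

The granularity of recovery here is a single write, which is the main difference from a snapshot taken at fixed intervals.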
Security is becoming more important for data backup operations. Company data often includes confidential or personally identifiable information that needs to be protected. When a tape is lost or a network is hacked, sensitive information may fall into the wrong hands. Backup systems are starting to use encryption when saving files to tape or archival storage. Encryption is typically provided through the backup software such as Backup Exec, through a dedicated encryption appliance such as the CryptoStor products from NeoScale Systems Inc., or at the tape drive using LTO-4 drives. Encrypted data cannot be read without the corresponding keys, so encrypted data cannot be misused if it's stolen.