D2D Backup: Disk's dual role

In part one of a three-part series on disk-based backup, we describe how SAN disk-as-disk and NAS disk-as-disk work, as well as the pros and cons of each configuration.

Your backup choices are far more numerous and complex than in the days when you simply had to choose a backup application and a tape library. Disk has solved the reliability and performance issues that most storage managers have experienced with traditional backup systems, but disk-to-disk-to-tape (D2D2T) has complicated the backup and restore processes. Users must now choose from among three backup architectures and four types of disk-based backup targets. This three-part series on disk-to-disk (D2D) backup describes the pros and cons of each approach to help you clarify your choices.

The traditional backup architecture
Before explaining how the different D2D backup options work, it's important to understand the backup systems they'll work with and why you might want to augment those systems with disk. In a traditional backup architecture (see Traditional backup architecture, this page), software residing on the backup client (the server to be backed up) allows the backup server to transfer that client's data to tape, disk or virtual tape. The data may be transferred across the network (LAN-based), from the client directly to tape or disk across the storage area network (SAN) (LAN-free), or directly from primary storage to secondary storage across the SAN (server-free). In each case, the data is converted into a format understood by the backup software running the backup. This format could be tar, cpio, dump, NTBackup or a custom format understood only by that particular backup package.

The biggest advantage of the traditional backup architecture is that it's well understood and mature. The biggest disadvantage is how it uses tape. It's difficult for a traditional backup system to use the streaming nature of modern tape drives efficiently. To properly stream tape drives, some backup software products (like EMC Corp.'s Legato NetWorker and Veritas Software Corp.'s NetBackup) send multiple backup jobs simultaneously to the same tape drive, a technique called multiplexing or interleaving. This technique helps backups but has a negative impact on the restore of a single backup; the backup software has to read the entire tape and disregard the data it doesn't need. Other backup software products, such as IBM Corp.'s Tivoli Storage Manager, solve the streaming issue with disk staging where backups are first sent to disk before they're sent to tape.
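
The restore penalty of multiplexing can be illustrated with a toy model. The sketch below isn't how any of the products named above lay out data on tape; the round-robin block layout, the job names and the function names are all assumptions made for illustration. It interleaves three jobs onto one "tape" and then counts how many blocks must be read to restore just one of them.

```python
from itertools import zip_longest

def multiplex(jobs):
    """Interleave blocks from several backup jobs round-robin onto one 'tape'.

    jobs maps a job name to its list of data blocks. The result is a list of
    (job_name, block) tuples in the order they would land on tape.
    """
    tagged = [[(name, block) for block in blocks] for name, blocks in jobs.items()]
    tape = []
    for row in zip_longest(*tagged):
        # Shorter jobs run out first; skip the None padding they leave behind.
        tape.extend(entry for entry in row if entry is not None)
    return tape

def restore(tape, wanted):
    """Read the tape front to back, keeping only blocks of the wanted job."""
    blocks_read = 0
    kept = []
    for name, block in tape:
        blocks_read += 1
        if name == wanted:
            kept.append(block)
    return kept, blocks_read

jobs = {
    "clientA": ["a1", "a2", "a3"],
    "clientB": ["b1", "b2", "b3"],
    "clientC": ["c1", "c2", "c3"],
}
tape = multiplex(jobs)
data, blocks_read = restore(tape, "clientB")
print(data)         # → ['b1', 'b2', 'b3']
print(blocks_read)  # → 9
```

In this model, restoring three blocks means passing over nine, and the ratio gets worse as more jobs share the drive.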

With the advent of lower-priced ATA-based disk arrays, however, everyone can take advantage of disk staging or disk-based backups without switching from a traditional backup architecture. You simply augment your tape library with a combination of disk and tape.

Disk backup options
There are four ways to augment your traditional backup system with disk. The first two options are called disk-as-disk because they involve using disk drives behaving as disk drives--the disks aren't pretending to be tape. In a SAN disk-as-disk configuration (see SAN disk-as-disk, this page), a disk array is connected to one or more backup servers via a SAN, and a disk volume is assigned to each server. Each server then puts a filesystem on that volume, and backups are sent to that filesystem. In a network-attached storage (NAS) disk-as-disk architecture (see NAS disk-as-disk, this page), the disk resides behind a filer head that shares filesystems via NFS or CIFS, and backups are sent to those filesystems.

The last two options employ virtual tape libraries (VTLs), where disk systems are placed behind a server running software that allows the disk array to pretend to be one or more tape libraries. A standalone VTL (see Standalone virtual tape library, this page) sits next to a physical tape library and pretends to be another tape library. Once you back up to a standalone VTL, you must use the backup server to copy its backups to physical tape if you want to send them offsite.

An integrated VTL (see Integrated virtual tape library, this page) sits between a physical tape library and a backup server, where it pretends to be a physical library. The backup server backs up to the integrated VTL, which then copies the data to the physical tape portion of its library.

When backup software backs up to a disk-as-disk system, it knows it's a disk and typically creates a file within the filesystem. To distinguish these backups from those sent to a tape (or virtual tape) target, some people refer to these types of backups as filesystem-based backups.
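
As a rough illustration of a filesystem-based backup, the sketch below writes one backup image as a tar file into a directory. It's only a sketch: real backup products use their own formats and catalogs, the function and client names are hypothetical, and temporary directories stand in for a mounted SAN volume or an NFS/CIFS share.

```python
import os
import tarfile
import tempfile
import time

def backup_to_disk(source_dir, target_fs, client_name):
    """Write one backup image (a tar file) into a disk-as-disk target.

    target_fs stands in for a mounted SAN volume or NFS/CIFS share.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    image_path = os.path.join(target_fs, f"{client_name}-{stamp}.tar")
    with tarfile.open(image_path, "w") as tar:
        # Store files under the client name rather than the absolute path.
        tar.add(source_dir, arcname=client_name)
    return image_path

# Demo with temporary directories instead of real mount points.
with tempfile.TemporaryDirectory() as source, tempfile.TemporaryDirectory() as target:
    with open(os.path.join(source, "data.txt"), "w") as f:
        f.write("payroll records\n")
    image = backup_to_disk(source, target, "client01")
    with tarfile.open(image) as tar:
        members = tar.getnames()
    print(os.path.basename(image))
    print(members)
```

The backup is just a file in a directory, which is why the same approach works whether the directory lives on a SAN volume or a NAS share.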

D2D2T Backup
Disk-to-disk-to-tape (D2D2T) backups: Backups that are first sent to disk and eventually copied or moved to tape.
Disk-as-disk: A disk-based backup target that behaves as disk and doesn't take on the characteristics of tape.
Network-attached storage (NAS) disk-as-disk: A disk-as-disk backup target accessed via NFS or CIFS.
Storage area network (SAN) disk-as-disk: A disk-as-disk backup target accessed via Fibre Channel or iSCSI.
Filesystem-based backups: Backups that are sent to a disk-as-disk backup target rather than a virtual tape library or physical tape.
Virtual tape library (VTL): A disk array and server running an application that makes the disk array look like a tape library to the backup software application.
Integrated VTL: A VTL that directly integrates with a tape component and that also manages the process of copying data from VTL disk to physical tape.
Standalone VTL: A VTL that stands by itself, like a regular tape library. It uses the backup software's tape-to-tape copy to migrate data from virtual tape to physical tape.

Advantages of disk-as-disk targets
The biggest advantage disk-as-disk targets have over most VTL targets is price. Most disk-as-disk systems are priced significantly less per gigabyte than VTL systems because, with a VTL, you're also paying for the value of the VTL software.

You can save even more money by redeploying an older, decommissioned array as a disk-as-disk target. Decommissioned arrays are often end-of-life units without service contracts, so these service contracts should be resumed if you're using the unit in a production system. Since service contracts on older equipment can be quite expensive, be sure to compare the cost of resuming the contract to that of a new system with a contract included. Another advantage of disk-as-disk backup targets is that most backup software companies don't currently charge to back up to them; unfortunately, this is changing.

The final advantage of disk-as-disk targets is their flexibility, which may come into play if you plan on moving away from a traditional backup architecture. This article addresses how to use disk to augment a traditional backup system. A subsequent article will concentrate on new types of backup systems, such as data-reduction backup and replication-based backup. A data-reduction backup system tries to eliminate redundant blocks of backed-up data, thus reducing the amount of data sent across the network and stored on the secondary storage system. A replication-based backup system uses replication as the mechanism to move data to a secondary location where it's then backed up. If either of these architectures may be in your future, you might want to consider a disk-as-disk target now, because it's exactly what data-reduction and replication-based backup systems need as a target. You can't replicate to tape, and data-reduction backup systems are also designed to write to disk-as-disk targets.

Disadvantages of disk-as-disk targets
Backup software companies are starting to charge for backing up to a disk-as-disk target, a trend that's expected to continue. Vendors defend the move on the grounds that they're providing additional functions in their backup software. The going price to use a disk array as a staging device before data is moved to tape is approximately $2,000/TB. Using a 200TB disk array as a disk-as-disk target could therefore add $400,000 to your backup software tab.
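
The licensing arithmetic is easy to rerun for your own capacity. This minimal sketch uses the approximate $2,000/TB figure quoted above as a default; the function name is hypothetical, and your vendor's actual pricing model may differ (tiered or flat, for example).

```python
def disk_license_cost(capacity_tb, price_per_tb=2000):
    """Return the added software license cost in dollars for a disk target."""
    return capacity_tb * price_per_tb

cost = disk_license_cost(200)
print(f"${cost:,}")  # → $400,000
```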

Another disadvantage of disk-as-disk backup devices is the nature of filesystems. Files are written, opened, changed and stored back to the same place. Often, the new version of the file doesn't fit in the same place where the old file was, so a portion of it gets written to the original location while another part is written somewhere else on the disk, resulting in fragmentation. The more files you add, delete and modify, the more fragmented the filesystem. Because a backup system continually writes new backup images and deletes expired ones, the way it uses the disk results in significant fragmentation over time, which degrades performance.

Another issue when using disk-as-disk backup targets is that some backup software products don't back up to filesystems as well as they back up to tapes. For example, backup software products know exactly what to do when a tape fills up, but they're not always sure what to do when a filesystem fills up. Many of the major backup products require users to point disk-as-disk backups to a single filesystem. When that filesystem fills up, all the backups fail--even if another filesystem has adequate capacity. There are also other limitations, like the inability of some backup products to scan in filesystem images. If you let a tape expire from your backup catalog, most backup products will allow you to scan that tape, figure out what's on it and then enter its contents in the backup catalog. Some products can't do that with filesystem-based images.
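
Where a backup product can only point at a single filesystem, one possible workaround is a wrapper script that checks free space before the job starts and picks a target with room. The sketch below is an assumption about how such a script might look, not a feature of any backup product; the paths and sizes are hypothetical. It uses Python's standard `shutil.disk_usage` by default, with a simulated lookup so the example runs without real mounts.

```python
import shutil

def pick_target(filesystems, needed_bytes,
                free_bytes=lambda path: shutil.disk_usage(path).free):
    """Return the first filesystem with room for the backup, or None."""
    for path in filesystems:
        if free_bytes(path) >= needed_bytes:
            return path
    return None

# Simulated free space (in bytes) so the example runs without real mounts.
GB = 1024 ** 3
simulated = {"/backup/fs1": 5 * GB, "/backup/fs2": 800 * GB}

choice = pick_target(["/backup/fs1", "/backup/fs2"], 100 * GB,
                     free_bytes=simulated.get)
print(choice)  # → /backup/fs2
```

Whether the backup software will accept a target chosen this way depends entirely on the product, so treat this as a pre-flight check rather than a fix.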

Storing backups offsite is another challenge with disk-as-disk backup targets. The normal procedure would be to copy the disk-based backups to a physical tape and then ship the tape offsite. The problem is that most people don't copy their disk-based backups to tape. Therefore, you need to learn how to copy disk-based backup data to tape and then learn how to automate the process. These two steps can range from extremely easy to extremely difficult, depending on the backup product you use, and may also require you to purchase additional software from your backup vendor. Whatever method you choose to get the data from disk to tape, remember that the data is now moving twice, where before it moved only once. This means you'll need to budget time for the data to make that second move.
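
To budget time for that second move, a back-of-the-envelope estimate is usually enough to start with. The sketch below assumes a sustained copy rate and decimal units (1TB = 1,000,000MB); the 10TB and 100MB/sec figures are hypothetical and should be replaced with your own numbers.

```python
def copy_hours(data_tb, throughput_mb_per_s):
    """Hours needed to move data_tb terabytes at a sustained rate (decimal units)."""
    total_mb = data_tb * 1_000_000
    return total_mb / throughput_mb_per_s / 3600

hours = copy_hours(data_tb=10, throughput_mb_per_s=100)
print(round(hours, 1))  # → 27.8
```

In this hypothetical case the disk-to-tape copy takes longer than a day, so it would need more tape drives, a faster path or a smaller nightly copy set to keep up.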

One final disadvantage of disk-as-disk targets is the lack of compression. While there's currently one NAS disk-as-disk target that uses data-reduction techniques on backups stored on that device (Data Domain Inc.'s DD200), most disk-as-disk targets don't have built-in compression. This means you may need twice as much disk with a disk-as-disk target as you would with a VTL that supports compression. (It should be noted that in-band, software-based compression products typically come with a rather hefty performance penalty--as much as 50%. In its new DX100 VTL, Quantum Corp. claims to offer hardware-based compression that doesn't degrade performance.)
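
The "twice as much disk" estimate implies a compression ratio of about 2:1. The sketch below makes that assumption explicit so you can substitute the ratio you actually see; the function name and the 100TB figure are hypothetical.

```python
def raw_disk_needed(backup_tb, compression_ratio=1.0):
    """Raw disk (TB) needed to hold backup_tb of backups at a given ratio."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return backup_tb / compression_ratio

onsite_backups_tb = 100  # hypothetical amount of onsite backup data
disk_as_disk_tb = raw_disk_needed(onsite_backups_tb)   # no compression
vtl_tb = raw_disk_needed(onsite_backups_tb, 2.0)       # assumed 2:1 compression
print(disk_as_disk_tb, vtl_tb)  # → 100.0 50.0
```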

Tip: Don't forget about restores
  • If you only buy enough disks to hold a few nights' backups (i.e., disk caching), you'll speed up backups but won't speed up restores.
  • If you want to speed up backups and restores, you should buy enough disk to hold all onsite backups.

Tip: Disk-as-disk licensing is changing
  • Most backup software companies will begin charging to back up to a disk-as-disk target.
  • Ask your backup software vendor what its plans are.
Tip: SAN vs. NAS performance
  • SAN arrays will offer better backup system performance.
  • NAS filers will be easier to manage and maintain, but throughput will be limited to the filer head.

About the series: This article is the first in a three-part series that describes the various ways to use disk to protect data. The series explains the advantages and disadvantages of the four types of disk-based backup targets, including storage area network (SAN) disk-as-disk, network-attached storage (NAS) disk-as-disk, standalone virtual tape libraries (VTLs) and integrated VTLs. The first two articles cover how--and why--users are augmenting their traditional backup and recovery systems with disk, with the second article focusing on VTLs. The concluding article describes the new backup and recovery architectures that have been enabled by inexpensive disk-based targets, including replication-based backup and data-reduction backup.

SAN disk-as-disk targets
A SAN disk-as-disk target is simply a disk array connected to the SAN and attached to one or more backup servers. The backup server puts a filesystem on the array and writes to that filesystem. The advantage over a NAS disk-as-disk system is the better write performance typical of a high-end SAN disk array compared to an Ethernet NAS filer.

However, when you use a disk array as your backup target, you replicate into your secondary storage all of the provisioning issues of your primary storage. All of that hassle with associating disks to RAID groups, RAID groups to servers, and volumes to filesystems now needs to be done on the back end of your backup system. This problem is compounded when you have multiple backup servers. When using a tape library or VTL, most backup software packages know how to share these devices. If you're using a SAN disk-as-disk target with multiple backup servers, you'll have to decide how large each backup server's volume needs to be and allocate the appropriate amount of space to each backup server.
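
One simple way to approach the sizing decision is to split the array in proportion to each backup server's nightly load. This is only one possible policy, offered as an assumption; the server names, capacities and function name below are hypothetical, and retention periods or growth rates may call for a different split.

```python
def allocate_volumes(array_tb, nightly_backup_tb):
    """Split an array's capacity among backup servers in proportion to load."""
    total = sum(nightly_backup_tb.values())
    return {server: array_tb * load / total
            for server, load in nightly_backup_tb.items()}

# A 40TB array shared by two backup servers with 2TB and 6TB nightly loads.
plan = allocate_volumes(40, {"bkup1": 2, "bkup2": 6})
print(plan)  # → {'bkup1': 10.0, 'bkup2': 30.0}
```

Unlike a shared tape library or VTL, each of these volumes is fixed once provisioned, so a server that outgrows its share can't borrow space from another without re-provisioning.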

NAS disk-as-disk targets
A NAS disk-as-disk target solves the provisioning issues of a SAN disk-as-disk target by putting the disks behind a NAS head, making a giant volume and sharing that volume via NFS or CIFS. Generally speaking, such systems are easier to maintain than traditional disk arrays, but that easier management comes at a price: both the filer head and the filer operating system increase the cost of the system, and performance is limited to the throughput of the filer head. Depending on the size of your backups, however, performance may not be an issue. If you're a NAS shop with many other filers, a NAS disk-as-disk target makes perfect sense--especially if you're going to use replication-based backup.

Disk-as-disk targets provide a quick and inexpensive way to start backing up to disk. Yet they also have many disadvantages when used with a traditional backup system. If you're going to use a disk-as-disk system, you'll need to choose either a SAN or a NAS unit. A SAN device may be more powerful than a NAS unit, but the SAN device will be more difficult to maintain and share. In the next article in this series, we'll explore how to use VTLs with your backup system. We'll describe their advantages and disadvantages vs. disk-as-disk systems, and explain the advantages and disadvantages of the two different kinds of VTLs.
