Published: 05 Mar 2003
In the June issue (see "Surprise! Cheap disks cure slow backup"), I introduced the idea of using inexpensive ATA-based disk arrays as the primary storage medium for a backup and recovery system. Instead of backing up to a tape library, your initial backup goes directly to these new ATA disk arrays and is later sent to tape for off-site and long-term storage. This article further examines the options available in this new world of disk-based backup and details the pros and cons of using various products with ATA disk arrays.
A myriad of vendors offer inexpensive RAID arrays based on ATA disk drives and Fibre Channel (FC) connectivity. These boxes usually offer hardware-based RAID supporting RAID 0+1, RAID 1+0 and RAID 5. What these boxes lack is a file system. How you use these RAID arrays in your backup system depends on your configuration and what you hope to achieve.
Suppose you want to back up everything across the LAN to a central backup server, and you want to use this inexpensive RAID array as the destination device for your backups. To do this, choose a RAID level, create your virtual disk drives using that RAID level, connect the array to your backup server, create a file system on the array and use this file system as the destination for your backups.
What if you had a master backup server, four media servers and a terabyte-sized disk array and you wanted each of these servers to be able to use the RAID array as the destination device for backups? There are three ways to do this. The first method is inexpensive and simple, but lacks some flexibility. Create five 200GB LUNs, attach the disk array and five servers to a storage area network (SAN), create a 200GB file system on each server using one of the five LUNs and then use this file system as the destination for backups on that server.
The disadvantage of this method appears when one server needs far less than 200GB and another needs more: the space left unused on one LUN can't be lent to the other server. There are two ways to get around this problem. The least expensive is to attach the disk array to a low-cost Windows or Linux box and share the RAID array via Common Internet File System (CIFS) or Network File System (NFS). The viability of this solution depends on the power and speed of your network, as well as the power and speed of the Linux or Windows server. The other option is to connect the RAID array to a SAN and use a SAN-accessible file system, such as Sistina's Global File System or Veritas' SAN Filesystem. This option is more expensive than the previous one, but it lets you share the entire capacity of the array with each server on the SAN. Vendors offering ATA-based, FC-attachable disk arrays include Axus, Nexsan and Zzyzx.
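The trade-off between five fixed 200GB LUNs and one shared pool can be illustrated with a little arithmetic. The following is a minimal sketch, not a sizing tool: the server names and per-server backup sizes are made up for illustration, and only the 200GB LUN size and the roughly one-terabyte array come from the scenario above.

```python
# Hypothetical per-server backup needs in GB. The names and numbers are
# assumptions for illustration; they do not come from a real site.
ARRAY_CAPACITY_GB = 1000
LUN_SIZE_GB = 200

needs_gb = {
    "master": 120,
    "media1": 310,
    "media2": 90,
    "media3": 200,
    "media4": 150,
}


def fixed_lun_shortfalls(needs, lun_size):
    """Return {server: GB that do not fit} when each server owns one fixed LUN."""
    return {
        server: need - lun_size
        for server, need in needs.items()
        if need > lun_size
    }


def shared_pool_shortfall(needs, capacity):
    """Return GB that do not fit when all servers draw from one shared pool."""
    return max(0, sum(needs.values()) - capacity)


print(fixed_lun_shortfalls(needs_gb, LUN_SIZE_GB))
print(shared_pool_shortfall(needs_gb, ARRAY_CAPACITY_GB))
```

With these assumed numbers, one server overflows its fixed LUN even though the total demand fits in the array, which is the situation that a shared file system (NFS, CIFS or a SAN file system) avoids.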
The next way that ATA disks are being used in backup is with virtual tape library appliances (VTLs). VTLs consist of a RAID array, usually custom-built for the appliance, and some type of firmware or software. Physically, a VTL doesn't look much different from an FC disk array. On the outside, you would find one or two FC interfaces, and if you were to open it up, you would find an array of ATA disk drives. What makes it different from a regular RAID array is the software (or firmware) running inside the VTL, which makes the RAID array appear to be an actual tape library. For instance, the Quantum DX 30 appears to be a Quantum/ATL P1000 library with two or six DLT tape drives and 30 slots preinstalled with media. Other VTLs -- such as Hitachi's VTLA -- are able to emulate many different types of tape libraries.
As of this writing, there are only two such VTLs in existence: Quantum's DX 30 and Hitachi's VTLA. The DX 30 is a complete solution built entirely by Quantum. Hitachi's product, on the other hand, was created through the cooperation of three companies: Hitachi built the hardware, Nissho Electronics helped with planning, and Alacritus wrote Securitus, the software that makes the Hitachi disk array appear as a tape library.
The biggest advantage of VTLs is that they can be seamlessly integrated into your enterprise without requiring any change in how you do backups. To the storage administrator, a VTL looks like another tape library to back up to. You stop using one tape library and start using this one. You will, of course, need to duplicate your backups from the VTL to a real tape library if you wish to ship your backups off-site. If you have connectivity to your off-site storage vendor, you could instead duplicate your backups from one VTL to another.
VTL vendors also claim a performance advantage over network-attached storage (NAS)-based devices, which are discussed in the next section. They say they're writing directly to the raw disk without the overhead of a file system. They might also be using a different level of RAID that's more suitable for sequentially streaming data. The DX 30, for example, uses RAID 3, which is typically used for streaming video.
Disk-based backup information
Backup and recovery technology is changing rapidly. Some of the vendors that have disk-based backup and recovery products can be found at the Web site for the enhanced backup initiative, http://www.enhancedbackup.com. Directories of such products can also be found at http://www.storagemountain.com.
Like real tape libraries, VTLs can also be shared using dynamic drive sharing software such as Veritas' SSO or Legato's DDS. An interesting point: In order to use a VTL, you must buy a robotic license from your backup software vendor for a robot that doesn't truly exist.
Emulating tape drives also gives VTLs a disadvantage. Because the backup software doesn't realize that this tape drive (a sequential access device) is really a disk drive (a random access device), it treats it like a sequential access device. The first backup is placed at the front of the tape, and subsequent backups are placed after previous backups on the tape. When a backup product needs to restore a backup that was placed closer to the end of the tape, it must read the tape sequentially until it reaches that backup. Another problem is multiplexing. If a storage administrator needs to multiplex multiple backups onto a single virtual tape drive in order to achieve the maximum throughput of that device, the multiplexed backups will be interleaved onto the virtual tape just as they would be on a real tape drive. When a backup product needs to restore or copy one image from a multiplexed backup, it must read and disregard the other backups within that multiplexed image, just as it would with a tape drive. As you'll see in the next section, NAS-based devices don't have this disadvantage.
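The cost of multiplexing on a sequential device can be shown with a toy model. The sketch below is an illustration under simplifying assumptions, not how any particular backup product or VTL lays out data: a "tape" is just a Python list, the block labels are made up, and the function names are hypothetical.

```python
from itertools import zip_longest


def multiplex(streams):
    """Interleave blocks from several backup streams onto one tape.

    streams maps a backup name to its list of data blocks. The result is a
    list of (backup_name, block) pairs in the order they land on the tape.
    """
    tape = []
    names = list(streams)
    # zip_longest pads shorter streams with None, which is skipped below.
    for row in zip_longest(*(streams[n] for n in names)):
        for name, block in zip(names, row):
            if block is not None:
                tape.append((name, block))
    return tape


def restore_from_tape(tape, wanted):
    """Read the tape front to back, keeping only blocks of the wanted backup.

    Returns (restored_blocks, blocks_read). Reading stops after the last block
    of the wanted backup, but every block before it must still be read.
    """
    last = max(i for i, (name, _) in enumerate(tape) if name == wanted)
    restored = [block for name, block in tape[: last + 1] if name == wanted]
    return restored, last + 1


streams = {
    "serverA": ["A1", "A2", "A3"],
    "serverB": ["B1", "B2", "B3"],
    "serverC": ["C1", "C2", "C3"],
}
tape = multiplex(streams)
restored, blocks_read = restore_from_tape(tape, "serverC")
print(restored, blocks_read)
```

In this model, restoring the three blocks that belong to serverC means reading all nine blocks on the tape, because the other two backups are interleaved with it.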
Instead of purchasing an ATA-based RAID array and connecting it to your system via FC, you could purchase a NAS device based on ATA disk drives. Some NAS filers, such as Quantum's products, have always been based on ATA disk drives. Other vendors' filers are based on SCSI or FC disk drives. Some of these vendors have recently created ATA-based filers, and they are actively marketing them in the archiving and backup and recovery markets. Network Appliance's NearStore is an example of such a product.
Using a NAS appliance instead of a basic RAID array solves the sharing problem described earlier. All servers that can mount the NAS appliance can use it as a destination for backups. Since each server sharing the appliance writes only to its own files within a large network file system, sharing isn't a problem. NAS vendors will also tell you advances in NFS, CIFS and NAS technology will allow your backup server to write to the NAS appliance with minimal latency and maximum bandwidth -- usually allowing you to fill the entire Gigabit Ethernet pipe with backup data.
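To put the Gigabit Ethernet claim in perspective, a quick back-of-the-envelope calculation helps. This is a rough sketch only: it uses the raw line rate of 1 gigabit per second, and the 70% efficiency figure is an assumption chosen for illustration, not a measured value for any NAS product or protocol.

```python
def transfer_hours(data_bytes, link_bits_per_second, efficiency=1.0):
    """Hours needed to move data_bytes over a link at the given efficiency."""
    bytes_per_second = link_bits_per_second / 8 * efficiency
    return data_bytes / bytes_per_second / 3600


ONE_TB = 10**12   # one decimal terabyte, in bytes
GIGABIT = 10**9   # Gigabit Ethernet line rate, in bits per second

best_case = transfer_hours(ONE_TB, GIGABIT)
# 0.7 is an assumed efficiency, used only to show how the estimate moves.
at_70_percent = transfer_hours(ONE_TB, GIGABIT, efficiency=0.7)
print(f"{best_case:.2f} h at line rate, {at_70_percent:.2f} h at 70%")
```

Even a completely full Gigabit Ethernet pipe carries at most 125MB per second, so moving a terabyte takes a little over two hours in the best case and longer once real-world overhead is counted.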
One perceived disadvantage of NAS-based devices is that your backup software must support backing up to a file system device in order to back up to a NAS device. However, almost all backup software supports such functionality. Although integrating this device into your backup system wouldn't be as seamless as installing a VTL, it's still relatively simple. You mount the NAS device on your backup server, create a directory for each destination device you wish to create and then configure your backup software to use those directories as destination devices for backup. And, unlike VTLs, NAS devices don't require purchasing a robotic license or drive sharing software.
The advantage of NAS-based devices is that your backup software knows they are random access devices. When writing the first backup to a file type device, it creates a new file within the directory you created. When making subsequent backups to the same device, it creates a new file for each backup sent to the device. This is true even for multiplexing. If you send several simultaneous backups to a file type device, each backup creates a separate file on the file system. This means that when the backup product needs to read a particular backup -- whether for a restore or for duplication -- it knows exactly which file it wrote to, and will open only that file and read it. So restores can begin immediately, instead of waiting while the tape drive reads unneeded data that it must disregard. Customers can also make non-multiplexed copies from multiplexed originals with no performance impact.
As you can see, VTL and NAS vendors are each able to claim advantages for their product and disadvantages for the other type of product. Only testing in your environment will prove which one is right for you.
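The file-per-backup behavior of a file type device can be sketched in a few lines. This is a toy model under stated assumptions, not the on-disk format of any backup product: the FileDevice class, the ".img" naming and the block contents are hypothetical, and a temporary directory stands in for the mounted NAS share.

```python
import tempfile
from pathlib import Path


class FileDevice:
    """Toy file-type backup device: one file per backup image."""

    def __init__(self, directory):
        # One directory per destination device, created on the mounted share.
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def write_backup(self, name, blocks):
        """Write one backup image to its own file and return the path."""
        path = self.directory / f"{name}.img"
        path.write_bytes(b"".join(blocks))
        return path

    def restore(self, name):
        """Open and read only the file that holds the requested backup."""
        return (self.directory / f"{name}.img").read_bytes()


# A temporary directory stands in for the mounted NAS share.
with tempfile.TemporaryDirectory() as mount_point:
    device = FileDevice(Path(mount_point) / "device1")
    device.write_backup("serverA", [b"A1", b"A2"])
    device.write_backup("serverB", [b"B1", b"B2"])
    restored = device.restore("serverB")
    files = sorted(p.name for p in device.directory.iterdir())

print(restored, files)
```

Because each backup lives in its own file, restoring serverB never touches the data written for serverA, even if both backups ran at the same time.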
Backup and recovery software
Also participating in this revolution are traditional backup and recovery software vendors. Most of them are confirming they can write to virtual tape libraries and file type devices, and are making sure they can automate copying to (real) tape the backups that were sent to these devices. At least one vendor, BakBone Software, has written its own virtual tape library software that allows standard disk to behave like a tape library. At this time, this software only works with NetVault, BakBone's backup software product.
There's also a new genre of backup and recovery software. These products were written from the ground up with the understanding that the primary backup medium would be disk, which allows them to use a number of new backup methodologies that wouldn't be possible with tape. These include concepts such as Single Instance Store (SIS), a technology that backs up a single copy of each file, regardless of how many times that file resides in your enterprise. Imagine backing up only one copy of word.exe, or df. They also include block-level incremental backup, where only the blocks of a file that have changed are backed up. Traditional backup software backs up the entire file even when only one byte changes. Using both of these technologies together doesn't work well when tape is your primary backup medium, because they result in a single system being backed up to many tapes.
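Both ideas can be illustrated with a short sketch. The code below is a simplified model, not the method used by any of the products named in this article: the function names are hypothetical, SHA-256 is an assumed choice of fingerprint, the file paths and contents are made up, and the tiny 4-byte block size is used only to keep the example readable.

```python
import hashlib


def digest(data):
    """Content fingerprint used to recognize identical data."""
    return hashlib.sha256(data).hexdigest()


def single_instance_store(files):
    """Store each distinct file content once.

    files maps a path to its bytes. Returns (store, catalog): store maps a
    digest to the content, catalog maps each path to the digest it points at.
    """
    store, catalog = {}, {}
    for path, content in files.items():
        key = digest(content)
        store.setdefault(key, content)  # keep only the first copy seen
        catalog[path] = key
    return store, catalog


def changed_blocks(old, new, block_size=4):
    """Return {block_index: new_block} for blocks of new that differ from old.

    Simplification: a file that shrinks is not handled specially.
    """
    changes = {}
    for offset in range(0, len(new), block_size):
        block = new[offset:offset + block_size]
        if block != old[offset:offset + block_size]:
            changes[offset // block_size] = block
    return changes


files = {
    "hostA:/bin/df": b"same binary",
    "hostB:/bin/df": b"same binary",
    "hostA:/etc/hosts": b"127.0.0.1 localhost",
}
store, catalog = single_instance_store(files)
delta = changed_blocks(b"aaaabbbbcccc", b"aaaaXbbbcccc")
print(len(store), delta)
```

Here three cataloged files occupy only two stored copies, and a one-byte change to a 12-byte file produces a single changed block to back up rather than the whole file.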
This relatively new industry includes products such as Connected's TLM, Veritas' NetBackup Professional, EVault, Nexsan's InfiniSAN D2D, and Avamar's DPN. You should expect to see many more products like this in the near future.
About the author:
W. Curtis Preston is the president of The Storage Group. He is the author of Unix Backup and Recovery and Using SANs and NAS.