Data deduplication and differences among target-based dedupe systems
The biggest game-changing feature has been data deduplication. It changes a VTL from a disk staging device with only a few days of backups (due to the cost of disk) to a device that can affordably hold all onsite backups. And dedupe built the IDT market; without dedupe, an intelligent disk target is truly just a NAS filer.
Deduplication can reduce backup size by 10:1 or 20:1 without significantly affecting the performance of restores and copies from disk to tape. But not all data dedupes well. Applications such as imaging, audio, video or seismic processing systems generate new data every time they run, so there's little detectable duplication. Dedupe systems also use compression, but not all data compresses well either.
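To see why repeated full backups dedupe well while freshly generated data does not, consider the minimal sketch below. It assumes fixed-size chunks identified by a SHA-256 hash, which is only one possible approach and not a description of any vendor's product; the names `CHUNK_SIZE` and `dedupe_ratio` are hypothetical, and random bytes stand in for real backup data.

```python
import hashlib
import os

CHUNK_SIZE = 4096  # hypothetical fixed chunk size; real products differ


def dedupe_ratio(streams, chunk_size=CHUNK_SIZE):
    """Return native bytes divided by bytes of unique chunks across all streams."""
    seen = set()   # digests of chunks already stored
    native = 0     # bytes presented to the dedupe system
    stored = 0     # bytes actually kept after removing duplicate chunks
    for data in streams:
        for start in range(0, len(data), chunk_size):
            chunk = data[start:start + chunk_size]
            native += len(chunk)
            digest = hashlib.sha256(chunk).digest()
            if digest not in seen:
                seen.add(digest)
                stored += len(chunk)
    return native / stored if stored else 0.0


ONE_MIB = 1024 * 1024

# Ten full backups of the same unchanged 1 MiB data set.
base = os.urandom(ONE_MIB)
repeated_fulls = [base] * 10

# Ten runs of an application that generates entirely new data each time.
fresh_each_run = [os.urandom(ONE_MIB) for _ in range(10)]

print("repeated fulls:", dedupe_ratio(repeated_fulls))
print("fresh data:    ", dedupe_ratio(fresh_each_run))
```

In this toy case the repeated fulls reduce to the size of a single copy, while the freshly generated data gains nothing from dedupe. Real backup data falls somewhere in between.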
There are other significant differences among target dedupe systems (VTLs/IDTs). The IBM Corp. ProtecTIER product, for example, has a single-stream restore speed limitation of approximately 90 MBps. Although Quantum has made significant progress with restore speed, the restore speeds from its "block pool" (i.e., deduped data) are still nowhere near those possible when restoring from the last few backups stored in native format. Sepaton Inc.'s dedupe system is backup product-specific, and the firm has yet to release support for CA ARCserve Backup, CommVault Simpana, EMC Corp. NetWorker and Symantec Corp. Backup Exec, among others. And the lack of global deduplication from some of the major vendors (e.g., Data Domain, NetApp and Quantum Corp.) means that users must continue to slice their backups into chunks that are manageable by a single appliance.
Deduplicated replication. Deduplication also makes replication much more affordable and feasible. Without dedupe, you might need 10 times to 100 times more bandwidth to replicate a full backup. With dedupe, a typical full backup only stores and replicates 1% to 10% of its native size.
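The bandwidth arithmetic is easy to check with a rough calculation. The sketch below is illustrative only: the function name `replication_hours`, the 1,000 GB full backup and the 100 Mbps link are assumptions chosen for the example, and the calculation ignores protocol overhead and compression.

```python
def replication_hours(native_gb, stored_fraction, link_mbps):
    """Estimate hours needed to replicate one backup over a WAN link.

    native_gb       -- backup size before dedupe, in decimal gigabytes
    stored_fraction -- fraction of native size actually sent (1.0 = no dedupe)
    link_mbps       -- usable link speed in megabits per second
    """
    bytes_to_send = native_gb * 1e9 * stored_fraction
    seconds = (bytes_to_send * 8) / (link_mbps * 1e6)
    return seconds / 3600


# Hypothetical 1,000 GB full backup over a 100 Mbps link.
for label, fraction in [("no dedupe", 1.0), ("10% of native", 0.10), ("1% of native", 0.01)]:
    hours = replication_hours(1000, fraction, 100)
    print(f"{label:>14}: {hours:6.2f} hours")
```

Under these assumptions, the undeduplicated full takes roughly 22 hours to replicate, which won't fit in most backup windows, while the deduped versions take about 2.2 hours and about 13 minutes, respectively.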
Tape consolidation and virtualization
Some vendors, notably Fujitsu and Tributary Systems, tend to use the term tape virtualization rather than VTL. They see tape virtualization as a way to enhance your continued use of tape while removing many of tape's limitations, especially if you want to use tape as a long-term storage device. If you store data on tape for multiple years, you're supposed to occasionally "retension" your media and move backups around to keep all the bits fresh. Updating your tape technology is another issue: What do you do with the old tapes and drives?
A tape virtualization system solves these issues by employing what's often referred to as a hierarchical storage management (HSM) system for tape. Newer backups are stored on disk; older backups are stored on tape. When you buy new tape drives and bigger tapes, you simply tell the tape virtualization system that you want to retire the older tapes. The system migrates them by stacking the smaller tapes onto the larger ones and keeping track of which virtual "tapes" are stored on which physical tapes. If the backup application requests a bar code that's been stacked onto a bigger tape, the system loads the appropriate physical tape and positions it to the point where the requested "tape" resides; the application doesn't know the difference.
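The bookkeeping behind tape stacking can be pictured as a simple lookup table. The sketch below is a hypothetical illustration, not any vendor's implementation: the class names `Location` and `StackingCatalog`, the bar codes and the capacities are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    physical_barcode: str  # the newer, bigger tape
    offset_gb: int         # where the stacked "tape" starts on that tape
    length_gb: int         # size of the stacked "tape"


class StackingCatalog:
    """Tracks which retired (virtual) tapes live on which physical tapes."""

    def __init__(self):
        self._locations = {}

    def stack(self, old_tapes, physical_barcode, capacity_gb):
        """Stack (barcode, size_gb) pairs end to end onto one physical tape."""
        total = sum(size for _, size in old_tapes)
        if total > capacity_gb:
            raise ValueError("old tapes do not fit on the new tape")
        offset = 0
        for barcode, size in old_tapes:
            self._locations[barcode] = Location(physical_barcode, offset, size)
            offset += size

    def locate(self, requested_barcode):
        """Answer a backup application's request for a retired bar code."""
        return self._locations[requested_barcode]


catalog = StackingCatalog()
catalog.stack([("OLD001", 400), ("OLD002", 400), ("OLD003", 400)],
              physical_barcode="NEW001", capacity_gb=1500)

# The backup application still asks for the old bar code.
print(catalog.locate("OLD002"))
```

The backup application keeps requesting `OLD002`; only the catalog knows that the data now starts 400 GB into `NEW001`.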
The future of virtual tape library technology
Virtual tape library technology continues to develop and expand, but just being a VTL may not be enough anymore. With so many users replicating backups offsite, the industry must find a solution to the challenges posed by using replicated backups. Unfortunately, in the near term we're likely to see more product-specific approaches such as Symantec's NetBackup OpenStorage and Hewlett-Packard's Data Protector/Virtual Library System.
There have also been predictions that as data deduplication becomes more pervasive in backup software, the need for intelligent disk targets will be reduced. But that's only likely to happen if source deduplication software products can address their restore speed limitations. These products were designed to back up remote sites, so their restore speeds are slow (10 MBps to 20 MBps). Unless that changes, there will continue to be a market for high-speed disk targets.
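To put those restore speeds in perspective, here is a rough calculation. The 500 GB restore size and the function name `restore_hours` are assumptions made for the example; the throughput figures are the 10 MBps to 20 MBps range above and the approximately 90 MBps single-stream figure cited earlier for one target dedupe system.

```python
def restore_hours(data_gb, throughput_mbytes_per_sec):
    """Estimate hours to restore data_gb (decimal GB) at a sustained MBps rate."""
    seconds = (data_gb * 1000) / throughput_mbytes_per_sec
    return seconds / 3600


# Hypothetical 500 GB restore at the speeds mentioned in this article.
for speed in (10, 20, 90):
    print(f"{speed:>3} MBps: {restore_hours(500, speed):5.2f} hours")
```

At 10 MBps the hypothetical restore takes nearly 14 hours; at 90 MBps it takes about an hour and a half.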
This article was previously published in Storage magazine.
W. Curtis Preston (a.k.a. "Mr. Backup"), Executive Editor and Independent Backup Expert, has been singularly focused on data backup and recovery for more than 15 years. From starting as a backup admin at a $35 billion credit card company to being one of the most sought-after consultants, writers and speakers in this space, it's hard to find someone more focused on recovering lost data. He is the webmaster of BackupCentral.com, the author of hundreds of articles, and the books "Backup and Recovery" and "Using SANs and NAS."