Software-only deduplication approaches
Client-side software products
"Target" or hybrid software products
Asigra Inc. Televaulting
CommVault Simpana v. 8
EMC Corp. Avamar
IBM Corp. Tivoli Storage Manager (TSM) v. 6
Symantec Corp. PureDisk
Client-side software like Symantec's PureDisk is designed for remote offices or network-bound environments where users want to cut the amount of data they're sending over the wire.
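To make the bandwidth argument concrete, here is a minimal sketch of how client-side dedupe generally works. It assumes fixed-size 4 KB chunks and SHA-256 fingerprints, and it is not PureDisk's actual protocol or API. The client fingerprints each chunk, asks the server which fingerprints it lacks, and sends only those chunks. `DedupeServer`, `chunk` and `client_backup` are hypothetical names.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size; real products may chunk differently


def chunk(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into fixed-size chunks (the last chunk may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class DedupeServer:
    """Hypothetical central store that keeps one copy of each unique chunk."""

    def __init__(self) -> None:
        self.store: dict[str, bytes] = {}

    def missing(self, digests: list[str]) -> set[str]:
        # Report which fingerprints the server has never stored.
        return {d for d in digests if d not in self.store}

    def put(self, digest: str, payload: bytes) -> None:
        self.store[digest] = payload


def client_backup(data: bytes, server: DedupeServer) -> int:
    """Back up data and return how many chunk payload bytes crossed the WAN.

    Only payloads are counted; the fingerprint query also crosses the
    network but is small compared with the chunks themselves.
    """
    chunks = chunk(data)
    digests = [hashlib.sha256(c).hexdigest() for c in chunks]
    needed = server.missing(digests)
    sent = 0
    for d, c in zip(digests, chunks):
        if d in needed:
            server.put(d, c)
            needed.discard(d)  # a chunk repeated within this backup is sent once
            sent += len(c)
    return sent
```

Backing up `b"A" * 8192 + b"B" * 4096` to an empty server sends 8,192 bytes, because the two identical `A` chunks travel only once; a second run that appends 4 KB of new data sends only that 4 KB.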
To back up its remote offices, the Iowa Dept. of Revenue used a Symantec Backup Exec agent at seven remote field offices to send tape jobs over the wide-area network (WAN) to the main data center. This required 60 different tape jobs at the main data center to complete backups for 200 GB of data per week, according to senior network engineer Mark Wise.
"The amount of data going over that network was just too much," Wise said. "One of the sites was taking 26 hours to do a full backup. When you're dealing with remote offices and remote networks, the network has more latency and interruptions to communication. The Backup Exec agents were timing out and failing."
In late November 2007, the department deployed PureDisk, bringing the backup window down from up to 26 hours to an average of 30 minutes.
CommVault, which added dedupe to its Simpana software suite, argues that embedding dedupe in the core backup application can speed recovery and improve catalog consistency, because the same catalog that tracks backups also tracks deduplicated data. Symantec addresses catalog consistency with its OpenStorage (OST) approach, but PureDisk uses a catalog separate from NetBackup's.
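The catalog-consistency argument can be illustrated with a hypothetical data structure, not CommVault's or Symantec's actual catalog format: a single catalog that records each backup entry's chunk list together with a reference count per chunk. Because expiring a backup and releasing its chunks update the same structure, the backup view and the dedupe view cannot drift apart, and a restore reads its chunk list from the same place. `UnifiedCatalog` and its methods are illustrative names.

```python
from collections import Counter


class UnifiedCatalog:
    """Hypothetical catalog tracking both backup entries and the chunks they use."""

    def __init__(self) -> None:
        self.entries: dict[str, list[str]] = {}  # backup entry -> chunk digests
        self.refcount: Counter = Counter()       # chunk digest -> number of references

    def record(self, entry: str, digests: list[str]) -> None:
        """Register a backup entry and bump the reference count of each chunk."""
        self.entries[entry] = list(digests)
        self.refcount.update(digests)

    def expire(self, entry: str) -> list[str]:
        """Remove a backup entry and return the chunks no longer referenced."""
        freed = []
        for d in self.entries.pop(entry):
            self.refcount[d] -= 1
            if self.refcount[d] == 0:
                del self.refcount[d]
                freed.append(d)  # safe to reclaim: no remaining backup uses it
        return freed

    def restore_plan(self, entry: str) -> list[str]:
        # A restore reads the chunk list straight from the backup catalog.
        return self.entries[entry]
```

With a separate dedupe catalog, the same expiration would have to be applied in two places, and a failure between the two updates could leave orphaned or prematurely deleted chunks.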
When dedupe is embedded in the backup application, customers get one throat to choke and can pay less than they would for a separate product. Furniture retailer Rooms to Go has used CommVault for five years to back up approximately 25 TB of data at its central data center in Seffner, Fla. The retailer used CommVault's single-instance storage (SIS) for file-level dedupe before the release of Simpana 8.
"We were really anxious to at least get the file-level dedupe," said Jason Hall, director of IT systems for Rooms to Go.
Hall said he was willing to stick with SIS until CommVault added subfile dedupe to Simpana 8.
"If we were backing up 100 TB a night or something, we'd probably have been more eager," he said. "But our full backups are about 5 TB -- we weren't anxious about it."
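The difference between SIS-style file-level dedupe and subfile dedupe can be sketched as follows, assuming fixed-size chunks with a deliberately tiny 4-byte size so the arithmetic is easy to follow; many products use larger or variable-size chunks instead. File-level dedupe shares storage only when whole files are byte-identical, while subfile dedupe also shares the unchanged parts of files that differ. The function names are hypothetical.

```python
import hashlib


def file_level_stored(files: list[bytes]) -> int:
    """Bytes stored with file-level dedupe (single-instance storage):
    only byte-identical whole files are kept once."""
    unique = {hashlib.sha256(f).digest(): len(f) for f in files}
    return sum(unique.values())


def subfile_stored(files: list[bytes], size: int = 4) -> int:
    """Bytes stored with fixed-size subfile dedupe: identical chunks are
    kept once even when the files containing them differ."""
    unique: dict[bytes, int] = {}
    for f in files:
        for i in range(0, len(f), size):
            c = f[i:i + size]
            unique[hashlib.sha256(c).digest()] = len(c)
    return sum(unique.values())
```

With two copies of a 12-byte file plus an edited version that differs only in its last 4 bytes, file-level dedupe stores 24 bytes (two distinct files), while subfile dedupe stores 16 bytes (four distinct chunks).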
Others mix and match, adding software specifically for dedupe even while using another vendor's backup software. One storage architect at a large telecom, who asked not to be identified because he is not authorized to endorse products his firm uses, manages petabytes of data. He added EMC Avamar for approximately 100 TB of file and virtual server backups, while using NetBackup and replication to back up higher tiers of applications.
It's unusual for such a large shop to deploy Avamar, especially when it has thousands of hosts and performance-intensive applications. But "any backup process adds load to the CPU client," the architect said, and when it comes to file shares and multiple copies of operating systems on less mission-critical servers, "there's a huge opportunity for dedupe. We're into the seven figures in savings with Avamar compared to tape -- we've saved more than we've spent."
The telecom is in the process of upgrading to Avamar 4.1, which will support twice as much capacity per grid node as the previous version. "The size of the grid defines the dedupe domain, and we can't get any commonalities between the grids," he said. Boosting the capacity within the grids will alleviate that problem at least partially. But going forward, "that'll be our biggest challenge -- how to manage bigger buckets in the dedupe domain. It won't keep us from doing it, but it also won't have the same amount of value for us that it could."
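The architect's point about dedupe domains can be shown with a toy model, assuming each grid keeps its own independent chunk index. The function and data below are hypothetical and do not describe Avamar internals. A chunk that appears in two grids is stored once per grid, so splitting data across grids loses the commonality between them.

```python
import hashlib


def stored_chunks(domains: list[list[bytes]]) -> int:
    """Count chunks stored when each inner list is backed up into its own
    dedupe domain (grid). Duplicates are detected only within a domain."""
    total = 0
    for domain in domains:
        # Each domain has its own fingerprint index, so it dedupes in isolation.
        total += len({hashlib.sha256(c).digest() for c in domain})
    return total


grid1 = [b"os", b"os", b"appA"]  # hypothetical chunks: a shared OS image plus app data
grid2 = [b"os", b"appB"]
```

Here the operating-system chunk present in both grids is stored twice when the grids are separate (4 chunks) but only once when everything shares one domain (3 chunks), which is why larger grids recover some of the lost savings.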
One drawback to software-based approaches is that most require customers to replace their current backup software. And not all dedupe software supports every backup product. For example, Avamar cannot use EMC's own disk libraries as a target. PureDisk is not integrated with Symantec's SMB backup product, Backup Exec, though Mohan said this is on the roadmap.
Software-based approaches require organizations to carefully evaluate the tradeoffs of adding a process to an existing infrastructure, according to ESG analyst Whitehouse. Inline backup-server-based approaches such as CommVault's Simpana or IBM's Tivoli Storage Manager (TSM) can avoid adding processing load to the protected client servers while still reducing at least part of the network traffic. NetBackup PureDisk can also be deployed at the backup server or disk target level if the customer chooses.
"It's easy to go and shop for a hardware target," ESG's Whitehouse said. "Users need to more carefully consider how the various software-based inline dedupe methods distribute the CPU load among clients."
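The load-distribution tradeoff Whitehouse describes can be sketched with a simple cost model. It assumes two network hops (protected host to backup server, backup server to disk target) and counts bytes fingerprinted as a stand-in for CPU work; it is a hypothetical model, not a description of any named product. With client-side dedupe the hosts do the hashing and only new chunks cross the first hop; with backup-server dedupe the hosts do no hashing, the full data crosses the first hop, and only the server-to-target hop shrinks.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Placement:
    client_bytes_hashed: int     # CPU work (bytes fingerprinted) on protected hosts
    server_bytes_hashed: int     # CPU work (bytes fingerprinted) on the backup server
    client_to_server_bytes: int  # traffic on the host -> backup server hop
    server_to_target_bytes: int  # traffic on the backup server -> disk target hop


def simulate(hosts: list[list[bytes]], where: str) -> Placement:
    """Model one backup run with dedupe placed at "client" or "server"."""
    if where not in ("client", "server"):
        raise ValueError("where must be 'client' or 'server'")
    seen: set[bytes] = set()  # global fingerprint index kept by the backup server
    client_hashed = server_hashed = to_server = to_target = 0
    for chunks in hosts:
        for c in chunks:
            if where == "client":
                client_hashed += len(c)  # host fingerprints its own data
            else:
                server_hashed += len(c)  # server fingerprints what it receives
                to_server += len(c)      # so every byte crosses the first hop
            digest = hashlib.sha256(c).digest()
            if digest in seen:
                continue  # duplicate: nothing new is sent or written
            seen.add(digest)
            to_target += len(c)
            if where == "client":
                to_server += len(c)  # only new chunks leave the host
    return Placement(client_hashed, server_hashed, to_server, to_target)


hosts = [[b"os", b"os", b"appA"], [b"os", b"appB"]]  # hypothetical sample data
```

For the sample hosts, both placements write 10 bytes to the target, but the client-side placement hashes 14 bytes on the hosts and sends 10 over the first hop, while the server-side placement hashes the same 14 bytes on the backup server and sends all 14 over the first hop.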