An emerging trend in data deduplication software is deduplicating data at the source, that is, on the client server that hosts the application.
CA ARCserve, CommVault Simpana and Symantec Veritas NetBackup PureDisk currently offer data deduplication at the backup server level, reducing network traffic between the backup server and the backup target, but not between the client and the backup server. NetBackup PureDisk has had the ability to deduplicate data at the source since before Symantec picked up the deduping IP when it acquired Data Center Technologies in 2005, but Symantec has so far said only that it will deduplicate data from the source when PureDisk melds with its NetBackup and Backup Exec apps over the next six months. Avamar has performed source-based data deduplication since it was founded as a startup in 1999; EMC Corp. acquired the company in 2006. This year, EMC expanded the integration between Avamar and its existing NetWorker backup software to include source-based dedupe.
Acronis users have the option of deduplicating data at either the source or the backup server level. Barracuda's recent integration of BitLeap data dedupe IP (acquired in November 2008) with Yosemite's backup software app agents adds application-aware dedupe at the source level. IBM remains the outlier in this regard -- its dedupe is offered post-process, at the backup target.
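To make the difference in network traffic concrete, here is a minimal Python sketch of the two approaches. It is an illustration only, not any vendor's implementation: the fixed 4 KB chunk size, the SHA-256 fingerprints, and the names `BackupServer`, `backup_server_side` and `backup_source_side` are all assumptions made for the example. With server-level dedupe the client ships every chunk; with source-level dedupe it ships fingerprints first and then only the chunks the server lacks.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; real products often use variable chunks


def chunks(data, size=CHUNK_SIZE):
    """Split a byte string into fixed-size chunks."""
    for i in range(0, len(data), size):
        yield data[i:i + size]


def fingerprint(chunk):
    """Identify a chunk by its SHA-256 digest (hex string, 64 characters)."""
    return hashlib.sha256(chunk).hexdigest()


class BackupServer:
    """Hypothetical backup server holding one copy of each unique chunk."""

    def __init__(self):
        self.store = {}  # fingerprint -> chunk

    def missing(self, fingerprints):
        """Return the fingerprints the server has not stored yet."""
        return {fp for fp in fingerprints if fp not in self.store}


def backup_server_side(data, server):
    """Client sends every chunk; the server dedupes on arrival.
    Returns the bytes that crossed the client-to-server link."""
    sent = 0
    for chunk in chunks(data):
        sent += len(chunk)
        server.store.setdefault(fingerprint(chunk), chunk)
    return sent


def backup_source_side(data, server):
    """Client fingerprints locally and sends only chunks the server lacks.
    Returns the bytes that crossed the client-to-server link."""
    unique = {fingerprint(chunk): chunk for chunk in chunks(data)}
    sent = sum(len(fp) for fp in unique)  # the fingerprints themselves
    for fp in server.missing(unique):
        server.store[fp] = unique[fp]
        sent += len(unique[fp])
    return sent


# Fifteen chunks of data, but only two distinct chunk contents.
data = b"A" * (CHUNK_SIZE * 10) + b"B" * (CHUNK_SIZE * 5)

server_a = BackupServer()
server_b = BackupServer()
server_side_bytes = backup_server_side(data, server_a)
source_side_bytes = backup_source_side(data, server_b)
repeat_bytes = backup_source_side(data, server_b)  # second backup, nothing new

print(server_side_bytes, source_side_bytes, repeat_bytes)
```

Both servers end up storing the same two chunks; the difference is only in how much data the client had to push over the wire to get there, which is the saving source-level dedupe is meant to deliver.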
Whether vendors charge for adding dedupe to existing backup software, and how much, is another differentiator for some offerings; Acronis, CommVault and Symantec charge for the feature. CA and IBM customers get dedupe free of charge, while Barracuda claims Yosemite's $1,500 unlimited server backup license keeps its offering competitive.
Lauren Whitehouse, an analyst at Milford, Mass.-based Enterprise Strategy Group (ESG), said global data deduplication is the next frontier for software and target device dedupe vendors alike. "Everybody's low on the maturity curve here," Whitehouse said. For CommVault, dedupe is global only within the same policy group (though CommVault argues this is sufficient for most customers), while dedupe is global only among system files for CA; application data is deduped separately within each backup server.
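The effect of dedupe scope can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a model of how CommVault or CA actually index data: the group names, the sample chunks, and the `stored_chunks` helper are hypothetical. It counts how many chunks end up stored when fingerprints are compared across all backups versus only within each policy group.

```python
import hashlib
from collections import defaultdict


def fingerprint(chunk):
    """Identify a chunk by its SHA-256 digest."""
    return hashlib.sha256(chunk).hexdigest()


def stored_chunks(backups, scope):
    """Count the unique chunks kept for a list of (group, chunks) backups.

    scope="global" uses one fingerprint index for everything;
    scope="group" keeps a separate index per policy group.
    """
    indexes = defaultdict(set)
    for group, chunk_list in backups:
        key = "all" if scope == "global" else group
        for chunk in chunk_list:
            indexes[key].add(fingerprint(chunk))
    return sum(len(index) for index in indexes.values())


# Hypothetical policy groups that share one identical chunk (b"os-files").
backups = [
    ("finance", [b"os-files", b"ledger"]),
    ("hr", [b"os-files", b"payroll"]),
]

global_count = stored_chunks(backups, "global")
group_count = stored_chunks(backups, "group")
print(global_count, group_count)
```

In this toy case the shared chunk is stored once under global scope and twice under per-group scope. How much that matters in practice depends on how much data actually overlaps between groups, which is the point CommVault raises.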
Arun Taneja, founder and consulting analyst at Hopkinton, Mass.-based Taneja Group, said these differentiators among backup products will ultimately be temporary, as data deduplication moves out of backup toward primary storage and application hosts. In five years, he predicted, "the benefits of deduplication on the primary storage side will flow right through to the back end -- there won't be this microscopic focus on the backup and archiving world."
About this author: Beth Pariseau is the Senior News Writer for the Storage Media Group.
This article originally appeared in Storage magazine.