
Using data deduplication with backup applications: Source vs. target dedupe

In his latest column, W. Curtis Preston looks at the data deduplication battle between backup applications and appliances, and at source vs. target dedupe.

Your data backup software company wants some of Data Domain's revenue -- seriously. Backup software companies didn't see the intelligent disk target (IDT) market coming. The next thing they knew, companies like Data Domain were making millions of dollars a year selling such devices. Then the independent software vendors (ISVs) that make backup software started having the same thought: "If we offered dedupe for regular backups, customers would pay the data deduplication premium to us instead of to those appliance companies." And a line in the sand was drawn.

Source deduplication is your friend

The IDT vs. backup software battle is just beginning, and this article will describe the products that have entered it. First, however, we should discuss the battle that's completely over: backup of small amounts of data coming from remote sites. In this fight for your storage dollars, source deduplication has won hands down. Whether you're backing up a single home computer with your personal data, hundreds of remote users with laptops, or many remote offices with less than a terabyte of data each, source dedupe is your friend.

Without source dedupe, backups of smaller data sets and remote data sets can be quite challenging. Home users have historically used free products that are included with their OS or USB drive. Remote offices typically use something like Symantec Corp.'s Backup Exec and a DAT drive. Only the most conscientious laptop users have any kind of backup plan at all other than occasionally copying their data to a server that gets backed up. All of these methods are fraught with problems and suffer most from human error.

Installing a source dedupe product on these systems allows them to back up to a source dedupe server over a WAN connection -- completely automating this most important business function. They can back up to a source dedupe server managed by the IT department, or to a cloud backup service managed by an outside company.

The reason that source dedupe allows you to back up large amounts of data over such a small connection is that the source dedupe client communicates with the source dedupe backup server to identify and transmit only the blocks that are new. The client starts by asking the file system for the files that have changed since the last backup, then examines each of those files for blocks that have changed. This makes the method very well suited to remote or mobile data.
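The exchange described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the DedupeServer class, the backup_file function, the 4 KB fixed block size and the use of SHA-256 fingerprints are all assumptions made for the example, and the "server" is just an in-memory dictionary rather than something reached over a WAN.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real products may chunk differently


def split_blocks(data: bytes, size: int = BLOCK_SIZE):
    """Yield consecutive fixed-size blocks of data."""
    for offset in range(0, len(data), size):
        yield data[offset:offset + size]


class DedupeServer:
    """Hypothetical in-memory stand-in for a source dedupe backup server."""

    def __init__(self):
        self.blocks = {}   # fingerprint -> block bytes
        self.catalog = {}  # file name -> ordered list of fingerprints

    def missing(self, fingerprints):
        """Return the fingerprints the server has not stored yet."""
        return {fp for fp in fingerprints if fp not in self.blocks}

    def store(self, name, recipe, new_blocks):
        """Save the new blocks and the recipe needed to rebuild the file."""
        self.blocks.update(new_blocks)
        self.catalog[name] = recipe

    def restore(self, name):
        """Rebuild a file from its recipe of fingerprints."""
        return b"".join(self.blocks[fp] for fp in self.catalog[name])


def backup_file(server: DedupeServer, name: str, data: bytes) -> int:
    """Back up one changed file; return the bytes of block data actually sent."""
    blocks = {}
    recipe = []
    for block in split_blocks(data):
        fp = hashlib.sha256(block).hexdigest()
        recipe.append(fp)
        blocks[fp] = block
    # Ask the server which fingerprints are new, then send only those blocks.
    wanted = server.missing(recipe)
    new_blocks = {fp: blocks[fp] for fp in wanted}
    server.store(name, recipe, new_blocks)
    return sum(len(block) for block in new_blocks.values())


# Demo with made-up data: eight distinct blocks, then one block modified.
server = DedupeServer()
original = b"".join(bytes([i]) * BLOCK_SIZE for i in range(8))
first_sent = backup_file(server, "report.doc", original)

changed = original[:3 * BLOCK_SIZE] + b"\xff" * BLOCK_SIZE + original[4 * BLOCK_SIZE:]
second_sent = backup_file(server, "report.doc", changed)

print(first_sent)   # 32768: the first backup sends all eight blocks
print(second_sent)  # 4096: the second backup sends only the modified block
```

In this sketch the first backup sends every block, while the second sends a single 4 KB block even though the whole file was re-examined. That difference is what makes backup over a small WAN link practical.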

Cloud backup and source dedupe

One interesting way that some companies can begin using source dedupe is to use a cloud backup provider that will manage the backups for them. All they have to do is install the cloud backup provider's software on their servers and start backing up to the cloud service. There's no backup server to install or manage. The only challenge some companies may have is getting the first backup done, since the first backup obviously has to send all the blocks. Some cloud providers offer a "seeding" option where they ship you a disk drive that you back up to locally and then ship back to them. They copy this backup to their servers, thus "seeding" your initial full. Once that has been done, your servers only have to back up the blocks that have changed since you backed up to the seeding system.
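A rough calculation shows why the first backup is the hard part and why seeding helps. The figures below (500 GB of protected data, a 10 Mbit/s upload link, 1% new blocks per day) are hypothetical values chosen only for illustration, and the transfer_hours helper ignores protocol overhead, compression and link contention.

```python
def transfer_hours(size_gb: float, link_mbps: float) -> float:
    """Hours needed to send size_gb (decimal gigabytes) over a link_mbps link.

    Ignores protocol overhead, compression and contention, so real
    transfers would take longer.
    """
    bits = size_gb * 1e9 * 8
    seconds = bits / (link_mbps * 1e6)
    return seconds / 3600


# All figures below are hypothetical, chosen only for illustration.
PROTECTED_GB = 500          # data held at the remote site
LINK_MBPS = 10              # usable WAN upload speed
DAILY_NEW_FRACTION = 0.01   # share of blocks that are new each day

first_full = transfer_hours(PROTECTED_GB, LINK_MBPS)
daily = transfer_hours(PROTECTED_GB * DAILY_NEW_FRACTION, LINK_MBPS)

print(f"First full over the WAN: {first_full:.1f} hours")
print(f"Daily backup of new blocks: {daily:.1f} hours")
```

Under these assumptions the first full backup would occupy the link for roughly 111 hours, while each later backup would need a little over an hour. Shipping a seeded disk removes the first number from the picture.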

Target deduplication

Where source dedupe is perfect for smaller, remote data sets, target deduplication is meant for larger data sets where you have essentially unlimited bandwidth between the backup client and the backup server. This is the market that the appliance vendors have focused on, and some have done quite well selling you appliances that will ingest native, un-deduped backups and dedupe them for you. That's what made backup software companies sit up and take notice.


The first company to make a move was Symantec. They took NetBackup PureDisk (a source dedupe product) and moved it inside the media server, allowing it to receive and dedupe regular NetBackup backups. The media server dedupes the data inline as it receives the data, and the deduped data is sent over IP to a PureDisk server.

IBM Corp.'s Tivoli Storage Manager (TSM) followed with TSM server dedupe in TSM 6.1. TSM's implementation is a post-process implementation that looks at backups that have been sent to a disk-type device and dedupes them after the fact. CA announced similar capability for its ARCserve Backup product.
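The difference between inline dedupe (deduping data as it is received) and post-process dedupe (deduping backups after they have landed on disk) can be sketched as follows. This is a simplified illustration under stated assumptions, not a description of how NetBackup or TSM work internally: the function names, the in-memory "disk" and the SHA-256 fingerprints are all hypothetical.

```python
import hashlib


def fingerprint(block: bytes) -> str:
    """Identify a block by its SHA-256 digest (an assumption for this sketch)."""
    return hashlib.sha256(block).hexdigest()


def inline_dedupe(stream):
    """Dedupe each block as it arrives; only unique blocks are ever written."""
    store = {}
    bytes_written = 0
    for block in stream:
        fp = fingerprint(block)
        if fp not in store:
            store[fp] = block
            bytes_written += len(block)
    return store, bytes_written


def post_process_dedupe(stream):
    """Land the raw backup on disk first, then dedupe it in a later pass."""
    staging = list(stream)                      # raw, un-deduped landing area
    bytes_landed = sum(len(b) for b in staging)
    store = {}
    for block in staging:                       # the after-the-fact dedupe pass
        store.setdefault(fingerprint(block), block)
    return store, bytes_landed


# Made-up backup stream: ten 1 KB blocks, only three of them distinct.
stream = [bytes([i % 3]) * 1024 for i in range(10)]

inline_store, inline_written = inline_dedupe(stream)
post_store, post_landed = post_process_dedupe(stream)

print(inline_written)               # 3072 bytes written by the inline path
print(post_landed)                  # 10240 bytes landed before the dedupe pass ran
print(inline_store == post_store)   # True: both end up with the same unique blocks
```

Both approaches end with the same deduped store. The trade-off in this sketch is that the post-process path needs enough disk to hold the raw backup until the dedupe pass runs, while the inline path does its fingerprinting work during the backup window.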

CommVault Systems Inc. is the latest vendor to enter the fray with its media agent dedupe option. Backups that are sent to a Simpana media agent are deduped inline before they are stored on disk. If you wanted to back up a remote site using this feature, CommVault says a media agent in a remote site could write deduped data to a CIFS share that was mounted from the central site.

Both CommVault and Symantec are making claims that you should use their dedupe software instead of buying a dedupe appliance, although CommVault's claims tend to be a little bolder. (Dave West, vice president of marketing and business development at CommVault, wrote in his blog that he sees no use case where any CommVault customer would need to buy a dedupe appliance.)

Can source dedupe or target dedupe from your backup application meet your needs? That will depend on your environment. We definitely see cases where a company's throughput requirements for target dedupe can only be met by an appliance. But there are also plenty of cases where either will work and it's simply a matter of negotiating over price. Just be sure to perform a proof of concept of any vendor claims before signing that check.

About the author:
W. Curtis Preston (a.k.a. "Mr. Backup"), Executive Editor and Independent Backup Expert, has been singularly focused on data backup and recovery for more than 15 years. From starting as a backup admin at a $35 billion credit card company to being one of the most sought-after consultants, writers and speakers in this space, it's hard to find someone more focused on recovering lost data. He is the webmaster of, the author of hundreds of articles, and the books "Backup and Recovery" and "Using SANs and NAS."
