Data deduplication tutorial

This data deduplication tutorial offers advice on data dedupe products, inline vs. post-processing deduplication, VTLs and dedupe, and the latest news.

by Russ Fellows

If you've decided that your data backup system can benefit from data deduplication, you definitely have plenty of choices. But first you need to figure out where and how to implement dedupe. There are several data backup products that incorporate data deduplication. Some are virtual tape library (VTL) products, others are network-attached storage (NAS) that may be used as a backup target, and still others are backup applications.

In this tutorial, we look at post-processing versus inline deduplication, disk-based backup and dedupe, and compare the popular deduplication products.

A look at post-processing deduplication

For backup/storage administrators looking to minimize the time it takes to back up their data, the best option is often a post-process method. Backup data is first written to a temporary holding area, which speeds up the backup and shortens the backup window. Once the backup completes, the data is reexamined and duplicate data is removed. The disadvantage of this method is that the holding area consumes additional storage space, although some post-processing systems start deduping before the whole backup is complete, so they may not require as much storage on the target.
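The two-step flow described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular product works: the function name `post_process_backup`, the in-memory staging list and the use of SHA-256 digests to identify duplicate chunks are all assumptions made for the example.

```python
import hashlib


def post_process_backup(chunks):
    """Land every chunk first, then deduplicate in a second pass (illustrative only)."""
    # Step 1: the backup writes all chunks to a staging area with no analysis.
    # This keeps the backup fast but consumes the full backup size on disk.
    staging = list(chunks)
    staged_bytes = sum(len(c) for c in staging)

    # Step 2: after the backup completes, re-read the staged data and keep
    # one copy of each unique chunk, keyed by its SHA-256 digest.
    store = {}
    recipe = []  # ordered digests needed to rebuild the original stream
    for chunk in staging:
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)
        recipe.append(digest)

    stored_bytes = sum(len(c) for c in store.values())
    return store, recipe, staged_bytes, stored_bytes


# Hypothetical backup stream: four 4 KB chunks, three of them identical.
backup = [b"A" * 4096, b"B" * 4096, b"A" * 4096, b"A" * 4096]
store, recipe, staged_bytes, stored_bytes = post_process_backup(backup)
print(staged_bytes, stored_bytes)  # 16384 8192

# The recipe of digests is enough to restore the original data.
restored = b"".join(store[d] for d in recipe)
```

The staging area briefly holds all 16 KB even though only 8 KB remains after the deduplication pass, which is the extra space the post-process method trades for a shorter backup window.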

Editor's Tip: It's also important to know how data dedupe impacts the recovery process -- specifically, how rapidly you can recall data for restoration. Read this article to learn about restoring deduped data.

Inline deduplication

An alternative to deduplicating data after a backup is to perform deduplication inline as data is being sent to the backup device. The advantage of this method is that no extra space is required. Another advantage is that once the data is deduplicated and stored, the process is done, and backup data may be replicated to offsite storage. With post-processing deduplication, data must be written to storage, then deduplicated at a later time, and only then replicated to offsite storage. As a result, the time to complete the entire backup process -- including replicating to offsite systems -- can be longer than with systems that deduplicate inline.
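As a rough sketch of the inline approach, the Python below checks each chunk for duplicates as it arrives, so nothing is ever staged. The class name `InlineDedupeTarget`, the SHA-256 chunk index and the `replication_queue` list standing in for offsite replication are hypothetical and chosen only for illustration.

```python
import hashlib


class InlineDedupeTarget:
    """Toy backup target that deduplicates each chunk as it is written."""

    def __init__(self):
        self.store = {}              # digest -> unique chunk kept on disk
        self.recipe = []             # ordered digests to rebuild the stream
        self.replication_queue = []  # hypothetical stand-in for offsite replication

    def write(self, chunk):
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in self.store:
            # New data: store it once and make it eligible for replication
            # immediately, without waiting for a later deduplication pass.
            self.store[digest] = chunk
            self.replication_queue.append(digest)
        self.recipe.append(digest)

    def stored_bytes(self):
        return sum(len(c) for c in self.store.values())


# Hypothetical backup stream: four 4 KB chunks, three of them identical.
target = InlineDedupeTarget()
for chunk in [b"A" * 4096, b"B" * 4096, b"A" * 4096, b"A" * 4096]:
    target.write(chunk)

print(target.stored_bytes(), len(target.replication_queue))  # 8192 2
```

Storage use never exceeds the deduplicated size, and each unique chunk can be replicated as soon as it is stored. The cost is that the digest lookup sits in the write path of every chunk.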

Editor's Tip: There are several kinds of backup products that incorporate data deduplication, and VTLs are one example. Read this article to learn everything you need to know about VTLs and deduplication.

Disk-based backup systems and deduplication

Data deduplication can dramatically decrease the amount of disk space required for backup data, while retaining the significant performance improvements that disk-based backup devices have over tape. Thus, deduplication allows disk-based backup targets, whether they are NAS devices or VTLs, to meet demanding service-level objectives while remaining cost-competitive with tape-based systems.
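To see why the space savings can be large, consider a simplified back-of-the-envelope model. It assumes that the first full backup is stored in its entirety and that each later full backup adds only the data that changed since the previous one. The function name `dedupe_ratio` and the input figures are hypothetical, and the model ignores compression and index overhead. It does not describe any product in the comparison that follows.

```python
def dedupe_ratio(full_backup_gb, change_rate, retained_backups):
    """Estimate logical vs. physical capacity for repeated full backups.

    Simplifying assumption: every backup after the first contributes only
    the changed fraction (change_rate) of one full backup as new data.
    """
    logical_gb = full_backup_gb * retained_backups
    physical_gb = full_backup_gb + full_backup_gb * change_rate * (retained_backups - 1)
    return logical_gb, physical_gb, logical_gb / physical_gb


# Hypothetical inputs: 1,000 GB full backup, 5% change between backups,
# 10 full backups retained on disk.
logical, physical, ratio = dedupe_ratio(1000, 0.05, 10)
print(f"{logical:.0f} GB logical, {physical:.0f} GB physical, {ratio:.1f}:1")
# 10000 GB logical, 1450 GB physical, 6.9:1
```

Under these assumptions, the ratio improves as more backups are retained and as the change rate falls, so retention policy and data type matter as much as the product itself.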

Editor's Tip: 2008 was a big year for disk-based backup and recovery. Read this article for a look at the latest disk-based backup and recovery trends.

A comparison of deduplication product offerings

There are several vendors that deliver products that incorporate data deduplication. Provided below is a comparison of vendors, products and features.

| Product | Simpana 8 | DDX | Avamar | DL 4000 | SIR | VLS | Diligent |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Vendor | CommVault | Data Domain | EMC Corp. | EMC Corp. | FalconStor Software | HP Co. | IBM Corp. |
| Deployment type | Backup software | VTL w/ storage | Backup software | VTL appliance w/ storage | VTL w/ or w/o storage | VTL appliance w/ storage | VTL appliance w/ or w/o storage |
| Dedupe cost | Add-on | Included | Included | Add-on | Add-on | Add-on | Included |
| When dedupe | Inline | Inline | Inline | Inline and post process | Post process | Post process | Inline |
| Dedupe location | Distributed | Target | Source | Target | Target | Target | Target |
| Chunk size | Variable | Variable | Variable | Variable | Variable | Variable | Variable |
| Access method | - | - | Hardware dependent | - | - | - | - |
| NAS (NFS/CIFS) | Yes | Yes | - | No | No | No | No |
| FC primary storage | No | No | - | No | No | No | No |
| FC tape storage (VTL) | No | Yes | - | Yes | Yes | Yes | Yes |
| iSCSI primary storage | No | No | - | No | No | No | No |
| iSCSI tape storage (VTL) | Yes | Yes | - | Yes | Yes | Yes | Yes |


| Product | HydraStor DataRedux | FAS DeDupe | Enterprise Archive | DXi | S2100 w/ DeltaStor | VTL Prime | PureDisk |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Vendor | NEC Corp. | NetApp | Permabit | Quantum Corp. | Sepaton Inc. | Sun Inc. | Symantec Corp. |
| Deployment type | Secondary storage | Primary storage | Secondary storage | VTL appliance w/ storage | VTL w/ or w/o storage | VTL appliance w/ storage | Backup software |
| Dedupe cost | Add-on | Included (no-cost license) | Included | Add-on | Add-on | Add-on | Included |
| When dedupe | Inline | Post process | Inline | Both (inline and post process) | Post process | Post process | Inline |
| Dedupe location | Target | Target | Target | Target | Target | Target | Source |
| Chunk size | Variable | 4 KB block | Variable | Variable | Variable | Variable | Variable |
| Access method | - | - | - | - | - | - | Hardware dependent |
| NAS (NFS/CIFS) | Yes | Yes | Yes | Yes | No | No | - |
| FC primary storage | No | Yes | No | No | No | No | - |
| FC tape storage (VTL) | No | No | No | Yes | Yes | Yes | - |
| iSCSI primary storage | No | Yes | No | No | No | No | - |
| iSCSI tape storage (VTL) | No | No | No | Yes | Yes | Yes | - |

The future of data dedupe

It is likely that, over time, data deduplication will become a service offered as a feature across multiple product types and deployment scenarios. Until then, you must carefully evaluate your cost, performance and data retention goals before choosing a data deduplication product that will deliver the optimal benefits in your particular environment, or test the product carefully in your environment before you buy it.

Editor's Tip: Stay up to date with the latest deduplication and data reduction news. Bookmark our special section on data deduplication.

About the author

Russ Fellows is a Senior Analyst with the Evaluator Group. He is responsible for leading research and analysis of product and market trends for NAS, virtual tape libraries and storage security.

This was first published in January 2009
