The downsides of data deduplication technology explained

01 Sep 2008 | Rick Cook

Data deduplication companies like Data Domain Inc., EMC Corp. and ExaGrid Systems are reporting record growth as users come to understand the value proposition of data dedupe and adopt it widely. But that doesn't mean that data deduplication is completely mature or ready for general storage use. There are still some areas, such as the lack of deduplication standards, where data deduplication has some growing to do. Similarly, data deduplication is best adapted to specific kinds of situations, notably data backups.

Deduplication involves checking each file or block being backed up and replacing any duplicates with pointers to a single stored copy. Because so much of what is stored in a business environment is redundant (e.g., emails to multiple recipients), the result can be a reduction of 25 to 1 or more in the storage space required.
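
The idea of replacing duplicates with pointers can be illustrated with a minimal Python sketch. It is not any vendor's algorithm: the fixed four-byte block size, the use of SHA-256 as the block fingerprint and the names dedupe and store are all assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4  # tiny fixed block size, chosen only to keep the example readable


def dedupe(data: bytes, store: dict[str, bytes]) -> list[str]:
    """Split data into fixed-size blocks, keep one copy of each unique block
    in `store`, and return the list of pointers (hashes) describing the data."""
    pointers = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # stored only the first time it is seen
        pointers.append(digest)
    return pointers


store: dict[str, bytes] = {}
backup = b"ABCDABCDABCDWXYZ"  # three identical blocks plus one unique block
pointers = dedupe(backup, store)

logical_blocks = len(pointers)  # blocks the backup is made of
physical_blocks = len(store)    # blocks actually kept in the store
print(f"{logical_blocks} logical blocks stored as {physical_blocks} physical blocks")
```

Here four logical blocks are held as two physical blocks, a 2-to-1 reduction. Real backup sets with many repeated full backups are where much higher ratios come from.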

So why wouldn't you use data deduplication for all kinds of storage? The reason is that data deduplication exacts a performance and resource penalty, especially when reconstructing a file. Generally the cost in speed and resources is high enough that data deduplication isn't attractive for ordinary storage. Backups are different because data is typically written once and read infrequently. The penalties associated with data deduplication are much smaller for a properly designed backup system.
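
To see where the reconstruction penalty comes from, consider a rough Python sketch of a restore. The restore and fingerprint functions and the in-memory store are hypothetical; the point is only that every pointer in the file costs one lookup, no matter how few unique blocks are actually stored. On a real system each lookup would typically be an index or disk access rather than a dictionary read.

```python
import hashlib


def fingerprint(block: bytes) -> str:
    """Identify a block by its SHA-256 hash (an assumption for this sketch)."""
    return hashlib.sha256(block).hexdigest()


def restore(pointers: list[str], store: dict[str, bytes]) -> tuple[bytes, int]:
    """Rebuild the original bytes and report how many store lookups it took."""
    lookups = 0
    pieces = []
    for pointer in pointers:
        pieces.append(store[pointer])  # one lookup per pointer, duplicate or not
        lookups += 1
    return b"".join(pieces), lookups


blocks = [b"head", b"body", b"body", b"body", b"tail"]
store = {fingerprint(b): b for b in blocks}  # only 3 unique blocks are kept
pointers = [fingerprint(b) for b in blocks]  # but 5 pointers describe the file

data, lookups = restore(pointers, store)
print(f"restored {len(data)} bytes with {lookups} lookups from {len(store)} stored blocks")
```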

The other major drawback to data deduplication is the lack of standards. Every major vendor does data deduplication its own way, and as a result files have to be reconstructed by equipment from the same vendor whose products were used to deduplicate the data in the first place. Since the efficiency of deduplication algorithms -- which are mostly proprietary -- is a major competitive advantage, this isn't likely to change.

If you use a remote system for disaster recovery (DR), you need to make sure the devices needed to reconstruct the deduped data are available at the DR site, as well as the people trained to use them. Data deduplication also raises the issue of vendor lock-in.

Target-based vs. source-based deduplication

There are two main methods of data deduplication currently in use: target- and source-based data deduplication.

In target-based data deduplication, deduplication is handled by a device such as a virtual tape library (VTL) that has data deduplication built in. Your backup software doesn't change, and the data is deduped after it has been sent over the network. This lets you continue to use the balance of your backup system as it already exists, especially your backup software, but it doesn't do anything to save bandwidth. One way around the bandwidth issue, which becomes especially important to remote sites communicating over a WAN, is to have a VTL at the remote site, deduplicate the data there and transmit it to the backup server. If the cost of network capacity is high enough, and the cost of the VTLs is low enough, this can save you money. However, it requires having a data deduplication device at every site to be protected.
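
The trade-off between network cost and VTL cost comes down to simple arithmetic. The Python sketch below uses invented figures (100 GB of backup data per day, a 10-to-1 reduction, a per-gigabyte WAN charge and a yearly cost per VTL) purely for illustration; none of these numbers come from a vendor or from this article.

```python
def yearly_wan_cost(gb_per_day: float, cost_per_gb: float, days: int = 365) -> float:
    """Cost of sending the given daily volume over the WAN for a year."""
    return gb_per_day * cost_per_gb * days


def remote_vtl_saves_money(gb_per_day: float, dedupe_ratio: float,
                           cost_per_gb: float, vtl_cost_per_year: float) -> bool:
    """Compare sending raw data with deduplicating at the remote site first."""
    without_vtl = yearly_wan_cost(gb_per_day, cost_per_gb)
    with_vtl = yearly_wan_cost(gb_per_day / dedupe_ratio, cost_per_gb) + vtl_cost_per_year
    return with_vtl < without_vtl


# Hypothetical site: 100 GB/day, 10:1 reduction, VTL costing 10,000 per year.
expensive_wan = remote_vtl_saves_money(100, 10, cost_per_gb=0.50, vtl_cost_per_year=10_000)
cheap_wan = remote_vtl_saves_money(100, 10, cost_per_gb=0.10, vtl_cost_per_year=10_000)
print(expensive_wan, cheap_wan)
```

With the more expensive WAN the remote VTL pays for itself; with the cheaper one it does not, and that comparison has to be repeated for every site being protected.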

In source-based deduplication, the deduplication is done by the backup software. The software on the clients communicates with software on the backup server to check each file or block to see if it is a duplicate. Duplicates are replaced by pointers before the data is sent to the server.

Source-based deduplication conserves bandwidth without the need to have extra hardware at the source. However, it requires a lot of extra communication between the server and the clients since each piece of data (block or file) has to be checked against the server's list of already-present pieces of data.
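
The exchange between client and server can be sketched in Python as follows. This is a simplified illustration, not how any particular backup product works: the BackupServer class, its has_block and upload methods and the use of SHA-256 fingerprints are all assumptions, and the "network" is just a method call. It shows both sides of the trade-off: only new blocks are transmitted, but every block costs a query to the server.

```python
import hashlib


class BackupServer:
    """Hypothetical in-memory stand-in for the backup server."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}
        self.queries = 0  # how many "do you have this?" checks were made

    def has_block(self, digest: str) -> bool:
        self.queries += 1
        return digest in self.blocks

    def upload(self, digest: str, block: bytes) -> None:
        self.blocks[digest] = block


def client_backup(blocks: list[bytes], server: BackupServer) -> tuple[list[str], int]:
    """Send only blocks the server lacks; return the pointers and bytes sent."""
    pointers = []
    bytes_sent = 0
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        if not server.has_block(digest):  # one round trip per block
            server.upload(digest, block)
            bytes_sent += len(block)
        pointers.append(digest)  # duplicates travel as pointers only
    return pointers, bytes_sent


server = BackupServer()
_, first_sent = client_backup([b"mail", b"mail", b"logs"], server)
_, second_sent = client_backup([b"mail", b"logs", b"new!"], server)
print(first_sent, second_sent, server.queries)
```

The second backup sends only the one new block, yet the server still answers a query for each of the three blocks in it.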

Which system is faster depends very much on the specifics of the installation. If you have a lot of data, say multiple terabytes, target deduplication is usually faster, but if you have smaller amounts of data, other factors such as network performance overwhelm the differences.

About this author: Rick Cook specializes in writing about issues related to storage and storage management.

All Rights Reserved, Copyright 2008 - 2009, TechTarget | Read our Privacy Policy