Data Backup.com

Understanding data deduplication ratios in backup systems

By Lauren Whitehouse

The effectiveness of data deduplication is often expressed as a deduplication or reduction ratio: the ratio of protected capacity to the physical capacity actually stored. A 10:1 ratio means that 10 times more data is protected than the physical space required to store it, and a 20:1 ratio means that 20 times more data can be protected. Factoring in data growth and retention, and assuming deduplication ratios in the 20:1 range, 2 TB of storage capacity could protect up to 40 TB of retained backup data.
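The 2 TB / 40 TB figure is simply the physical capacity multiplied by the ratio. The following is a minimal Python sketch of that arithmetic. The function names are hypothetical, and the ratio is assumed to already reflect growth and retention.

```python
def protected_capacity_tb(physical_tb: float, dedupe_ratio: float) -> float:
    """Logical (protected) capacity that fits in physical_tb at an N:1 ratio."""
    return physical_tb * dedupe_ratio


def physical_capacity_tb(protected_tb: float, dedupe_ratio: float) -> float:
    """Physical capacity needed to hold protected_tb at an N:1 ratio."""
    return protected_tb / dedupe_ratio


# 2 TB of disk at 20:1 protects 40 TB of retained backup data.
print(protected_capacity_tb(2.0, 20.0))   # 40.0
# Conversely, 40 TB of retained backups at 20:1 needs 2 TB of disk.
print(physical_capacity_tb(40.0, 20.0))   # 2.0
```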

How are these data deduplication ratios determined? The ratio is calculated by dividing the total capacity of data to be backed up (i.e., the data that will be examined for duplicates) by the capacity actually used to store it (i.e., the deduplicated amount of data).
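The next sketch shows where the divisor comes from. It assumes a toy deduplication scheme: data is split into fixed-size 4 KiB chunks, each chunk is identified by its SHA-256 hash, and only unique chunks are stored. Real products may use variable-size chunking, compression and other techniques, so treat `dedupe_ratio` and its chunk size as hypothetical.

```python
import hashlib


def dedupe_ratio(data: bytes, chunk_size: int = 4096) -> float:
    """Return (bytes examined) / (bytes stored) under toy fixed-size chunking."""
    if not data:
        raise ValueError("no data to back up")
    unique_chunks: dict[bytes, int] = {}  # SHA-256 digest -> chunk length
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        # Only the first occurrence of each distinct chunk consumes storage.
        unique_chunks.setdefault(hashlib.sha256(chunk).digest(), len(chunk))
    stored = sum(unique_chunks.values())
    return len(data) / stored


# A 32 KiB dataset made of 8 distinct 4 KiB chunks.
dataset = b"".join(bytes([i]) * 4096 for i in range(8))
# Ten full backups of the unchanged dataset: 320 KiB examined, 32 KiB stored.
print(dedupe_ratio(dataset * 10))  # 10.0
```

Ten full backups of unchanged data yield a 10:1 ratio, because every backup after the first adds no new unique chunks.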

What's a realistic data dedupe ratio?

But what is a realistic data deduplication ratio? Research by the Enterprise Strategy Group (ESG) found that, of respondents currently using data deduplication technology, approximately one-third (33%) have experienced less than a 10 times reduction in capacity requirements; 48% report a 10 to 20 times reduction, and 18% report reductions ranging from 21 times to more than 100 times.

Several factors influence deduplication ratios, including:

Deduplication ratios can be confusing. Some vendors express reduction as a percentage of savings instead of a ratio. A 50% capacity savings is equivalent to a 2:1 deduplication ratio, and a 10:1 ratio is the same as 90% savings, meaning that 10 TB of data can be backed up to 1 TB of physical storage capacity. Doubling the ratio to 20:1 increases the savings by only 5 percentage points (to 95%).
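Converting between the two conventions makes the diminishing returns explicit: percent savings = (1 - 1/ratio) x 100. The following minimal Python sketch uses hypothetical helper names.

```python
def ratio_to_savings(ratio: float) -> float:
    """Percent capacity savings for an N:1 deduplication ratio."""
    if ratio < 1:
        raise ValueError("a deduplication ratio cannot be below 1:1")
    return 100 - 100 / ratio


def savings_to_ratio(savings_pct: float) -> float:
    """N in an N:1 ratio for a given percent capacity savings."""
    if not 0 <= savings_pct < 100:
        raise ValueError("savings must be in the range [0, 100)")
    return 100 / (100 - savings_pct)


for r in (2, 10, 20, 50):
    print(f"{r}:1 -> {ratio_to_savings(r):.1f}% savings")
# 2:1 -> 50.0% savings
# 10:1 -> 90.0% savings
# 20:1 -> 95.0% savings
# 50:1 -> 98.0% savings
```

Each doubling of the ratio halves the remaining physical footprint, but the percentage saved moves by ever smaller amounts as it approaches 100%.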

Evaluating a dedupe product

When evaluating data deduplication, it's important to trial vendors' products in your own environment, with your own data, over several backup cycles to determine a product's impact on your backup and recovery environment. Reduction ratios should not be the main factor in selecting a product. ESG research (ESG Research Report, "Data Protection Market Trends," January 2008) found that, not surprisingly, the cost of the deduplication solution was the most frequently cited factor (although savings from capacity reduction often overcome financial objections to deploying deduplication). Beyond cost, the survey data suggests that ease of deployment, ease of use and the impact on backup and recovery performance were important considerations, more so than technical implementation details such as the deduplication ratio.

About this author:
Lauren Whitehouse is an analyst with Enterprise Strategy Group and covers data protection technologies. Lauren is a 20-plus-year veteran in the software industry, formerly serving in marketing and software development roles.

11 May 2009

All Rights Reserved, Copyright 2008 - 2024, TechTarget