Definition

data deduplication ratio

A data deduplication ratio measures the original size of data against its size after redundant data has been removed.

Data deduplication is the process that removes redundant data before a data backup. The data deduplication ratio measures the effectiveness of the dedupe process. It is calculated by dividing the total capacity of backed up data before removing duplicates by the actual capacity used after the backup is complete. For example, a 5:1 data deduplication ratio means that five times more data is protected than the physical space required to store it.
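As a rough, vendor-neutral illustration, the following Python sketch computes the ratio for a hypothetical backup: 500 GB of logical (pre-dedupe) data that occupies 100 GB of physical capacity after deduplication.

    # Minimal sketch of the deduplication ratio calculation.
    # The sizes below are hypothetical example values, not measurements.

    def dedupe_ratio(logical_bytes, physical_bytes):
        """Ratio of data protected to physical capacity actually used."""
        if physical_bytes <= 0:
            raise ValueError("physical capacity must be positive")
        return logical_bytes / physical_bytes

    logical_bytes = 500 * 1024**3   # total backed-up data before removing duplicates
    physical_bytes = 100 * 1024**3  # capacity actually used after the backup completes

    print(f"{dedupe_ratio(logical_bytes, physical_bytes):.0f}:1")  # prints 5:1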

When a vendor states that it can achieve a certain deduplication ratio, that number represents a best-case scenario. Because deduplication works by removing redundant data, it delivers no benefit where no redundancy exists. Some types of data, such as MPEG videos or JPEG images, are already compressed and contain little redundancy.

As the deduplication ratio increases, each additional increase yields diminishing returns. A 100:1 ratio eliminates 99% of the data; raising it to 500:1 eliminates 99.8%, a gain of less than one percentage point, because most of the redundancy has already been removed.
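The diminishing returns follow directly from the arithmetic: the fraction of capacity saved is 1 - 1/ratio, so each step up in the ratio buys less additional reduction. A quick sketch:

    # Space saved as a function of the deduplication ratio: saved = 1 - 1/ratio
    for ratio in (2, 5, 10, 100, 500):
        print(f"{ratio}:1 eliminates {1 - 1 / ratio:.1%} of the data")
    # 100:1 already eliminates 99.0%; moving to 500:1 adds only 0.8 percentage points.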

Several factors affect data deduplication ratios, including:

  • Data retention -- The longer data is retained, the greater the probability of finding redundancy.
  • Data type -- For example, an environment with primarily Windows servers and similar files will likely produce a higher ratio.
  • Change rate -- High data change rates often yield low deduplication ratios.
  • Location -- The wider the scope, the higher the likelihood of finding duplicates. Global deduplication, which compares data across multiple systems, will usually generate more reduction than deduplicating data locally on one device, as the sketch below illustrates.

Independent backup expert W. Curtis Preston discusses several approaches to implementing dedupe.
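To make the location factor concrete, here is a simplified, hypothetical sketch that compares local deduplication (each device keeps its own fingerprint index) with global deduplication (all devices share one index). Real products chunk and fingerprint data far more elaborately; this only illustrates why a wider scope finds more duplicates.

    import hashlib

    def chunks_stored(streams, use_global_index):
        """Count how many chunks must actually be stored.
        streams maps a device name to the list of data chunks it backs up.
        With a global index, a chunk already stored by any device is skipped;
        with local indexes, each device dedupes only against its own chunks."""
        shared_index = set()
        stored = 0
        for device, chunks in streams.items():
            index = shared_index if use_global_index else set()
            for chunk in chunks:
                fingerprint = hashlib.sha256(chunk).hexdigest()
                if fingerprint not in index:
                    index.add(fingerprint)
                    stored += 1
        return stored

    # Hypothetical example: two servers whose backups contain the same OS files.
    streams = {
        "server-a": [b"os-file-1", b"os-file-2", b"app-data-a"],
        "server-b": [b"os-file-1", b"os-file-2", b"app-data-b"],
    }
    print(chunks_stored(streams, use_global_index=False))  # 6 chunks stored (local)
    print(chunks_stored(streams, use_global_index=True))   # 4 chunks stored (global)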

Virtual server environments often yield the best deduplication ratios, because so much data between virtual machines is redundant. Structured databases often produce poor deduplication ratios.

Variable chunking, which sets chunk boundaries based on the data itself rather than at fixed intervals, is effective at recognizing duplicate data and increases the deduplication ratio because it can match smaller, shifted pieces of data more easily and quickly. By contrast, legacy data protection software that dedupes data only within a single data stream produces lower dedupe ratios and raises the cost of data protection.
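One common way to implement variable-size chunking is content-defined chunking: a rolling hash is computed over a small sliding window, and a chunk boundary is declared wherever the hash matches a chosen pattern, so boundaries depend on the content rather than on fixed offsets. The toy rolling checksum below is only for illustration; production deduplication engines use stronger algorithms such as Rabin fingerprinting.

    import os

    def variable_chunks(data, window=16, mask=0x3F, min_size=64, max_size=1024):
        """Split data into variable-size chunks with a toy rolling checksum.
        A boundary is declared when the checksum of the last `window` bytes has
        its low bits (masked by `mask`) equal to zero, subject to min/max sizes."""
        chunks, start, checksum = [], 0, 0
        for i, byte in enumerate(data):
            checksum += byte
            if i - start >= window:
                checksum -= data[i - window]          # slide the window forward
            size = i - start + 1
            if size >= max_size or (size >= min_size and (checksum & mask) == 0):
                chunks.append(data[start:i + 1])
                start, checksum = i + 1, 0
        if start < len(data):
            chunks.append(data[start:])
        return chunks

    # An insertion near the front shifts every byte after it, yet most chunk
    # boundaries resynchronize because they are defined by the local content.
    original = os.urandom(20000)
    modified = original[:5] + b"INSERTED BYTES" + original[5:]
    shared = set(variable_chunks(original)) & set(variable_chunks(modified))
    print(f"{len(variable_chunks(original))} chunks, {len(shared)} unchanged after the edit")

Because an edit only disturbs the chunk boundaries near it, the unchanged chunks still produce identical fingerprints and can be deduplicated, which is what lifts the ratio compared with fixed-size blocks.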

This was last updated in March 2016
