Three or four years ago, there was something of a deduplication arms race going on. Vendors competed fiercely to achieve the highest possible deduplication ratio. Just as one vendor would advertise a 20:1 ratio, another would issue a press release stating it had achieved a 50:1 ratio.
Today, these figures are hardly even worth considering. For all practical purposes, vendor-advertised deduplication ratios have become meaningless for two main reasons:
- When a vendor claims a 50:1 deduplication ratio, that figure reflects a best-case scenario. In practice, deduplication ratios are often far lower than what vendors advertise. Remember, deduplication works by removing redundant data; if no redundant data exists, deduplication yields no savings. Some types of data are already compressed and therefore contain very little redundancy. This is especially true of media files such as MPEG videos or JPEG images.
- As the ratio increases, deduplication yields diminishing returns. For example, if you deduplicate 1 TB (1,024 GB) of data, a 2:1 ratio (which is very low) eliminates half of it, leaving 512 GB. By the time you reach a 20:1 ratio, 95% of the data has been eliminated and the 1 TB has been reduced to a mere 51.2 GB. Increasing the ratio to 25:1 removes little more, because most of the redundancy is already gone: moving from 20:1 to 25:1 eliminates only another 1% of the data, shrinking the volume by roughly 10 GB, which is insignificant compared to the original 1 TB. The savings become increasingly marginal as deduplication ratios grow.
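The diminishing returns described above are easy to verify with a short calculation. The sketch below (the 1 TB starting size and the ratios are taken from the example in the text; the function name is mine) shows how little additional space each higher ratio reclaims:

```python
# Sketch: how much data remains after deduplication at various ratios,
# assuming a 1 TB (1,024 GB) starting data set as in the article's example.

def remaining_gb(total_gb, ratio):
    """Data left on disk after deduplication at the given ratio (20 means 20:1)."""
    return total_gb / ratio

total = 1024  # 1 TB expressed in GB

for ratio in (2, 10, 20, 25, 50):
    left = remaining_gb(total, ratio)
    eliminated_pct = 100 * (1 - 1 / ratio)
    print(f"{ratio}:1 -> {left:.2f} GB remain ({eliminated_pct:.0f}% eliminated)")
```

Running this shows the gap narrowing quickly: the jump from 2:1 to 20:1 frees hundreds of gigabytes, while the jump from 20:1 to 25:1 frees only about 10 GB.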