Besides source and target dedupe, there are also post-process and inline deduplication, as well as fixed- and variable-block length deduplication. What are the pros and cons of these different approaches?
Each of these approaches has its own set of advantages and disadvantages. Post-process deduplication requires a larger back-end storage pool than inline deduplication, but it also lets you choose which workloads to deduplicate and which to leave alone. In addition, post-process deduplication lets you rapidly restore the most recent backup set without first rehydrating it, a process that typically slows recoveries to about 80% of backup speeds.
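The difference between the two approaches comes down to when the hashing and duplicate lookup happen. This is a minimal sketch, not any vendor's implementation; the class names, the SHA-256 block hashing and the in-memory index are illustrative assumptions:

```python
import hashlib


class InlineDedupStore:
    """Sketch of inline dedup: each block is hashed *before* it is written,
    so duplicate blocks never consume back-end storage."""

    def __init__(self):
        self.index = {}  # block hash -> stored block

    def write(self, block: bytes):
        key = hashlib.sha256(block).hexdigest()
        # Duplicates are dropped on ingest; only unique data lands on disk.
        self.index.setdefault(key, block)


class PostProcessDedupStore:
    """Sketch of post-process dedup: every block lands first (hence the
    larger back-end pool), and a later pass collapses duplicates."""

    def __init__(self):
        self.landing = []  # raw blocks, written at full ingest speed
        self.index = {}

    def write(self, block: bytes):
        self.landing.append(block)  # no hashing in the ingest path

    def dedupe_pass(self):
        # The deferred pass does the same hashing work, off the ingest path.
        for block in self.landing:
            key = hashlib.sha256(block).hexdigest()
            self.index.setdefault(key, block)
        self.landing.clear()
```

Both stores end up holding the same unique blocks; the tradeoff is whether the hashing cost is paid during the backup window (inline) or afterward, at the price of temporarily storing every block (post-process). The landing area also explains the fast-restore advantage: the most recent backup can be read back before it has been deduplicated.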
A similar tradeoff exists for block lengths: Algorithms that use variable-block length deduplication are usually slower and produce more metadata, but achieve better compression ratios than fixed-block length algorithms, which are less compute-intensive.
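Fixed-block deduplication's weakness is easy to demonstrate: because block boundaries are tied to byte offsets, a single inserted byte shifts every boundary after it, so previously identical data no longer produces matching hashes. A toy sketch (the 8-byte block size is purely illustrative; real systems typically use blocks in the 4 KB to 128 KB range):

```python
import hashlib

BLOCK = 8  # tiny fixed block size for illustration only


def fixed_chunks(data: bytes):
    """Split data into fixed-length blocks and hash each block."""
    return [hashlib.sha256(data[i:i + BLOCK]).hexdigest()
            for i in range(0, len(data), BLOCK)]


base = b"AAAAAAAABBBBBBBBCCCCCCCC"
shifted = b"X" + base  # a one-byte insert at the front

# Every block boundary moves, so no block hash from `base` survives.
common = set(fixed_chunks(base)) & set(fixed_chunks(shifted))
```

Here `common` comes back empty even though almost all of the data is unchanged, which is exactly the case variable-block algorithms spend extra compute and metadata to handle.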
A lesser-known third type of block hashing, called sliding-window hashing, is also picking up steam. It can intelligently hash data into different block sizes depending on the application type, and it tolerates inserts, changes and metadata better than other types of hashing algorithms.
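The insert tolerance comes from deriving block boundaries from the data itself: a small window slides over the stream, and a cut is made wherever the window's hash matches a target pattern, so boundaries re-synchronize shortly after an edit. The sketch below uses a deliberately toy rolling hash and tiny window and mask values chosen only for illustration; production systems use a Rabin fingerprint or similar hash that updates in constant time per byte:

```python
import hashlib
import random

WINDOW = 4   # bytes in the sliding window (toy value)
MASK = 0x0F  # cut where the low bits are zero: ~1 cut per 16 bytes


def cdc_chunks(data: bytes):
    """Content-defined chunking: cut a chunk wherever the hash of the
    last WINDOW bytes matches MASK, so boundaries follow the content."""
    chunks, start = [], 0
    for i in range(WINDOW, len(data) + 1):
        # Toy hash of the current window; real systems update it in O(1).
        h = (sum(data[i - WINDOW:i]) * 2654435761) & 0xFFFFFFFF
        if h & MASK == 0:
            chunks.append(data[start:i])
            start = i
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def hashes(chunks):
    return {hashlib.sha256(c).hexdigest() for c in chunks}


random.seed(0)
data = bytes(random.randrange(256) for _ in range(400))
edited = data[:50] + b"\x00" + data[50:]  # one-byte insert near the front

# Only the chunks around the insert change; boundaries downstream
# re-synchronize, so most chunk hashes still match.
shared = hashes(cdc_chunks(data)) & hashes(cdc_chunks(edited))
```

Contrast this with the fixed-block case: there, a one-byte insert invalidated every block hash, while here the damage is confined to the chunks touching the edit.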