Who is using deduplicating disk arrays today and what are the latest dedupe tools in that space?
Deduplication arrays tend to be marketed toward enterprise-class organizations, and most of the array-based deduplication tools currently on the market are designed for enterprise-class workloads. Such devices may be able to process several terabytes (TB) of data each hour and offer capacities measured in petabytes (PB). Needless to say, that level of scalability is overkill for the SMB market, and deduplication arrays usually come at a price point that is beyond the reach of all but the largest organizations.
Probably the biggest development in dedupe tools is deduplication acceleration, which is based on the principle of workload distribution. Rather than having a single device shoulder the full burden of the backup deduplication workload, the work is distributed across multiple devices and can therefore be completed more quickly through parallel processing.
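To make the workload-distribution idea concrete, here is a minimal sketch of chunk-level deduplication with the fingerprinting step fanned out across workers. This is an illustration only, not any vendor's implementation: the fixed 4 KB chunk size, the SHA-256 fingerprints, and the function names are all assumptions, and a real accelerator would distribute the work across appliances rather than local threads.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4096  # assumed fixed-size chunking; real arrays often use variable-size chunks


def chunk_fingerprints(data: bytes) -> list[str]:
    """Hash each fixed-size chunk; identical chunks yield identical fingerprints."""
    return [
        hashlib.sha256(data[i:i + CHUNK_SIZE]).hexdigest()
        for i in range(0, len(data), CHUNK_SIZE)
    ]


def deduplicate(streams: list[bytes], workers: int = 4) -> dict[str, bytes]:
    """Fingerprint multiple backup streams in parallel (the 'acceleration'
    step), then keep only one stored copy per unique chunk."""
    store: dict[str, bytes] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for data, prints in zip(streams, pool.map(chunk_fingerprints, streams)):
            for i, fp in enumerate(prints):
                store.setdefault(fp, data[i * CHUNK_SIZE:(i + 1) * CHUNK_SIZE])
    return store
```

Because each stream's fingerprinting is independent, the hashing work parallelizes cleanly; only the final index update needs coordination, which is why distributing the workload speeds up the overall dedupe job.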
One of the problems with deduplication arrays has always been that they work great in single-tier environments but not so well with backup architectures that rely on multiple storage tiers. Tiered backup-and-archiving systems typically move data from one storage tier to the next as the data ages. For example, an organization might store its freshest backup data on spinning disks but move aging backup data to tape or to cloud storage.
Moving deduplicated data from one storage tier to another typically requires the data to be rehydrated (reassembled into its original, non-deduplicated form) prior to the move. Deduplication accelerators make it possible to move deduplicated data among storage tiers without the need for rehydration.
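The difference between the two tier-migration paths can be sketched as follows. This is a conceptual illustration under assumed data structures (a chunk store keyed by fingerprint plus a per-backup manifest of fingerprints); the function names are hypothetical and do not correspond to any product's API.

```python
def rehydrate(manifest: list[str], store: dict[str, bytes]) -> bytes:
    """Reassemble the full backup from its chunk references -- the step a
    conventional tier move requires, inflating the data back to full size."""
    return b"".join(store[fp] for fp in manifest)


def migrate_without_rehydration(manifest: list[str],
                                store: dict[str, bytes],
                                target_store: dict[str, bytes]) -> list[str]:
    """What an accelerator enables conceptually: copy only the unique chunks
    plus the manifest to the next tier, so the data stays deduplicated."""
    for fp in manifest:
        target_store.setdefault(fp, store[fp])
    return manifest
```

The first path transfers every referenced chunk, duplicates included; the second transfers each unique chunk once, so aging backups move to tape or cloud storage without giving up their space savings in transit.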
This was first published in December 2013