What types of data are not well suited for deduplication? What types of data dedupe well?
Some file types and application data inherently deduplicate poorly. Certain applications, such as Lotus Notes, simply do not yield high deduplication ratios, and structured databases often fare poorly as well. Certain rich media file types can actually produce deduplicated output that is the same size as the original or sometimes even larger; such files are typically already compressed, which leaves little redundancy to remove. Beyond that, any data with a high change rate will yield low deduplication ratios.
On the flip side, companies often see very high deduplication ratios from applications whose data has a low change rate, and from NAS shares, which often hold significant amounts of redundant data.
Virtual server environments often yield the best deduplication ratios. Because so much of the data is redundant across virtual machines (VMs), many firms see extremely high data reduction rates when deduplicating VM backups.
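To make the contrast concrete, here is a minimal Python sketch of how a deduplication ratio can be estimated. It assumes simple fixed-size block deduplication with SHA-256 fingerprints, which is only one of several approaches and is not tied to any particular product. The 4 KB block size, the function names, and the synthetic "VM backup" and "high-entropy" data sets are all hypothetical, chosen purely for illustration.

```python
import hashlib
import random

BLOCK_SIZE = 4096  # hypothetical fixed block size, chosen for illustration


def dedup_ratio(data: bytes, block_size: int = BLOCK_SIZE) -> float:
    """Return logical size divided by stored size after fixed-block dedup."""
    if not data:
        return 1.0
    unique_blocks = {}  # fingerprint -> block length, one entry per distinct block
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        unique_blocks.setdefault(hashlib.sha256(block).digest(), len(block))
    stored_size = sum(unique_blocks.values())
    return len(data) / stored_size


def pseudo_random_bytes(n: int, seed: int) -> bytes:
    """Deterministic high-entropy bytes, standing in for incompressible data."""
    return random.Random(seed).getrandbits(8 * n).to_bytes(n, "big")


# Ten synthetic "VM backups": each shares a 16-block base image
# and adds one block of data unique to that VM.
base_image = pseudo_random_bytes(16 * BLOCK_SIZE, seed=1)
vm_backups = b"".join(
    base_image + pseudo_random_bytes(BLOCK_SIZE, seed=100 + k) for k in range(10)
)

# Same total size, but with no repeated blocks (a stand-in for
# compressed rich media or rapidly changing data).
high_entropy = pseudo_random_bytes(len(vm_backups), seed=2)

print(f"VM backups:   {dedup_ratio(vm_backups):.2f}:1")
print(f"High entropy: {dedup_ratio(high_entropy):.2f}:1")
```

In this toy data set, the 170 blocks of VM backup data reduce to 26 unique blocks (16 shared plus 10 unique), roughly 6.5:1, while the high-entropy data stays at 1:1. Real products often use variable-length chunking and see different numbers, so treat this only as an illustration of why redundant data dedupes well and non-repeating data does not.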
This was first published in October 2012