What are the pros and cons of source and target deduplication?
With source deduplication, hashing and processing occur on the client itself before data is transmitted over the network. Because deduplication happens at the source, less data is transmitted over the network and ultimately stored. However, it does add some processing overhead on the client; how much varies by vendor, but it usually ranges from 15% to 25%. Source-based deduplication is especially useful in highly virtualized environments and in branch offices where bandwidth is scarce, but it usually isn't suitable for high-transaction environments.
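The mechanism can be sketched in a few lines: the client chunks the backup stream, hashes each chunk, and only sends chunks the target hasn't already seen. This is a minimal illustration only; real products use variable-size chunking and vendor-specific fingerprints, and the function and parameter names here are hypothetical.

```python
import hashlib

def source_dedup(data: bytes, known_hashes: set, chunk_size: int = 4096):
    """Split data into fixed-size chunks, hash each on the client,
    and return only the chunks the target has not seen yet.
    Illustrative sketch, not any vendor's actual implementation."""
    to_send = []     # chunks that must actually cross the network
    references = []  # ordered hash list that reconstructs the stream
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        references.append(digest)
        if digest not in known_hashes:
            known_hashes.add(digest)
            to_send.append(chunk)
    return to_send, references

# A backup stream with heavy repetition: 4 chunk references,
# but only 2 unique chunks are transmitted.
stream = b"A" * 8192 + b"B" * 4096 + b"A" * 4096
sent, refs = source_dedup(stream, known_hashes=set())
print(len(refs), len(sent))  # 4 2
```

The hashing loop is where the client-side CPU overhead mentioned above comes from: every byte of the stream is fingerprinted before anything is sent.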
With target deduplication, hashing and processing occur on a media server, a proxy server or the disk appliance itself. Because deduplication happens on the target side, it does not reduce the amount of data transferred from the client, but it does not add any processing overhead to the client, either.
Over the past few years, many enterprise backup software products have evolved to include both source and target deduplication. Deduplication rates vary with the algorithm each vendor uses, but they tend to be very competitive. On average, I see companies getting deduplication ratios of around 7:1 to 10:1, depending on their backup scheme, end-user patterns and data types. For example, companies that use "incremental forever" backups -- where they take one full backup at installation and, from then on, take only incremental backups -- see less dramatic deduplication ratios, often in the range of 4:1 to 6:1. This doesn't necessarily mean they are storing more data than peers using a traditional weekly-full, nightly-incremental approach, since "incremental forever" backups are innately more space-efficient.
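For clarity on what those figures mean: a deduplication ratio compares the logical data the backups wrote against what actually lands on disk. The numbers below are hypothetical, chosen only to show the arithmetic behind a "7:1" claim.

```python
def dedup_ratio(logical_bytes: float, stored_bytes: float) -> float:
    """Deduplication ratio = data written by backups / data actually stored.
    A 7:1 ratio means every 7 units of backup data occupy 1 unit on disk."""
    return logical_bytes / stored_bytes

# Hypothetical example: 70 TB of cumulative backup data
# reduced to 10 TB of unique data on the appliance.
print(f"{dedup_ratio(70, 10):.0f}:1")  # 7:1
```

This is also why "incremental forever" shows lower ratios: the logical numerator is already small because full backups are never repeated, so there is less redundancy left for deduplication to remove.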
This was first published in October 2012