This is the fourth part of a nine-part series on deduplication. For easy access to all nine parts, check out our quick overview of Deduplication 2013.
Unlike tape backups, which can span as many tapes as required, disk-based backup systems have a limited capacity. Deduplication has traditionally been used to reduce the disk space required to store disk-based backups.
Early implementations of deduplication for disk-based backups were target-based. The data was copied from the source servers to the backup target and was either deduplicated as it was written to the target (inline deduplication) or at a later time (post-process). Part 4 of our series on deduplication in 2013 discusses target deduplication.
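The difference between inline and post-process deduplication can be illustrated with a minimal Python sketch. The class names, the SHA-256 fingerprints and the in-memory dictionaries here are assumptions made for illustration only; they do not represent how any particular backup target is implemented.

```python
import hashlib


def fingerprint(chunk: bytes) -> str:
    """Identify a chunk by its SHA-256 digest (an assumed choice of hash)."""
    return hashlib.sha256(chunk).hexdigest()


class InlineTarget:
    """Hypothetical target that deduplicates each chunk as it is written."""

    def __init__(self):
        self.store = {}  # fingerprint -> chunk bytes

    def write(self, chunk: bytes) -> None:
        # A chunk whose fingerprint is already known is not stored again.
        self.store.setdefault(fingerprint(chunk), chunk)

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.store.values())


class PostProcessTarget:
    """Hypothetical target that lands raw data first and deduplicates later."""

    def __init__(self):
        self.landing = []  # raw chunks awaiting deduplication
        self.store = {}    # fingerprint -> chunk bytes

    def write(self, chunk: bytes) -> None:
        # Writes land at full size; no hashing happens during the backup.
        self.landing.append(chunk)

    def deduplicate(self) -> None:
        # Runs at a later time, for example after the backup window closes.
        for chunk in self.landing:
            self.store.setdefault(fingerprint(chunk), chunk)
        self.landing.clear()

    def stored_bytes(self) -> int:
        landed = sum(len(c) for c in self.landing)
        return landed + sum(len(c) for c in self.store.values())
```

In this sketch the post-process target temporarily needs enough space for the full, undeduplicated backup, while the inline target never stores a duplicate chunk but has to compute a fingerprint on every write.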
Target-based deduplication continues to evolve
Although target deduplication is nothing new, the technology continues to evolve. At one time, target deduplication consisted of little more than moving the deduplication process onto the backup storage device. Today, however, target deduplication has become more sophisticated and more efficient.
One example of this is EMC's DD Boost (short for Data Domain Boost), which is designed to increase backup performance by taking a distributed approach to the deduplication process. Rather than relying solely on inline or post-process deduplication on a storage appliance, DD Boost distributes the deduplication process so that no single device has to carry the full burden of the deduplication workload.
With DD Boost in place, much of the deduplication process occurs on the backup server (or on the application clients), so only unique data is actually sent to the Data Domain system. This approach requires the backup server to run an application that is compatible with DD Boost, and this is where target deduplication is really evolving. When DD Boost debuted in 2010, it was compatible only with Symantec's NetBackup and Backup Exec. Today, a number of backup vendors support DD Boost, and it also works with products such as EMC Avamar, EMC NetWorker, EMC Greenplum, Oracle RMAN and Dell Quest vRanger.
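The general idea of distributed deduplication, in which the backup server fingerprints the data and transmits only the chunks the target does not already hold, can be sketched as follows. This is not DD Boost's actual protocol or API; the `Target` class, the `backup` and `restore` functions, the fixed 4 KB chunk size and the use of SHA-256 are all assumptions chosen to keep the example short.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, chosen for simplicity


class Target:
    """Hypothetical stand-in for the deduplicating storage system."""

    def __init__(self):
        self.chunks = {}         # fingerprint -> chunk bytes
        self.bytes_received = 0  # payload bytes that crossed the "network"

    def missing(self, fingerprints):
        """Return the fingerprints the target does not yet hold."""
        return {fp for fp in fingerprints if fp not in self.chunks}

    def store(self, fp, chunk):
        self.chunks[fp] = chunk
        self.bytes_received += len(chunk)


def backup(data: bytes, target: Target):
    """Fingerprint chunks on the backup server and send only unique ones.

    Returns the ordered list of fingerprints needed to rebuild the data.
    """
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    recipe = [hashlib.sha256(c).hexdigest() for c in chunks]
    needed = target.missing(recipe)  # only fingerprints are exchanged here
    for fp, chunk in zip(recipe, chunks):
        if fp in needed:
            target.store(fp, chunk)
            needed.discard(fp)  # do not resend a chunk repeated in this backup
    return recipe


def restore(recipe, target: Target) -> bytes:
    """Rebuild the original data from the stored chunks."""
    return b"".join(target.chunks[fp] for fp in recipe)
```

In this sketch the hashing work happens on the sending side, and the target only answers which fingerprints it lacks and stores the new chunks. A second backup that shares most of its data with the first therefore transmits only the chunks that changed.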
DD Boost provides two main benefits. First, it minimizes the amount of data that must be sent to the Data Domain system, because duplicate data is identified on the backup server before transmission. This improves throughput and reduces network traffic, which is particularly advantageous when backups are being written to (or replicated to) an off-site Data Domain system.
The other advantage, according to EMC, is that DD Boost actually reduces CPU consumption on the backup server. Although deduplication is a somewhat CPU-intensive process, EMC claims that "sending data over the network is significantly more CPU-intensive than the distributed deduplication process." Reducing CPU consumption on the backup server should allow it to handle backup jobs more efficiently, or possibly to run additional concurrent backup jobs.
There are, of course, many products available to perform target-based deduplication. DD Boost is just one example of how the technology is changing and improving today.
Part 5 of our series on deduplication in 2013 focuses on source-based deduplication.
Brien M. Posey, MCSE, has received Microsoft's MVP award for Exchange Server, Windows Server and Internet Information Server (IIS). Brien has served as CIO for a nationwide chain of hospitals and has been responsible for the department of information management at Fort Knox. You can visit Brien's personal website at www.brienposey.com.
This was first published in April 2013