Global deduplication eliminates bottlenecks through load balancing

This is the seventh part of a nine-part series on deduplication. For easy access to all nine parts, check out our quick overview of Deduplication 2013.

Source and target deduplication each offer advantages and disadvantages. One problem that both technologies have in common, however, is that of scalability.

Imagine, for example, that an organization has a disk-based backup system and needs to create deduplicated backups of five file servers. There are a number of different ways that such a backup could be accomplished. One option might be to perform source deduplication of each server, so that the data is deduplicated prior to being written to the backup target.

One of the problems with this approach is that deduplication occurs on a per-file-server basis. The data on each file server is deduplicated, but it is likely that two or more servers contain copies of the same data. With this type of backup, duplicate data that spans servers is not removed as part of the deduplication process. Hence, the backup target could end up storing duplicate data even though the individual servers have been deduplicated.
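The gap can be illustrated with a small sketch. It assumes deduplication works on whole chunks identified by a SHA-256 fingerprint; the server names, chunk contents and helper functions are hypothetical and do not represent any vendor's implementation.

```python
import hashlib

def fingerprint(chunk: bytes) -> str:
    """Identify a chunk by its SHA-256 digest (an assumed fingerprinting scheme)."""
    return hashlib.sha256(chunk).hexdigest()

def dedupe_per_server(chunks):
    """Keep one copy of each distinct chunk found on a single server."""
    unique = {}
    for chunk in chunks:
        unique.setdefault(fingerprint(chunk), chunk)
    return unique

# Hypothetical data: both servers hold a copy of the same OS image.
servers = {
    "fs1": [b"os-image", b"report-a", b"os-image"],
    "fs2": [b"os-image", b"report-b"],
}

per_server = {name: dedupe_per_server(chunks) for name, chunks in servers.items()}

# Chunks written to the target when each server is deduplicated in isolation.
stored_at_target = sum(len(unique) for unique in per_server.values())

# Chunks that would be stored if duplicates were also removed across servers.
globally_unique = len({fp for unique in per_server.values() for fp in unique})
```

In this example each server sends two chunks, so the target stores four, although only three distinct chunks exist across both servers.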

One especially popular solution to this problem is source + target deduplication. In this situation, the data would be deduplicated on the individual file servers, but there would also be an inline deduplication process that runs on the backup target as a way of making sure that no redundant data is stored in the backups.
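A minimal sketch of the target-side step, assuming the target keeps a single index of chunk fingerprints and checks it inline as data arrives. The `BackupTarget` class and the sample chunks are hypothetical, chosen only to show the idea.

```python
import hashlib

class BackupTarget:
    """Hypothetical backup target with one inline deduplication index."""

    def __init__(self):
        self.store = {}  # fingerprint -> chunk

    def write(self, chunk: bytes) -> bool:
        """Store the chunk only if it is new; return True when it was stored."""
        fp = hashlib.sha256(chunk).hexdigest()
        if fp in self.store:
            return False  # duplicate already held, possibly from another server
        self.store[fp] = chunk
        return True

target = BackupTarget()

# Each list is assumed to have been deduplicated at its source already.
from_fs1 = [b"os-image", b"report-a"]
from_fs2 = [b"os-image", b"report-b"]

written = [target.write(chunk) for chunk in from_fs1 + from_fs2]
```

Here the second copy of `os-image` is rejected at the target, so three chunks are stored instead of four. Note that every write passes through the same index, which is the single point the following paragraphs describe as a scaling limit.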

While the prospect of source + target deduplication might sound promising, there is one major disadvantage: these solutions tend not to scale very well. Often, the sheer volume of data that needs to be backed up causes the backup target's controller to become a bottleneck.

Global deduplication is similar to source + target deduplication in that data is deduplicated at the source and again at the backup target. However, global deduplication solutions attempt to eliminate bottlenecks through load balancing.

A backup target that is designed for global deduplication typically presents itself to the backup servers as a single pool of storage. When the backup process begins, however, the inbound data is dynamically load-balanced across multiple controllers. These controllers help divide the workload, thus allowing more data to be deduplicated than would be possible using a single controller.
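One way to picture this arrangement is a pool that routes each chunk to a controller based on its fingerprint, so that identical chunks always reach the same controller and are still detected as duplicates. This routing rule is an assumption made for illustration; the article does not say how any product distributes its workload, and the `Controller` and `GlobalDedupPool` classes are hypothetical.

```python
import hashlib

class Controller:
    """Hypothetical controller that owns part of the fingerprint index."""

    def __init__(self, name: str):
        self.name = name
        self.index = set()

    def ingest(self, fp: str) -> bool:
        """Record the fingerprint; return True if the chunk was new."""
        if fp in self.index:
            return False
        self.index.add(fp)
        return True

class GlobalDedupPool:
    """Presents several controllers to backup servers as one storage pool."""

    def __init__(self, controller_count: int):
        self.controllers = [Controller(f"ctrl-{i}") for i in range(controller_count)]

    def write(self, chunk: bytes):
        digest = hashlib.sha256(chunk).digest()
        # Assumed routing rule: the fingerprint selects the owning controller.
        slot = int.from_bytes(digest[:8], "big") % len(self.controllers)
        owner = self.controllers[slot]
        return owner.name, owner.ingest(digest.hex())

pool = GlobalDedupPool(controller_count=2)
results = [pool.write(chunk) for chunk in [b"os-image", b"report-a", b"os-image"]]
```

The backup server only ever calls `pool.write`, while the index lookups are divided among the controllers. A real system would also need to handle rebalancing when controllers are added or fail, which this sketch ignores.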

Examples of global deduplication

Because global deduplication solutions are specifically designed to address the scalability shortcomings of more traditional deduplication solutions, global deduplication tends to be implemented primarily in large data centers. Global deduplication products are mainly available from vendors that offer enterprise-class storage products.

One of the leaders in global deduplication is EMC Corp., which offers a product called the EMC Data Domain Global Deduplication Array. The EMC Global Deduplication Array load-balances the deduplication process across two high-end Data Domain controllers.

The EMC Global Deduplication Array is designed specifically to work with the EMC Data Domain Boost software and EMC NetWorker, Symantec NetBackup or Symantec Backup Exec. However, organizations that use a different backup application can use the Global Deduplication Array by connecting it to the existing backup infrastructure as a virtual tape library.

CommVault Systems Inc. also offers global deduplication, but takes a different approach than EMC. While EMC's global deduplication is based on the use of a hardware appliance, CommVault takes a software approach to global deduplication.

In Simpana 9, the deduplication process begins at the data source by leveraging the backup client software. This first step reduces the amount of data that must be transferred across the network. CommVault also uses media agents as secondary deduplication points. By using a software-only approach to deduplication, CommVault provides organizations with the flexibility to use the type of backup storage that makes the most sense for them (DAS, NAS, SAN, etc.), as opposed to being locked into using a hardware appliance.

Another major player in the global deduplication space is Symantec Corp. Like EMC, Symantec uses a hardware appliance in its global deduplication strategy. Symantec's global deduplication solution is based around the NetBackup 5000 series appliance and uses both source-side and target deduplication.

Source deduplication is performed by the Symantec NetBackup client, while the NetBackup appliance handles target deduplication.

Data can be replicated from the NetBackup 5000 to another NetBackup appliance as a way of protecting the storage pool's contents. Deduplication is also used in the replication process, ensuring that any data that already exists on the replica is not needlessly transmitted.
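The idea behind deduplicated replication can be sketched as follows, assuming both appliances index chunks by fingerprint and the sender transmits only what the replica lacks. The `replicate` function and the sample stores are hypothetical and are not taken from NetBackup.

```python
import hashlib

def fingerprint(chunk: bytes) -> str:
    """Identify a chunk by its SHA-256 digest (an assumed fingerprinting scheme)."""
    return hashlib.sha256(chunk).hexdigest()

def replicate(primary: dict, replica: dict) -> int:
    """Copy only chunks the replica does not hold; return the bytes transmitted."""
    sent = 0
    for fp, chunk in primary.items():
        if fp not in replica:
            replica[fp] = chunk
            sent += len(chunk)
    return sent

# Hypothetical stores: the replica already holds one of the three chunks.
primary = {fingerprint(c): c for c in [b"os-image", b"report-a", b"report-b"]}
replica = {fingerprint(b"os-image"): b"os-image"}

bytes_sent = replicate(primary, replica)
```

Only the two report chunks cross the wire in this example, and running `replicate` a second time sends nothing because the replica is already in sync.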

Learn who should consider global deduplication in part eight of our series.

About the Author:
Brien M. Posey, MCSE, has received Microsoft's MVP award for Exchange Server, Windows Server and Internet Information Server (IIS). Brien has served as CIO for a nationwide chain of hospitals and has been responsible for the department of information management at Fort Knox. You can visit Brien's personal website at www.brienposey.com.

This was first published in April 2013
