Global deduplication: Who benefits from it?

In the eighth part of our series on deduplication in 2013, independent backup expert Brien Posey discusses who can benefit from global deduplication.

This is the eighth part of a nine-part series on deduplication. For easy access to all nine parts, check out our quick overview of Deduplication 2013.

Global deduplication is most suitable for large data centers with rapidly growing data sets. These types of organizations are good candidates for global deduplication because of two factors -- the volume of data being backed up and the complexity of the backup operation.

These two factors make it important for large data centers to limit the amount of data that is transferred across the network as a part of the backup process and to limit the amount of data that is stored. Source-side deduplication helps to limit the volume of data transferred from the source to the target during the backup process, and target deduplication helps eliminate storage redundancy resulting from redundancy across hosts. In larger environments, however, these forms of deduplication alone might not be enough.

This is because large data centers typically back up numerous hosts and make use of many different backup targets. Simple source + target deduplication usually cannot ensure the removal of redundant data segments in a multi-source/multi-target environment, because each target generally deduplicates only against the data it holds itself. Even if such a product were able to manage deduplication across multiple sources and targets, the sheer volume of data being backed up in a large data center could cause the deduplication components to become a bottleneck.
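To make the multi-target problem concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular product works: the function names, the tiny 4-byte fixed-size chunks and the sample data are all assumptions made for the example. It compares the number of chunks stored when each backup target keeps its own index of chunk hashes with the number stored when all targets share one global index.

```python
import hashlib

def chunk_hashes(data: bytes, size: int = 4) -> list[str]:
    """Split data into fixed-size chunks and return a SHA-256 hash per chunk.
    (Fixed-size chunking with a tiny chunk size is an assumption for illustration.)"""
    return [hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]

def stored_chunks_per_target(backups: dict[str, list[bytes]]) -> int:
    """Each target deduplicates only against its own index of chunk hashes."""
    total = 0
    for streams in backups.values():
        index = set()  # this index is private to one target
        for stream in streams:
            index.update(chunk_hashes(stream))
        total += len(index)
    return total

def stored_chunks_global(backups: dict[str, list[bytes]]) -> int:
    """All targets share a single index, so a chunk is stored once overall."""
    index = set()
    for streams in backups.values():
        for stream in streams:
            index.update(chunk_hashes(stream))
    return len(index)

# Hypothetical sample data: two hosts back up to target_a, one host to target_b.
backups = {
    "target_a": [b"AAAABBBBCCCC", b"AAAABBBBDDDD"],
    "target_b": [b"AAAABBBBEEEE"],
}

per_target = stored_chunks_per_target(backups)
global_count = stored_chunks_global(backups)
print(f"per-target indexes: {per_target} chunks stored")
print(f"global index:       {global_count} chunks stored")
```

In this toy data set, the chunks "AAAA" and "BBBB" appear on both targets. With separate indexes they are stored once per target; with a shared index they are stored once in total. Note that the shared index also becomes a single point that every backup stream must consult, which hints at the bottleneck concern raised above.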

By contrast, global deduplication is usually overkill in all but very large environments. The cost and complexity of implementing a global deduplication solution make it impractical for smaller organizations. This isn't usually a problem, however, as source + target deduplication is normally adequate for addressing a smaller organization's needs.

Is global deduplication necessary?

Whether global deduplication is a need-to-have capability or just something that is nice to have has been hotly debated among IT professionals. In fact, IT pros can't even seem to agree on the criteria for deciding when global deduplication is or is not needed.

When deciding whether or not to implement global deduplication, there are a number of factors that should be considered. The first consideration should be the volume of data being created each day or the volume of data that needs to be retained within online backups.

Generally, organizations that must regularly back up or retain large quantities of data are good candidates for global deduplication. However, the actual volume of data that needs to be backed up or retained in the backup system can often be greatly reduced by archiving static data, rather than including it in active backups. If, after archiving static data, your backups are still excessively large, then global deduplication might be a good fit. Overall storage capacities vary from one vendor to another, but EMC's Global Deduplication Array, for example, can store up to 14.2 PB of data.

Another consideration is the complexity of your backups. If your organization is currently performing source + target deduplication and you are finding that the backup and deduplication processes are becoming difficult to manage due to their complexity, then a global deduplication solution might be a worthwhile investment. Most global deduplication products use a single management interface for organization-wide deduplication of data.

Finally, your budget might be the biggest consideration of all -- you can expect global deduplication products to come with a hefty price tag.

The ninth and final part of our series outlines a number of important things to consider before selecting a deduplication product.

About the Author:
Brien M. Posey, MCSE, has received Microsoft's MVP award for Exchange Server, Windows Server and Internet Information Server (IIS). Brien has served as CIO for a nationwide chain of hospitals and has been responsible for the department of information management at Fort Knox. You can visit Brien's personal website at
