This is the eighth part of a nine-part series on deduplication. For easy access to all nine parts, check out our quick overview of Deduplication 2013.
Global deduplication is most suitable for large data centers with rapidly growing data sets. These types of organizations are good candidates for global deduplication because of two factors -- the volume of data being backed up and the complexity of the backup operation.
These two factors make it important for large data centers to limit both the amount of data transferred across the network as part of the backup process and the amount of data that is stored. Source-side deduplication limits the volume of data transferred from the source to the target during the backup process, and target deduplication eliminates redundant data stored across hosts. In larger environments, however, these forms of deduplication alone might not be enough.
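To make the source-side idea concrete, here is a minimal sketch of hash-based chunk deduplication. It is an illustration only, not any vendor's implementation: it assumes fixed-size chunks (real products typically use variable-size, content-defined chunking) and uses a toy in-memory target. The `DedupTarget` and `backup` names are hypothetical.

```python
import hashlib

def chunk(data: bytes, size: int = 4096):
    """Split a byte stream into fixed-size chunks (a simplification;
    real products typically use variable-size, content-defined chunking)."""
    for i in range(0, len(data), size):
        yield data[i:i + size]

class DedupTarget:
    """Toy backup target: stores each unique chunk once, keyed by its hash."""
    def __init__(self):
        self.store = {}  # fingerprint -> chunk bytes

    def has(self, fingerprint: str) -> bool:
        return fingerprint in self.store

    def put(self, fingerprint: str, data: bytes):
        self.store[fingerprint] = data

def backup(source_data: bytes, target: DedupTarget):
    """Source-side dedup: hash each chunk locally and transfer only chunks
    the target has not already seen. Returns (chunks_seen, chunks_sent)."""
    seen = sent = 0
    for c in chunk(source_data):
        fp = hashlib.sha256(c).hexdigest()
        seen += 1
        if not target.has(fp):  # only unseen chunks cross the network
            target.put(fp, c)
            sent += 1
    return seen, sent
```

Backing up the same data a second time would send zero chunks, which is the bandwidth saving the paragraph above describes.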
The reason is that large data centers typically back up numerous hosts and use many different backup targets. Simple source + target deduplication usually cannot ensure the removal of redundant data segments in a multi-source, multi-target environment. Even if such a product could manage deduplication across multiple sources and targets, the sheer volume of data being backed up in a large data center could turn the deduplication components into a bottleneck.
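The multi-target problem can be sketched with a shared fingerprint index, which is the essence of what global deduplication adds: every target consults the same index, so a chunk stored through one target is recognized as a duplicate when it arrives at another. This is a simplified assumption about the architecture, not a description of any specific product; the `GlobalDedupIndex` name is hypothetical.

```python
import hashlib

class GlobalDedupIndex:
    """Toy global fingerprint index shared by all backup targets.
    A chunk written through target A is recognized as a duplicate
    when the same chunk later arrives at target B."""
    def __init__(self):
        self._owner = {}  # fingerprint -> id of the target holding the chunk

    def register(self, data: bytes, target_id: str) -> bool:
        """Return True if this chunk is new globally and should be stored
        on target_id; False if some target already holds a copy."""
        fp = hashlib.sha256(data).hexdigest()
        if fp in self._owner:
            return False
        self._owner[fp] = target_id
        return True
```

Without a shared index, each target keeps its own fingerprint table, so the same chunk arriving at two different targets is stored twice, which is exactly the cross-target redundancy described above.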
Conversely, global deduplication is usually overkill in all but the largest environments. The cost and complexity of implementing a global deduplication solution make it impractical for smaller organizations. This isn't usually a problem, however, as source + target deduplication is normally adequate for a smaller organization's needs.
Is global deduplication necessary?
Whether global deduplication is a need-to-have capability or merely nice to have has been hotly debated among IT professionals. In fact, IT pros can't even seem to agree on the criteria for deciding when global deduplication is or is not needed.
A number of factors should be considered when deciding whether to implement global deduplication. The first is the volume of data being created each day, or the volume of data that must be retained within online backups.
Generally, organizations that must regularly back up or retain large quantities of data are good candidates for global deduplication. However, the actual volume of data that needs to be backed up or retained in the backup system can often be greatly reduced by archiving static data, rather than including it in active backups. If, after archiving static data, your backups are still excessively large, then global deduplication might be a good fit. Overall storage capacities vary from one vendor to another, but EMC's Global Deduplication Array, for example, can store up to 14.2 PB of data.
Another consideration is the complexity of your backups. If your organization is currently performing source + target deduplication and you are finding that the backup and deduplication processes are becoming difficult to manage due to their complexity, then a global deduplication solution might be a worthwhile investment. Most global deduplication products use a single management interface for organization-wide deduplication of data.
Finally, your budget might be the biggest consideration of all -- global deduplication products come with a hefty price tag.
The ninth and final part of our series outlines a number of important things to consider before selecting a deduplication product.
About the Author:
Brien M. Posey, MCSE, has received Microsoft's MVP award for Exchange Server, Windows Server and Internet Information Server (IIS). Brien has served as CIO for a nationwide chain of hospitals and has been responsible for the department of information management at Fort Knox. You can visit Brien's personal website at www.brienposey.com.