Global deduplication allows users to deduplicate data across multiple boxes, which can simplify management as companies back up more and more data. W. Curtis Preston, executive
Table of contents:
>> What is global deduplication?
>> What are the benefits of global deduplication?
>> What are the drawbacks of global deduplication?
>> What vendors offer global deduplication products?
>> Who needs global deduplication?
Global deduplication comes into play when you have multiple deduplication devices: for example, when you are backing up to more than one target deduplication appliance or, in the case of source deduplication, to multiple backup nodes that are each backing up multiple clients.
With global deduplication, when data that one node has already seen is sent to a second node, the second node knows that the data has already been stored and does not store it a second time.
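To make that concrete, here is a minimal sketch in Python of the difference between per-node and global deduplication. It is an illustration under stated assumptions, not how any vendor implements it: the `DedupNode` class, the `stored_after_backup` helper, the use of SHA-256 fingerprints, and the in-memory `set` standing in for a shared index are all hypothetical. Real products typically use a distributed index and variable-size chunking.

```python
import hashlib


class DedupNode:
    """Toy deduplication node (hypothetical); stores chunks keyed by SHA-256 digest."""

    def __init__(self, name, index=None):
        self.name = name
        # With global deduplication every node is handed the same index object;
        # without it, each node keeps a private index.
        self.index = index if index is not None else set()
        self.stored_bytes = 0

    def write(self, chunk: bytes) -> bool:
        """Store the chunk unless its fingerprint is already known.

        Returns True if the chunk was written, False if it was deduplicated.
        """
        fingerprint = hashlib.sha256(chunk).hexdigest()
        if fingerprint in self.index:
            return False
        self.index.add(fingerprint)
        self.stored_bytes += len(chunk)
        return True


def stored_after_backup(nodes, chunks):
    """Send the same chunks to every node and return the total bytes written."""
    for node in nodes:
        for chunk in chunks:
            node.write(chunk)
    return sum(node.stored_bytes for node in nodes)


# Two 900-byte chunks standing in for an Exchange backup (made-up data).
chunks = [b"mailbox-a" * 100, b"mailbox-b" * 100]

# Without global deduplication: each node has a private index.
local_total = stored_after_backup([DedupNode("node1"), DedupNode("node2")], chunks)

# With global deduplication: both nodes consult one shared index.
shared = set()
global_total = stored_after_backup(
    [DedupNode("node1", shared), DedupNode("node2", shared)], chunks
)

print(local_total, global_total)  # 3600 1800
```

In this toy run the same data lands on both nodes. With private indexes it is stored twice; with a shared index the second node recognizes every chunk and writes nothing.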
The idea is that, without any additional administration, it will increase the deduplication ratio. The argument against that, which of course is supplied by vendors that don't have global deduplication, is that there isn't a lot of commonality between, say, Oracle and Exchange. But that's not the point at all. The idea behind global deduplication is that every time the Exchange database is backed up, it is compared against earlier backups of that Exchange database, wherever they were stored.
You can get that same effect with a device that doesn't have global deduplication, by splitting up your backups. Let's say you've got four deduplication appliances. And you point your Exchange backups to one of them, your Oracle to another, your file systems to another, etc. That works, and it does help to maximize your deduplication ratio, but the problem is the administrative overhead of that. A friend of mine always says "It's easy to divide your backups into four equal segments as long as your environment is completely static." Then, he waits for the laugh, because everyone knows that nobody's environment is totally static.
So, you can do it. But you might have bad deduplication because you won't be able to properly load balance across the four devices. For availability reasons you might want to load balance, but you can't. So you get the deduplication, but that comes with the administrative overhead of splitting the backups and the administrative overhead of possible failed backups. If you are not load balancing and one device goes down, all of the backups pointed at that device also go down.
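The availability point can be sketched in a few lines of Python. This is a simplified model under loud assumptions: the `failed_jobs` function, the device names `dedupe1` to `dedupe4`, and the job names are hypothetical, and capacity and throughput limits are ignored. It only shows that a static split ties each backup to one device, while a pool in which any device will do can route around a failure.

```python
def failed_jobs(jobs, device_up, assignment=None):
    """Return the jobs that cannot run tonight.

    jobs        -- list of backup job names
    device_up   -- dict mapping device name to True (healthy) or False (down)
    assignment  -- optional dict mapping each job to its one fixed device;
                   None means any healthy device may take any job
    """
    any_healthy = any(device_up.values())
    failed = []
    for job in jobs:
        if assignment is not None:
            # Static split: the job can only go to its assigned device.
            can_run = device_up[assignment[job]]
        else:
            # Pooled: the job runs as long as some device is healthy.
            can_run = any_healthy
        if not can_run:
            failed.append(job)
    return failed


jobs = ["exchange", "oracle", "filesystems", "other"]
device_up = {"dedupe1": True, "dedupe2": False, "dedupe3": True, "dedupe4": True}
assignment = {
    "exchange": "dedupe1",
    "oracle": "dedupe2",
    "filesystems": "dedupe3",
    "other": "dedupe4",
}

print(failed_jobs(jobs, device_up, assignment))  # ['oracle']
print(failed_jobs(jobs, device_up))              # []
```

With the static split, the Oracle backup fails because its only target is down. In the pooled case it simply goes to another device, which is only attractive if doing so doesn't hurt the deduplication ratio.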
Global deduplication is really about easing the administrative burden.
The only drawback I can see is that global deduplication isn't available from the market leaders. Data Domain and companies like it, for example, are certainly the leaders in the deduplication space. The concern is that global deduplication is a newer concept; it doesn't have thousands of customers behind it the way Data Domain does.
I can't think of any real drawbacks other than that. So, for now, if you are set on a company that doesn't offer global deduplication, you are out of luck. But they're working on it.
All of the source deduplication vendors have global deduplication today. Exagrid, FalconStor Software, IBM Corp., NEC Corp., and Sepaton all have global deduplication up to a certain number of nodes – some of them as few as two and others up to as many as 55 nodes.
This is an interesting question, and one on which I've recently changed my answer. I used to say that customers that back up 10 TB or more a night really need to consider global deduplication. If you are backing up 20, 40, 50 TB a night or more, you are forced to buy multiple data deduplication devices. If you are forced to buy multiple devices, global deduplication should be at the top of your list, right after backup and restore performance.
But, there are also companies out there that are much smaller than that, and I started thinking about those customers. People like to put their toes in the water. So, they look at the options in the market and say "I could buy this box that deduplicates at 900 MBps, but I don't need that. I only need 100 MBps." So, they buy a smaller device. And then, they grow.
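A rough back-of-the-envelope calculation shows how quickly growth pushes a small shop into multiple devices. The sketch below is only an illustration: the `devices_needed` function is hypothetical, the 8-hour backup window is my assumption (the window is not stated above), MBps is read as megabytes per second, and decimal units are used (1 TB = 1,000,000 MB).

```python
import math


def devices_needed(nightly_tb, device_mb_per_s, window_hours=8.0):
    """Estimate how many devices are needed to ingest a nightly backup.

    nightly_tb       -- data backed up per night, in decimal terabytes
    device_mb_per_s  -- sustained ingest rate of one device, in MB per second
    window_hours     -- assumed length of the backup window (not from the article)
    """
    required_mb_per_s = nightly_tb * 1_000_000 / (window_hours * 3600)
    return math.ceil(required_mb_per_s / device_mb_per_s)


print(devices_needed(2, 100))   # 1
print(devices_needed(10, 100))  # 4
print(devices_needed(50, 900))  # 2
```

Under these assumptions, a 100 MBps device handles about 2.9 TB in an 8-hour window, so a customer who starts at 2 TB a night and grows to 10 TB ends up needing four of them. Even a 900 MBps device is not enough on its own for 50 TB a night.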
Overall, it's hard to say exactly who needs global deduplication. Without a doubt, if you have to back up 50 TB a night, you need it. But customers that are smaller and growing should also consider this feature.
This was first published in September 2009