Remote replication and data deduplication are frequently used together to back up branch offices to a central site. W. Curtis Preston, executive editor of the Storage Media Group and independent backup expert, answers common questions about using remote replication and data deduplication together. In this Q&A, learn about the pros and cons of combining remote replication with data dedupe, and what types of products are available.
Curtis' answers are also available as an MP3 below.
>> What are the benefits of using remote replication and data deduplication together?
>> Is replication used with source deduplication or target deduplication?
>> What are the drawbacks of remote replication and data deduplication together?
>> Do products manage replication and deduplication together?
>> Who are the major players in this space?
What are the benefits of using remote replication and data deduplication together?
If we go back to the time before data deduplication, replication for most people only worked if you were replicating the source data, the primary database. You would use something like Symmetrix Remote Data Facility (SRDF), or another hardware- or software-based replication product, to replicate the database you were protecting. And that worked fine.
Then people started to want to replicate their backups of that database. That was a real problem. When you make a full backup, especially of a database that you back up on a regular basis, you are creating a massive chunk of data, very little of which has changed. It's just not feasible to replicate all of that. Even the incremental backups add up to significantly more than you would send if you were replicating the primary data.
With data deduplication, say you are backing up a terabyte of data. You might have a gigabyte of changes since your last full backup. Because deduplication stores only new information, it allows you to replicate essentially just that gigabyte, dramatically reducing the amount of data you send over the wire.
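To make the idea concrete, here is a minimal sketch of why a deduplicated backup is cheap to replicate. It assumes fixed-size chunks identified by a SHA-256 hash and an in-memory dictionary standing in for the remote store. The names `chunk`, `replicate`, and `remote` are hypothetical, and real products use their own chunking schemes and wire protocols.

```python
import hashlib

CHUNK_SIZE = 4  # unrealistically small, chosen so the example is easy to follow


def chunk(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split a backup image into fixed-size chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def replicate(backup: bytes, remote_store: dict[str, bytes]) -> int:
    """Send only the chunks the remote store lacks; return bytes sent."""
    sent = 0
    for piece in chunk(backup):
        digest = hashlib.sha256(piece).hexdigest()
        if digest not in remote_store:  # chunk is new to the remote site
            remote_store[digest] = piece
            sent += len(piece)
    return sent


remote: dict[str, bytes] = {}       # hypothetical stand-in for the central site
first_full = b"AAAABBBBCCCCDDDD"    # first full backup
second_full = b"AAAABBBBXXXXDDDD"   # next full backup, one chunk changed

first_sent = replicate(first_full, remote)
second_sent = replicate(second_full, remote)
print(first_sent, second_sent)  # 16 4
```

The second full backup is the same size as the first, but only the one changed chunk crosses the link, which is the effect described above at terabyte scale.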
Is replication used with source deduplication or target deduplication?
Well, source deduplication by itself is not necessarily replication. Its main use is to back up remote sites with limited bandwidth directly to your central site. But you could also use source deduplication to back up a medium-sized remote site with, say, half a dozen servers. You could use it to back up the servers to one server at the remote site, and then replicate that data to the central site.
That architecture is exactly the same as with a target-based deduplication system. You put the target system at the remote site and use whatever backup software you want to back up to that target deduplication appliance, which then replicates to a larger appliance at the central site.
Most people are using source deduplication as a way to back up remote systems directly to the central office. They don't have a local restore appliance at the remote office. The target deduplication scenario is much more common, especially with EMC/Data Domain. Many Data Domain users have a second system in place that they replicate to.
What are the drawbacks of remote replication and data deduplication together?
You need to make sure that the replicated system has the same recovery behavior as the primary system. Is your recovery speed at the recovery site the same as the recovery speed at the primary site? That's something you really want to test for.
Also, it's important to remember that replicating a backup is not the same as replicating primary data. When you replicate primary data to a remote site, that information is there in a format that is immediately accessible to the application. In this scenario, what you have is not a copy but a backup. So that means you have to do a restore. It's not the same as just restarting Oracle with the replicated data. It doesn't provide the same recovery point objective (RPO) that primary replication does.
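As a rough illustration of the recovery-speed comparison mentioned above, the sketch below estimates restore time from data size and sustained restore throughput. The function name and the throughput figures are made up for illustration; the only reliable numbers are the ones you measure in an actual restore test at each site.

```python
def restore_hours(data_gb: float, throughput_mb_per_s: float) -> float:
    """Estimate hours needed to restore data_gb at a sustained throughput."""
    seconds = (data_gb * 1024) / throughput_mb_per_s
    return seconds / 3600


# Hypothetical figures: the same 1 TB restore at two sites.
primary = restore_hours(1024, 200)   # assumed 200 MB/s at the primary site
recovery = restore_hours(1024, 50)   # assumed 50 MB/s at the recovery site

print(f"primary site:  {primary:.1f} h")
print(f"recovery site: {recovery:.1f} h")
```

If the recovery site restores at a quarter of the primary site's speed, the same restore takes four times as long there, on top of the fact that a restore is needed at all.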
Do products manage replication and deduplication together?
It depends on the product, but at this point any decent data deduplication system will manage replication for you. It will be built into the process, and you should be able to set policies for when replication runs.
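A scheduling policy of this kind can be pictured as a time window during which replication is allowed to run. The sketch below is purely illustrative: `ReplicationPolicy` and its fields are hypothetical and do not correspond to any vendor's interface.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class ReplicationPolicy:
    """Hypothetical policy: replication may run only inside a daily window."""
    window_start: time
    window_end: time

    def allows(self, now: time) -> bool:
        if self.window_start <= self.window_end:
            # Window falls within a single day, e.g. 01:00 to 05:00.
            return self.window_start <= now < self.window_end
        # Window wraps past midnight, e.g. 22:00 to 06:00.
        return now >= self.window_start or now < self.window_end


# Replicate overnight, when the WAN link is assumed to be otherwise idle.
nightly = ReplicationPolicy(window_start=time(22, 0), window_end=time(6, 0))

print(nightly.allows(time(23, 30)))  # True
print(nightly.allows(time(12, 0)))   # False
```

Restricting replication to off-hours keeps backup traffic from competing with daytime use of the branch office link.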
Who are the major players in this space?
Pretty much all of the target deduplication players have replication built into their appliances, with one exception: NetApp. It has data deduplication in its virtual tape library (VTL) but has not integrated replication.