Replication is increasingly becoming a critical business need for many organizations. In this interview, Executive Editor and Independent Backup Expert W. Curtis Preston discusses synchronous and asynchronous replication, deduplication and backup management best practices.
Table of contents:
>>Synchronous and asynchronous replication
>>Asynchronous replication and "point-in-time replication"
>>Management best practices for ensuring effective replication
>>Replication and data deduplication
Replication is the process of copying data from one host to another in a block-level, incremental fashion. It is typically done at either the file level or the volume level: as things change on that file or volume, the blocks that changed on the source are immediately replicated to the target.
Synchronous replication does not acknowledge the write to the primary application until the block has been replicated to the target site. Asynchronous replication acknowledges the write right away and then replicates that block over time.
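The difference in when the write is acknowledged can be sketched in a few lines of Python. This is a toy model under stated assumptions, not any vendor's implementation: the class names `Target`, `SyncSource` and `AsyncSource` are hypothetical, and the "network" is just a method call.

```python
from collections import deque


class Target:
    """Hypothetical replication target: stores blocks by address."""

    def __init__(self):
        self.blocks = {}

    def apply(self, addr, data):
        self.blocks[addr] = data


class SyncSource:
    """Synchronous: the block reaches the target before the write is acknowledged."""

    def __init__(self, target):
        self.blocks = {}
        self.target = target

    def write(self, addr, data):
        self.blocks[addr] = data
        self.target.apply(addr, data)  # replicate first (this is where latency hurts)
        return "ack"                   # only then acknowledge to the application


class AsyncSource:
    """Asynchronous: the write is acknowledged at once and replicated later."""

    def __init__(self, target):
        self.blocks = {}
        self.target = target
        self.pending = deque()  # blocks written locally but not yet replicated

    def write(self, addr, data):
        self.blocks[addr] = data
        self.pending.append((addr, data))
        return "ack"            # acknowledged before the target has the block

    def drain(self, max_blocks=None):
        """Send queued blocks to the target; returns how many were sent."""
        sent = 0
        while self.pending and (max_blocks is None or sent < max_blocks):
            self.target.apply(*self.pending.popleft())
            sent += 1
        return sent
```

In this sketch, the length of `pending` is the amount by which the asynchronous target is behind the source at any moment.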
Each has advantages and disadvantages in different scenarios. Synchronous replication has the advantage of being continually up to date at the target site. You always know that the data at the target site is as current as the data at the source site. The challenge is that since it won't acknowledge the write until it knows that the block has been replicated, the length of time it takes to get that block to the target can degrade the performance of the front-end application. So typically, synchronous replication is only done within a data center or at a very short distance -- less than 50 miles, or even 20 miles away. There are some technologies that are allowing people to go out further than that, but they are newer technologies.
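To see why distance matters for synchronous replication, here is a rough back-of-the-envelope calculation. It assumes light travels through fiber at roughly 200,000 km per second and ignores switch, router and storage delays, so real round-trip times will be higher. The function names are illustrative only.

```python
FIBER_KM_PER_MS = 200.0   # assumption: ~200,000 km/s signal speed in fiber
KM_PER_MILE = 1.609344


def min_round_trip_ms(distance_miles):
    """Lower bound on the round-trip time to the target site, in milliseconds."""
    km = distance_miles * KM_PER_MILE
    return 2 * km / FIBER_KM_PER_MS


def max_serial_writes_per_sec(distance_miles):
    """Upper bound on dependent (one-after-another) synchronous writes per second."""
    return 1000.0 / min_round_trip_ms(distance_miles)


for miles in (20, 50, 1000):
    print(miles, "miles:",
          round(min_round_trip_ms(miles), 2), "ms round trip,",
          int(max_serial_writes_per_sec(miles)), "serial writes/sec at best")
```

Every synchronous write pays at least this round trip before it is acknowledged, which is why the penalty grows quickly beyond short distances.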
The advantage of asynchronous replication is that no matter what the bandwidth or latency is, it's not going to impact the performance of the primary application. The downside of asynchronous is that it can get out of sync with the primary application and can actually get so out of sync that it can never catch up. Some products have the ability to go into special modes to try to catch up, but if you don't have enough bandwidth or have too much latency, you can actually get so far behind that you won't meet your recovery point objective (RPO), which is the whole point of replication.
So one is always up to date but can impact your performance, and the other never impacts your performance but can become out of date pretty quickly.
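The "never catch up" risk with asynchronous replication is simple arithmetic: if data changes faster than the link can carry it, the backlog grows every hour. The sketch below uses made-up change rates and link speeds, and the function names are hypothetical.

```python
def backlog_after(hours, change_mb_per_hour, link_mb_per_hour, start_backlog_mb=0.0):
    """Unreplicated data (MB) after the given number of hours."""
    backlog = start_backlog_mb
    for _ in range(hours):
        # New changes arrive; the link drains what it can; backlog cannot go negative.
        backlog = max(0.0, backlog + change_mb_per_hour - link_mb_per_hour)
    return backlog


def hours_to_drain(backlog_mb, link_mb_per_hour):
    """Time to ship the backlog if no new changes arrived: a rough lag estimate."""
    return backlog_mb / link_mb_per_hour


# Illustrative numbers only: 500 MB/hour of change over a link that moves 400 MB/hour.
behind = backlog_after(8, change_mb_per_hour=500, link_mb_per_hour=400)
print(behind)                       # 800.0 MB behind after 8 hours
print(hours_to_drain(behind, 400))  # 2.0 hours of transfer just to catch up
```

If the estimated lag is larger than the RPO, the target cannot meet it no matter how the product is tuned, short of adding bandwidth or reducing the amount of data sent.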
Technically, point-in-time replication is one of the ways to do asynchronous replication, since asynchronous just means that you're not forcing the write to be replicated before you acknowledge it back to the primary application. What point-in-time replication means is that you take a snapshot at a certain time, typically once an hour.
Then your replication product looks at the bytes that have changed between the last snapshot and the current snapshot and replicates the bytes necessary to create those points in time at the replication destination. Some products can instead replicate continually, take a snapshot at the source site, and then just replicate that snapshot state to the other side. But the big difference is that with a point-in-time replication system, you have one or many points in time to go back to if you have corruption.
With continuous asynchronous replication, depending on how up to date you are, you're continually copying over everything, including the corruption. If you were to do something like drop a table, you could potentially overwrite the target with that corruption.
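The snapshot-and-delta idea described above can be sketched as follows. This is a simplified model: a "snapshot" is a dictionary of block address to contents, the names `diff_snapshots` and `PointInTimeTarget` are hypothetical, and the target keeps a full copy per point in time, whereas real products typically share unchanged blocks between snapshots.

```python
def diff_snapshots(prev, curr):
    """Return (changed_or_added_blocks, removed_addresses) between two snapshots."""
    changed = {a: d for a, d in curr.items() if a not in prev or prev[a] != d}
    removed = [a for a in prev if a not in curr]
    return changed, removed


class PointInTimeTarget:
    """Keeps every replicated point in time so an older one can be restored."""

    def __init__(self):
        self.snapshots = []  # list of (label, image) in arrival order

    def apply_delta(self, label, changed, removed):
        # Start from the most recent image and apply only what changed.
        image = dict(self.snapshots[-1][1]) if self.snapshots else {}
        image.update(changed)
        for addr in removed:
            image.pop(addr, None)
        self.snapshots.append((label, image))

    def restore(self, label):
        for name, image in self.snapshots:
            if name == label:
                return dict(image)
        raise KeyError(label)


# Hourly snapshots at the source; the "orders" table is dropped before 10:00.
snap_0900 = {0: b"users", 1: b"orders"}
snap_1000 = {0: b"users"}

target = PointInTimeTarget()
target.apply_delta("09:00", *diff_snapshots({}, snap_0900))
target.apply_delta("10:00", *diff_snapshots(snap_0900, snap_1000))

# The latest point in time carries the damage, but 09:00 is still available.
print(target.restore("10:00"))
print(target.restore("09:00"))
```

A continuously replicating target with no retained points in time would hold only the equivalent of the 10:00 image.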
It's all about testing. A lot of people buy replication assuming that they're going to have enough bandwidth, that replication uses very little bandwidth and that they can just put replication in place and magic happens. That is the surest way to a disaster. It assumes that all replication products replicate the same amount of data. The first thing you need to realize is that different products replicate differently, send different amounts of data and behave differently under different conditions. Test products in a lab scenario where bandwidth and latency are not an issue, but record the bandwidth that the replication is using.
The next thing you can do is use a WAN simulator where you can actually simulate both latency and lower bandwidth of different types. You can then specify that this thing should look like it's 1,000 or 10,000 miles away and then see how that application performs under that lab scenario. The reason why you want to do that, as opposed to just using a WAN, is that it helps keep variables out of the equation so that the only variable is the software that you're testing. If you can simulate the changes, bandwidth and latency so that they are always the same, you can see how the different products perform. I think that if you test the products prior to buying them, you will have a much better experience than most people do.
Then the next thing to do is monitor. Replication works so silently that a lot of people aren't checking up on it. It's just like backups in that the surest way to have it fail on you is to not watch it.
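A monitoring check can be as simple as comparing replication lag against the RPO on a schedule. The sketch below assumes you can obtain the timestamp of the last change confirmed at the target (how you get it depends entirely on the product); the function name and the "warn at half the RPO" threshold are arbitrary illustrative choices.

```python
from datetime import datetime, timedelta


def check_replication(last_replicated_at, now, rpo):
    """Classify replication health from the lag between source and target."""
    lag = now - last_replicated_at
    if lag > rpo:
        status = "RPO_VIOLATED"
    elif lag > rpo / 2:       # assumed early-warning threshold: half the RPO
        status = "WARNING"
    else:
        status = "OK"
    return status, lag


now = datetime(2009, 3, 1, 12, 0)
rpo = timedelta(hours=1)
for last in (datetime(2009, 3, 1, 11, 50),
             datetime(2009, 3, 1, 11, 20),
             datetime(2009, 3, 1, 10, 30)):
    print(check_replication(last, now, rpo))
```

Running a check like this regularly, and alerting on anything other than "OK", is one way to avoid discovering a stalled replication job only when you need the target.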
They are related in that with data deduplication, you can replicate things that you have not been able to before, such as regular backups. Replication historically has been at the volume level or the file level. You're replicating a volume, file or database at its primary location. Then if you were to back up that database to a disk array, then replicate that disk array without deduplication, you would be replicating a significantly larger amount of data than if you were replicating it from the source.
Dedupe allows you to back up that data to a disk using dedupe methodologies, and then because dedupe actually eliminates redundant blocks, it can then allow you to replicate that backup to another location. This was something that was only possible in the smallest environments up until now.
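Why deduplication makes replicating backups practical can be shown with a small sketch: only blocks the target has never seen cross the wire. This assumes fixed-size blocks identified by a SHA-256 hash; real deduplication products often use variable-size chunking and their own indexing, and the function names here are hypothetical.

```python
import hashlib


def split_blocks(data, block_size):
    """Cut the backup stream into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def replicate_backup(data, target_store, block_size=4096):
    """Send only blocks the target lacks. Returns (recipe, bytes_sent)."""
    recipe = []       # ordered list of block hashes needed to rebuild this backup
    bytes_sent = 0
    for block in split_blocks(data, block_size):
        digest = hashlib.sha256(block).hexdigest()
        if digest not in target_store:
            target_store[digest] = block   # new block: it must be transferred
            bytes_sent += len(block)
        recipe.append(digest)              # known block: only the reference is needed
    return recipe, bytes_sent


def rebuild(recipe, target_store):
    """Reassemble a backup at the target from its recipe."""
    return b"".join(target_store[d] for d in recipe)


store = {}
monday = b"AAAA" * 3 + b"BBBB"     # 16 bytes, but only 2 unique blocks
tuesday = monday + b"CCCC"         # the next backup adds one new block

_, sent_mon = replicate_backup(monday, store, block_size=4)
recipe_tue, sent_tue = replicate_backup(tuesday, store, block_size=4)
print(sent_mon, sent_tue)          # 8 4
assert rebuild(recipe_tue, store) == tuesday
```

In this toy example the second backup is 20 bytes long but only 4 bytes are replicated, because the target already holds the other blocks.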
W. Curtis Preston (a.k.a. "Mr. Backup"), Executive Editor and Independent Backup Expert, has been singularly focused on data backup and recovery for more than 15 years. From starting as a backup admin at a $35 billion credit card company to being one of the most sought-after consultants, writers and speakers in this space, it's hard to find someone more focused on recovering lost data. He is the webmaster of BackupCentral.com, the author of hundreds of articles, and the books "Backup and Recovery" and "Using SANs and NAS."
This was first published in March 2009