Data replication can be a good option for SMBs who need fast restores and data recovery, and there are a number of low-cost data replication products in the market today. James Damoulakis, CTO of Glasshouse Technologies, discusses the pros and cons of low-cost replication, available options, and whether cloud or managed services are a viable replication option. His answers are also available as an MP3 below.
Table of contents:
>> Who really needs data replication?
>> What are the benefits of low-cost data replication?
>> What are the drawbacks of low-cost replication?
>> What vendors are offering low-cost replication products?
>> What are the pros and cons of using a managed service for replication?
The key use cases for data replication involve people who need to be able to recover in as short a time as possible. We all need to protect our data, retrieve it, and make it available so that users can be productive. But as the window for getting that data back shortens, you very quickly get into situations where replication is required.
Many people start off by depending on backup and recovery for most of their data protection needs, but they find that when their recovery requirements fall under one day -- anywhere from a few hours up to a day -- replication becomes a requirement for meeting them. The first thing to do when you're deploying a replication solution is to understand what your requirements are. What we're talking about is business objectives: the better you understand the business impact of downtime, the more carefully you can make your decision. And when it comes to replication, those requirements can have a great deal of impact on cost.
The features and functions available vary from product to product. For example, some products have features that are tied to specific applications. So if you have particular applications that you're trying to protect (Exchange, SQL, SharePoint, etc.), some of the products are aware of those applications and therefore streamline the process, which makes replication easier to manage.
In the same vein, some products are tied to virtualization. So if you're a heavy virtualization user, you may be able to use one of those products more effectively in a VMware environment.
Other products have specific disaster recovery (DR) features. I often talk about the art of disaster recovery. There's a lot more to DR than having the technology and getting the data from point A to point B; you also have to recover the processes involved. So some of the products feature recovery automation, and others can even do non-destructive or non-disruptive testing, so that you can validate your actual ability to recover, if and when the time comes, without impacting production.
One of the things to consider is that replication itself protects against physical loss. So if a server goes down or a site goes down and you've replicated the data over, it is protected. However, it doesn't necessarily protect against logical data loss. If some sort of undetected corruption took place and it wasn't discovered for several hours, that corruption could very well have been replicated over, and therefore your remote or replicated copy is also corrupt. What that really translates into is that replication is not a substitute for backups.
One way to avoid this is to use continuous data protection (CDP) -- meaning the copies are replicated, but CDP also maintains a history of the copies, allowing you to roll back to earlier versions for the specific purpose of protecting against this type of corruption.
Another drawback is management scalability. In other words, if you have a few key applications that you're replicating and protecting, you can probably do it with a number of the products that are on the market. Once you have a lot of things to protect and keep track of, management can become a challenge. Some products mitigate that problem with central management consoles, but at some point organizations become large enough that they need to begin to consider storage array-based replication.
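To make the difference between plain replication and CDP concrete, here is a minimal Python sketch. It is a toy model only, not how any particular product works: the class names, the `apply` and `view_at` methods, and the sample writes are all hypothetical, and the journal is assumed to receive writes in timestamp order.

```python
from dataclasses import dataclass, field


@dataclass
class PlainReplica:
    """Toy replica that keeps only the latest copy of each block."""
    blocks: dict = field(default_factory=dict)

    def apply(self, timestamp, block_id, data):
        # Overwrite in place: no history is kept, so bad data replaces good.
        self.blocks[block_id] = data


@dataclass
class CdpReplica:
    """Toy CDP-style replica that keeps a journal of every write."""
    journal: list = field(default_factory=list)

    def apply(self, timestamp, block_id, data):
        # Append instead of overwrite (assumes writes arrive in time order).
        self.journal.append((timestamp, block_id, data))

    def view_at(self, point_in_time):
        """Rebuild block contents as of point_in_time (inclusive)."""
        blocks = {}
        for timestamp, block_id, data in self.journal:
            if timestamp <= point_in_time:
                blocks[block_id] = data
        return blocks


# Hypothetical write stream: the third write is a silent corruption.
writes = [
    (1, "orders", "good v1"),
    (2, "orders", "good v2"),
    (3, "orders", "CORRUPT"),
]

plain, cdp = PlainReplica(), CdpReplica()
for write in writes:
    plain.apply(*write)
    cdp.apply(*write)

print(plain.blocks["orders"])    # CORRUPT
print(cdp.view_at(2)["orders"])  # good v2
```

The plain replica faithfully copies the corruption, while the journal lets you roll back to the last point in time before it occurred. The price is the extra storage needed to hold the history.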
Most of the low-cost replication solutions fall under the category of host-based replication. One is Double-Take, which offers a suite of products designed around Windows and VMware. Another is from CA, whose XOsoft product includes replication, application awareness, and CDP. Another one, particularly focused on virtualization environments, is Vizioncore, whose vReplicator product allows you to easily replicate VMs. Then there is a whole host of backup applications that also include replication as an option. A prominent example of that is CommVault.
Cloud backup is certainly an area of interest, and if the goal is simply to get a secondary copy of data offsite, then cloud offerings are attractively priced options. The question is, how do you plan to use those remote copies of data? If the cloud is being used for backup, recovery of individual files may be easy enough, but recovery of full volumes of data may take time, depending on your bandwidth. Also, from a disaster recovery standpoint, you need to determine whether you're going to need to copy everything back to a location to make it useful, and this could also take a long time.
Some cloud vendors are making use of caching techniques to streamline and improve performance. For example, one new product that's in beta, from a new company called Cloud Array, claims to offer a virtual storage array that can connect to your favorite cloud service and offers built-in replication between local and cloud storage, much as host-based replication would in your private environment. The virtual array runs inside a virtual machine in a VMware environment and provides much of the same functionality that you would see in a storage array. So there is an iSCSI interface out to hosts, and on the back end you can have your own local storage or connect to any number of cloud storage vendors.
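The bandwidth concern raised above can be put into rough numbers. The following Python sketch is a back-of-the-envelope estimate only: the function name `restore_hours`, the 80% link efficiency, and the sample sizes and link speed are illustrative assumptions, not figures from any vendor.

```python
def restore_hours(data_gb, link_mbps, efficiency=0.8):
    """Estimate hours needed to pull data_gb gigabytes over a link_mbps link.

    efficiency is an assumed fraction of the nominal link rate that is
    actually usable for the restore (protocol overhead, contention).
    """
    if data_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("invalid input")
    megabits = data_gb * 8 * 1000  # decimal units: 1 GB = 8,000 megabits
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


# Hypothetical SMB with a 10 Mbps connection.
print(f"{restore_hours(1, 10):.1f} hours")    # 0.3 hours for a 1 GB file set
print(f"{restore_hours(500, 10):.1f} hours")  # 138.9 hours for a 500 GB volume
```

Under these assumptions, a few files come back in minutes, but a full 500 GB volume takes nearly six days. That gap is why it matters how you plan to use the remote copies before you rely on cloud storage for disaster recovery.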