By Dave Raffo
The combination of data deduplication and data replication was among the major reasons Community Health Centers Alliance in St. Petersburg, Fla., switched from a managed service and installed a Spectra Logic virtual tape library (VTL) in late 2008.
"The biggest advantage in going to a VTL was the fact it could replicate data offsite," said IT system administrator David Henson. "We wanted to be able to use replication so when we get to a hot site or cold site for DR, we don't have to ship tape out there. And we needed to look at dedupe because we have a lot of duplicated data."
Remote data backup and recovery fits in with disaster recovery in two ways. Organizations with small offices outside the data center need to centralize backups, and large firms want to send copies of business critical data to facilities outside their main data centers. Either way, it's crucial to move up-to-date data to or from remote sites to the data center quickly and easily. This requires technologies such as replication, data deduplication and WAN optimization, or the entire process can be outsourced to cloud-based services.
In this tutorial on data backup and recovery for remote sites, learn about options for remote backup including using in-house software, outsourcing to the cloud, WAN optimization, data replication and disaster recovery, and data deduplication and continuous data protection (CDP).
Moving data from remote sites to a central location makes it easier to manage and protect, and can relieve remote sites of having to manage tape backups. This is done largely through backup applications that tie into the data center or WAN optimization appliances that let organizations make better use of bandwidth between remote or branch offices and centralized data centers.
Data backup software packages such as Asigra Inc. 9, EMC Corp. Avamar and Symantec Corp. Veritas NetBackup PureDisk are aimed specifically at remote sites and use data deduplication to reduce the amount of data that needs to go over the wire.
WAN optimization devices such as those from Cisco Systems, Blue Coat Systems, Riverbed Technology and Silver Peak Systems are placed in remote and branch offices and also reduce the amount of data sent over the wide area network.
Although disk is increasingly used in backup, tape still plays a role for organizations that store data offsite for disaster recovery because of its cost and portability. Even organizations that employ disk backup at remote sites will also keep data on tape as another layer of protection.
Organizations are starting to turn to the cloud -- managed services -- to back up data for disaster recovery. This is especially true of smaller companies that have small or no dedicated IT staffs, but larger firms are also turning to the cloud for DR in these economic times.
Using cloud services obviously lets organizations keep data offsite in case of a disaster, which can range from a hurricane, earthquake or flood to a failed disk drive or even user error. Cloud-based backup can also save organizations money because they don't have to purchase technology to do it themselves or use staff resources to manage the backups and restores.
Cloud backup can come in the form of Software-as-a-Service (SaaS), in which lightweight agents on the systems being backed up send data to the cloud, where it is hosted at a central location and accessed through the Web. It is also available as a hybrid model in which the customer maintains backup software on premises and backs up to the service provider's data center.
However, there are legitimate concerns and caution is advised when turning to the cloud for backup and disaster recovery. Organizations looking to back up to the cloud want to know their provider has state-of-the-art technology such as data deduplication and server virtualization, can secure the data (usually through encryption) and has multi-tenant capabilities such as firewalls between different customers' data stored on its systems. Another issue involves what happens if a provider loses customer data or goes out of business.
Data replication is the process of copying data for redundancy as it is written. For disaster recovery, data is replicated to a remote site to provide another copy outside of the data center.
There are two types of replication -- synchronous and asynchronous. Synchronous replication continuously updates the replicated data and waits until the write has been copied to the target site before acknowledging the write to the application. Because of the latency associated with synchronous replication, it is usually limited to local replication. Asynchronous replication acknowledges the write to the application as soon as it is stored locally and copies it to the other site afterward; it is used for sites more than 30 miles apart.
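The difference in when the write is acknowledged can be sketched in a few lines of Python. This is a toy model, not any vendor's implementation: the `ReplicatedVolume` class, its `pending` queue and the `drain` method are hypothetical names made up for illustration, and the "remote site" is just a second dictionary in the same process.

```python
from collections import deque


class ReplicatedVolume:
    """Toy model of a primary volume with one remote replica (hypothetical)."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = {}
        self.replica = {}
        self.pending = deque()  # writes acknowledged but not yet on the replica

    def write(self, key, value):
        """Store a write on the primary and return once it is acknowledged."""
        self.primary[key] = value
        if self.synchronous:
            # Synchronous: the replica is updated before the acknowledgment.
            self.replica[key] = value
        else:
            # Asynchronous: acknowledge now, copy to the replica later.
            self.pending.append((key, value))
        return "ack"

    def drain(self):
        """Ship queued writes to the replica (stands in for the background copy)."""
        while self.pending:
            key, value = self.pending.popleft()
            self.replica[key] = value


sync_vol = ReplicatedVolume(synchronous=True)
sync_vol.write("block-1", b"payroll")

async_vol = ReplicatedVolume(synchronous=False)
async_vol.write("block-1", b"payroll")
replica_lag = len(async_vol.pending)  # writes not yet at the remote site
async_vol.drain()
```

In the asynchronous case, `replica_lag` counts the writes that the application believes are safe but that have not yet reached the remote site. That window is the trade-off for not waiting on a long-distance link.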
Major storage vendors offer array-based replication, or, more accurately, replication software that runs on storage controllers in the main data center and remote sites.
The most popular array-based replication products for remote sites include EMC Symmetrix Remote Data Facility (SRDF), EMC MirrorView for Clariion systems, Hitachi Data Systems (HDS) Universal Replicator software for asynchronous replication, Hewlett-Packard StorageWorks XP Continuous Access and Continuous Access EVA, IBM Corp. Global Mirror, NetApp SnapMirror for block-based replication, and NetApp SnapVault for file-based replication.
The main drawback of array-based replication is lack of heterogeneous support. Except for HDS, which supports other vendors' storage with its USP V systems, vendors' software only allows replication between like arrays. EMC won't even allow replication between high-end Symmetrix and midrange Clariion systems.
Organizations can also target specific applications with host- or application-based replication. These methods include database log shipping, volume- or file-based replication and application-specific replication tools.
Data deduplication -- the hottest technology in backup today -- is often combined with replication for DR. Deduplication reduces the amount of data that gets replicated and lowers the bandwidth requirement to copy data offsite.
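A rough sketch of why deduplication lowers the bandwidth needed for replication follows. It assumes fixed-size chunks identified by SHA-256 hashes, which is a simplification chosen for illustration; the function names, the 4 KB chunk size and the in-memory `remote` dictionary standing in for the DR site are all hypothetical, and commercial products differ in how they chunk and index data.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, for illustration only


def split_chunks(data, size=CHUNK_SIZE):
    """Cut a byte string into fixed-size chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def replicate(data, remote_store):
    """Send only chunks the remote site lacks; return (recipe, bytes_sent)."""
    recipe = []      # ordered list of chunk hashes needed to rebuild the data
    bytes_sent = 0
    for chunk in split_chunks(data):
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in remote_store:
            remote_store[digest] = chunk  # only new chunks cross the wire
            bytes_sent += len(chunk)
        recipe.append(digest)
    return recipe, bytes_sent


def restore(recipe, remote_store):
    """Rebuild the original data at the remote site from its chunk hashes."""
    return b"".join(remote_store[digest] for digest in recipe)


remote = {}  # stands in for the chunk store at the DR site
monday = b"A" * CHUNK_SIZE * 3 + b"B" * CHUNK_SIZE
tuesday = monday + b"C" * CHUNK_SIZE

recipe_mon, sent_mon = replicate(monday, remote)
recipe_tue, sent_tue = replicate(tuesday, remote)
```

In this made-up example the second backup is larger than the first, yet only its one new chunk is transmitted, because the remote site already holds the rest.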
It's no coincidence that leading deduplication software vendors EMC/Data Domain, Quantum Corp., IBM/Diligent, FalconStor Software and Sepaton Inc. have added or upgraded their replication capabilities this year.
"Replication without data deduplication may have worked in the past, but it is absolutely not enough for 2009," Taneja Group analyst Arun Taneja said. "Having it is now a prerequisite. What the efficiency is and how much better than somebody else's replication with deduplication is another question, but the baseline product is absolutely needed."
There are some drawbacks to the dedupe/replication combination. Inline deduplication, which takes place while data is being written to disk, can impact backup performance, and post-process dedupe, which takes place after the backup completes, can delay replication. Still, data that is deduplicated and replicated offsite can be recovered much faster than data backed up to tapes and stored offsite for disaster recovery.
Continuous data protection (CDP) is finding its way into the disaster recovery process, especially when it can be combined with replication or failover in products such as EMC RecoverPoint and CA XOsoft and those from Double-Take Software Inc. and InMage Systems Inc. CDP is usually less expensive than array-based replication and snapshots, making it more attractive to smaller companies.
Continuous data protection captures every bit of info changed and lets users roll back to a specific recovery point in case of lost or corrupted data. CDP, which can protect files or applications, is appealing to organizations with strict recovery time objectives.
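The idea of capturing every change and rolling back to a chosen point can be illustrated with a small journal. This is a minimal sketch under simplifying assumptions: the `ChangeJournal` class and its methods are hypothetical, recovery points are plain sequence numbers rather than timestamps, and the journal lives in memory rather than on protected storage.

```python
class ChangeJournal:
    """Toy CDP journal that records every write in order (hypothetical)."""

    def __init__(self):
        self.entries = []  # (sequence, key, value) tuples in write order

    def write(self, key, value):
        """Record a change and return its sequence number (a recovery point)."""
        seq = len(self.entries) + 1
        self.entries.append((seq, key, value))
        return seq

    def state_at(self, seq):
        """Rebuild the data as it stood right after write number `seq`."""
        state = {}
        for entry_seq, key, value in self.entries:
            if entry_seq > seq:
                break
            state[key] = value
        return state


journal = ChangeJournal()
journal.write("report.doc", "draft 1")
good_point = journal.write("report.doc", "draft 2")
journal.write("report.doc", "corrupted")

recovered = journal.state_at(good_point)
```

Because nothing is overwritten in the journal, the last good version is still available after the bad write; `recovered` holds the data as of `good_point`.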