Data deduplication disaster recovery considerations


Data deduplication is no longer just a cool technology; it has made its way into many data centers around the world and, in many instances, is replacing tape as the backup medium of choice. This evolution must be taken into account when developing disaster recovery strategies.

Disk-based backup

While data deduplication usually relies on disk for storage, it shouldn't be confused with data mirroring or snapshot technologies. In most cases, data is written to disk by backup software and must therefore be written back (restored) to a host in its native format before it can be accessed again. Deduplication vendors may point out that disk is faster than tape, but backing up to disk still isn't data mirroring. In other words, if an application can tolerate little to no downtime, data deduplication isn't the best choice as its primary data protection target.
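To illustrate why a deduplicated backup isn't directly usable the way a mirror is, here is a toy sketch, assuming fixed-size chunks identified by SHA-256 digests. `DedupStore` and its methods are hypothetical names; commercial products typically use variable-size chunking, compression and their own on-disk formats. What the sketch shows is that the backup target holds unique chunks plus a "recipe", so the data has to be reassembled (restored) before a host can use it.

```python
import hashlib


class DedupStore:
    """Toy deduplicating backup target (hypothetical; illustration only)."""

    def __init__(self, chunk_size: int = 4096) -> None:
        self.chunk_size = chunk_size
        self.chunks: dict[str, bytes] = {}       # SHA-256 digest -> unique chunk
        self.recipes: dict[str, list[str]] = {}  # backup name -> ordered digests

    def backup(self, name: str, data: bytes) -> None:
        """Split data into fixed-size chunks and store each unique chunk once."""
        recipe = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # duplicates are not stored again
            recipe.append(digest)
        self.recipes[name] = recipe

    def restore(self, name: str) -> bytes:
        """Reassemble the original byte stream from its recipe."""
        return b"".join(self.chunks[d] for d in self.recipes[name])

    def physical_bytes(self) -> int:
        """Bytes actually held on the deduplicated disk."""
        return sum(len(c) for c in self.chunks.values())


store = DedupStore(chunk_size=4)
store.backup("mon", b"AAAABBBBAAAACCCC")
store.backup("tue", b"AAAABBBBAAAADDDD")
print(store.restore("tue"))     # b'AAAABBBBAAAADDDD'
print(store.physical_bytes())   # 16 bytes stored for 32 bytes backed up
```

Even in this toy, nothing on the store can be handed to an application until `restore()` rebuilds the byte stream, which is the step a mirror does not need.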

Replication is a must

Unless deduplicated data is also replicated offsite, deduplication offers only limited disaster recovery capability. Some organizations deduplicate backup data onsite but still use tape for offsite storage and disaster recovery. In many cases, data is no longer deduplicated once it's copied to tape; this will eventually be addressed when all backup applications are deduplication-aware or deduplication-capable. In the meantime, using tape for offsite storage undoes the benefits of data reduction and disk-based backup, bringing recoverability back to the same level as traditional tape backup.
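The cost of copying deduplicated backups to tape can be estimated with a short calculation. This is a minimal sketch, assuming the tape copy is made by software that is not deduplication-aware, so every backup is rehydrated to its full logical size. The retention scheme (five nightly full backups of 1,000 chunks of 4 KiB, with 20 chunks changing each night) and the helper names are hypothetical.

```python
import hashlib

CHUNK = 4096  # hypothetical fixed chunk size, in bytes


def make_chunk(label: str) -> bytes:
    """Build a deterministic, zero-padded 4 KiB chunk from a label."""
    return label.encode().ljust(CHUNK, b"\0")


def footprint(backups: list[list[bytes]]) -> tuple[int, int]:
    """Return (bytes on deduplicated disk, bytes written to tape if rehydrated)."""
    unique: dict[bytes, int] = {}
    logical = 0
    for chunks in backups:
        for chunk in chunks:
            unique.setdefault(hashlib.sha256(chunk).digest(), len(chunk))
            logical += len(chunk)
    return sum(unique.values()), logical


# Night 0 is a full backup; on each later night, chunks 0-19 change.
backups = []
for night in range(5):
    chunks = [make_chunk(f"block-{i}") for i in range(1000)]
    if night > 0:
        for i in range(20):
            chunks[i] = make_chunk(f"block-{i}-v{night}")
    backups.append(chunks)

dedup_bytes, tape_bytes = footprint(backups)
print(dedup_bytes, tape_bytes)              # 4423680 20480000
print(round(tape_bytes / dedup_bytes, 2))   # 4.63
```

The ratio depends entirely on the data and the retention scheme; the sketch only shows that the tape set grows with every retained backup while the deduplicated disk grows only with changed chunks.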

Network bandwidth

One of the advantages of data deduplication is the ability to replicate a reduced data set to a remote location with lower network bandwidth requirements than conventional replication. Even so, the initial replication is still likely to take a significant amount of time or bandwidth, because data reduction gains are usually not immediate; they typically improve over time, after multiple backups. To work around network bandwidth limitations, the first replication pass is sometimes done with the replication target installed locally. The secondary deduplication appliance is then shipped offsite, where it resumes replication of deduplicated data.
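The effect of a low first-pass reduction ratio, and the reason for seeding the replica locally, can be shown with back-of-the-envelope arithmetic. All figures below (a 10 TB data set, a 2:1 reduction on the first pass versus 50:1 once the target already holds most chunks, a 100 Mbit/s WAN, a 10 Gbit/s LAN and 80% usable link efficiency) are hypothetical assumptions, not vendor numbers, and `transfer_hours` is an illustrative helper.

```python
def transfer_hours(logical_bytes: float, reduction_ratio: float,
                   link_bps: float, efficiency: float = 0.8) -> float:
    """Hours needed to replicate logical_bytes after deduplication.

    reduction_ratio: logical bytes / bytes actually sent (e.g. 2.0 for 2:1).
    link_bps: nominal link speed in bits per second.
    efficiency: assumed fraction of the nominal speed that is usable.
    """
    bytes_sent = logical_bytes / reduction_ratio
    seconds = bytes_sent * 8 / (link_bps * efficiency)
    return seconds / 3600


TB = 10**12
WAN = 100 * 10**6   # 100 Mbit/s
LAN = 10 * 10**9    # 10 Gbit/s

first_pass_wan = transfer_hours(10 * TB, 2, WAN)
first_pass_lan = transfer_hours(10 * TB, 2, LAN)
later_pass_wan = transfer_hours(10 * TB, 50, WAN)

print(f"Initial seed over WAN: {first_pass_wan:.1f} h")       # 138.9 h
print(f"Initial seed over LAN: {first_pass_lan:.1f} h")       # 1.4 h
print(f"Later full backup over WAN: {later_pass_wan:.1f} h")  # 5.6 h
```

Under these assumptions, seeding over the WAN would take almost six days, while a local seed finishes in under two hours and later passes take a few hours over the WAN.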

Any potential bandwidth limitation must also be taken into consideration when planning for the large restore operations typically associated with disaster recovery. It's equally important to choose a suitable disaster recovery location for the remote replication target, so that a lack of bandwidth or space doesn't force you to relocate the storage to accommodate large restores.
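The same arithmetic can be turned around to size the link for a disaster recovery restore. This sketch assumes the restored data crosses the link in rehydrated (full) form, which is the worst case; whether a given product can send it deduplicated is not covered here. The 2 TB restore, 8-hour recovery time objective (RTO), 80% link efficiency and both helper functions are hypothetical.

```python
def required_link_bps(restore_bytes: float, rto_hours: float,
                      efficiency: float = 0.8) -> float:
    """Minimum nominal link speed (bits/s) to move restore_bytes within the RTO."""
    usable_seconds = rto_hours * 3600
    return restore_bytes * 8 / (usable_seconds * efficiency)


def restore_hours(restore_bytes: float, link_bps: float,
                  efficiency: float = 0.8) -> float:
    """Hours to move restore_bytes over a link of nominal speed link_bps."""
    return restore_bytes * 8 / (link_bps * efficiency) / 3600


TB = 10**12
need = required_link_bps(2 * TB, rto_hours=8)
print(f"Link needed for 2 TB in 8 h: {need / 10**6:.0f} Mbit/s")             # 694 Mbit/s
print(f"2 TB over 100 Mbit/s: {restore_hours(2 * TB, 100 * 10**6):.1f} h")  # 55.6 h
```

If a link of that size isn't available between the two sites, that is an argument for placing the replication target where the restore will actually be run.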

Deduplication performance

Some deduplication technologies are referred to as "out-of-band" or "offline" (also called post-process), which means data is first written to disk and then processed for deduplication before the final write. While this offers a performance advantage during the backup process, it delays replication, which can affect the recovery point objective (RPO) for some data. If a catastrophic failure strikes the primary storage target before the data is replicated offsite, that data could be lost, forcing a restore from the last known good copy stored offsite.
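The RPO exposure created by out-of-band deduplication can be sketched as a simple timeline. This assumes each step is strictly sequential (the backup lands, the deduplication pass runs, then replication runs), whereas real products may overlap them; the times and the `offsite_at` helper are hypothetical, and the sketch ignores any slowdown that inline deduplication may add to the backup itself.

```python
from datetime import datetime, timedelta


def offsite_at(backup_end: datetime, dedup_pass: timedelta,
               replication: timedelta) -> datetime:
    """Time at which the newest backup is safely at the remote site."""
    return backup_end + dedup_pass + replication


backup_end = datetime(2008, 5, 1, 2, 0)   # backup lands on disk at 02:00

# Out-of-band: a 3-hour deduplication pass must finish before replication.
out_of_band = offsite_at(backup_end, timedelta(hours=3), timedelta(hours=2))
# Inline: data is deduplicated as it is written, so replication can start at once.
inline = offsite_at(backup_end, timedelta(0), timedelta(hours=2))

for label, done in (("out-of-band", out_of_band), ("inline", inline)):
    exposure = done - backup_end
    print(f"{label}: offsite at {done:%H:%M}, exposed for {exposure}")
# out-of-band: offsite at 07:00, exposed for 5:00:00
# inline: offsite at 04:00, exposed for 2:00:00
```

A failure of the primary target during those five hours would leave the previous offsite copy as the most recent recoverable one.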

About the author: Pierre Dorion is the Data Center Practice Director and a Senior Consultant with Long View Systems Inc. in Phoenix, AZ, specializing in the areas of business continuity and disaster recovery planning services, and corporate data protection. Over the past 10 years, he has focused primarily on the development of recovery strategies, IT resilience and recoverability, as well as data protection and availability engagements at the data center level. Pierre Dorion has been a guest speaker on the subject of data availability and IT resilience at a number of conferences, including Storage Decisions, ARMA, AFCOM and CIPS.


This was first published in May 2008

Disclaimer: Our Tips Exchange is a forum for you to share technical advice and expertise with your peers and to learn from other enterprise IT professionals. TechTarget provides the infrastructure to facilitate this sharing of information. However, we cannot guarantee the accuracy or validity of the material submitted. You agree that your use of the Ask The Expert services and your reliance on any questions, answers, information or other materials received through this Web site is at your own risk.