While data deduplication usually uses disk for storage, it shouldn't be confused with data mirroring or snapshot technologies. In most cases, data is written to disk by backup software, so it must be restored to a host in its native format before it can be accessed again. Disk may well be faster than tape, as deduplication vendors point out, but backing up to disk still isn't data mirroring. In other words, if an application can tolerate little or no downtime, a deduplication target isn't the best choice as its primary means of data protection.
Replication is a must
Unless deduplicated data is also replicated offsite, it offers only limited disaster recovery capability. Some organizations deduplicate backup data onsite but still use tape for offsite storage and disaster recovery. In many cases, data is no longer deduplicated once it's copied to tape; it is written out at its full, original size. This will eventually be addressed when all backup applications are deduplication-aware or deduplication-capable. In the meantime, relying on tape for offsite storage undoes the benefits of data reduction and disk-based backup, bringing recoverability back to the level of traditional tape backups.
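To make the tape point concrete, here is a toy sketch of a deduplicating store. It assumes fixed-size 4 KiB chunks identified by SHA-256 hashes (many commercial products use variable-size chunking instead), and `DedupStore`, `write` and `rehydrate` are hypothetical names, not any vendor's API. Two nightly backups that differ in a single chunk take up roughly half their logical size on disk, but copying both to tape means rehydrating each one back to full size.

```python
import hashlib
import random

CHUNK_SIZE = 4096  # hypothetical fixed chunk size; real products often use variable-size chunks


class DedupStore:
    """Toy content-addressed store: each unique chunk is kept once, keyed by its SHA-256 digest."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}        # digest -> chunk data (stored once)
        self.recipes: dict[str, list[str]] = {}   # backup name -> ordered list of digests
        self.logical_bytes = 0                    # total size of all backups as written

    def write(self, name: str, data: bytes) -> None:
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # only new chunks consume space
            recipe.append(digest)
        self.recipes[name] = recipe
        self.logical_bytes += len(data)

    def physical_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())

    def rehydrate(self, name: str) -> bytes:
        """Rebuild the full backup image, as must happen before it is copied to tape."""
        return b"".join(self.chunks[d] for d in self.recipes[name])


# Hypothetical data: a 1 MiB backup, then a second backup with only its last chunk changed.
rng = random.Random(42)
day1 = rng.randbytes(1_048_576)
day2 = day1[:-CHUNK_SIZE] + rng.randbytes(CHUNK_SIZE)

store = DedupStore()
store.write("day1", day1)
store.write("day2", day2)

tape_bytes = sum(len(store.rehydrate(n)) for n in ("day1", "day2"))
print(f"on disk: {store.physical_bytes()} B, logical: {store.logical_bytes} B, to tape: {tape_bytes} B")
```

The tape copy ends up the same size as the logical data: the reduction exists only inside the deduplicating store.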
One advantage of data deduplication is the ability to replicate a reduced data set to a remote location with lower network bandwidth requirements than conventional replication. Even so, the initial replication is still likely to take a significant amount of time or bandwidth, because data reduction gains are usually not immediate; they typically improve over time, after multiple backups. To work around network bandwidth limitations, the first replication pass is sometimes done with the replication target installed locally. The secondary deduplication appliance is then shipped offsite, where it resumes replicating deduplicated data.
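A back-of-the-envelope calculation shows why the first pass is the bottleneck. The sketch below is only illustrative: the 10 TB backup set, the 100 Mbit/s link, the 80% usable-bandwidth factor, the 2:1 first-pass reduction and the 10:1 reduction on a 2% nightly change are all assumed figures, and `transfer_hours` is a hypothetical helper.

```python
def transfer_hours(num_bytes: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate hours to push num_bytes across a WAN link.

    link_mbps is the nominal link speed in megabits per second; efficiency is an
    assumed fraction of that speed actually available to replication traffic.
    """
    usable_bits_per_s = link_mbps * 1_000_000 * efficiency
    return num_bytes * 8 / usable_bits_per_s / 3600


TB = 10**12
first_pass = 10 * TB / 2         # assumed: only 2:1 reduction on the very first backup
nightly = 10 * TB * 0.02 / 10    # assumed: 2% daily change, reduced 10:1 once the store is populated

print(f"initial seeding: {transfer_hours(first_pass, 100):.1f} h")
print(f"nightly delta:   {transfer_hours(nightly, 100):.2f} h")
```

Under these assumptions, seeding takes close to six days over the WAN while each later pass takes well under an hour, which is why seeding the target locally and then shipping it can make sense.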
Any potential bandwidth limitation must be taken into consideration when planning for the large restore operations typically associated with disaster recovery. It's also important to choose a suitable disaster recovery location for the remote replication target, so that the storage doesn't have to be relocated for a large restore because the site lacks bandwidth or space.
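One way to test a candidate location is to compare the estimated restore time against the recovery time objective (RTO). The sketch below assumes the restored data crosses the link in full, rehydrated form and that 80% of nominal bandwidth is usable; the 2 TB restore, the 24-hour RTO and the helper names `restore_hours` and `meets_rto` are hypothetical.

```python
def restore_hours(logical_bytes: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Hours to restore logical_bytes over the link, assuming the data is sent un-deduplicated."""
    return logical_bytes * 8 / (link_mbps * 1_000_000 * efficiency) / 3600


def meets_rto(logical_bytes: float, link_mbps: float, rto_hours: float,
              efficiency: float = 0.8) -> bool:
    """True if the estimated restore finishes within the recovery time objective."""
    return restore_hours(logical_bytes, link_mbps, efficiency) <= rto_hours


TB = 10**12
# Hypothetical scenario: restore 2 TB of application data with a 24-hour RTO.
for mbps in (100, 1000):
    hours = restore_hours(2 * TB, mbps)
    print(f"{mbps:>5} Mbit/s: {hours:6.1f} h, meets 24 h RTO: {meets_rto(2 * TB, mbps, 24)}")
```

If no realistic link meets the RTO, the plan has to account for moving the storage or the recovery hardware instead.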
Some deduplication technologies are described as "out-of-band" or "offline" (also called post-process), meaning that data is first written to disk and then deduplicated before the final write. While this offers a performance advantage during the backup itself, it delays replication, which can affect the recovery point objective (RPO) for some data. If a catastrophic failure hits the primary storage target before the data is replicated offsite, that data could be lost, forcing a restore from the last known good copy stored offsite.
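The following sketch models that exposure window. It assumes nightly backups ending at 02:00, a 4-hour post-process deduplication step, 1.5 hours of replication and a primary target lost at 05:00; in-line deduplication is simplified to zero deduplication delay. The functions `offsite_time` and `last_safe_backup` are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def offsite_time(backup_end: datetime, dedup_delay: timedelta,
                 replication_time: timedelta) -> datetime:
    """When a backup that finished at backup_end is fully replicated offsite."""
    return backup_end + dedup_delay + replication_time


def last_safe_backup(failure: datetime, backup_ends: list[datetime],
                     dedup_delay: timedelta, replication_time: timedelta) -> Optional[datetime]:
    """Latest backup already offsite when the primary target fails (None if none is)."""
    safe = [b for b in backup_ends
            if offsite_time(b, dedup_delay, replication_time) <= failure]
    return max(safe, default=None)


# Hypothetical schedule: nightly backups ending at 02:00, primary target lost at 05:00.
ends = [datetime(2008, 5, 1, 2, 0), datetime(2008, 5, 2, 2, 0)]
failure = datetime(2008, 5, 2, 5, 0)
replication = timedelta(hours=1, minutes=30)

post_process = last_safe_backup(failure, ends, timedelta(hours=4), replication)
inline = last_safe_backup(failure, ends, timedelta(0), replication)
print("post-process:", post_process)
print("inline:      ", inline)
```

With post-process deduplication, the latest backup is still waiting to be deduplicated when the failure occurs, so recovery falls back to the previous night's copy. In the in-line case, the same backup is already offsite.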
About the author: Pierre Dorion is the Data Center Practice Director and a Senior Consultant with Long View Systems Inc. in Phoenix, AZ, specializing in the areas of business continuity and disaster recovery planning services, and corporate data protection. Over the past 10 years, he has focused primarily on the development of recovery strategies, IT resilience and recoverability, as well as data protection and availability engagements at the data center level. Pierre Dorion has been a guest speaker on the subject of data availability and IT resilience at a number of conferences, including Storage Decisions, ARMA, AFCOM and CIPS.
This was first published in May 2008