Why arrays with erasure codes are not a backup substitute

Many shops struggle with slow backup and restore times when using traditional backup products. Erasure-coded arrays are no stand-in for traditional backup, but they can help protect data.

Over the last several years, traditional backups have become increasingly challenging for some organizations, and IT pros have been phasing out legacy backup in favor of next-generation products. Technologies such as continuous data protection have proven to be effective, but what about erasure codes? Could erasure coding eventually eliminate the need for backups?

Erasure coding is a data protection method similar to RAID, but with more flexibility. Consider how RAID 5 works. In a RAID 5 scheme, data is striped across the disks in the array set. In addition to block-level striping, each disk in the array set also holds a share of the parity data. The idea is that if one disk in the array fails, the data on the remaining drives can be combined with the parity data to reconstruct what was lost. RAID 6 adds a second parity block to each stripe and can survive the failure of two disks, but doing so requires twice the parity overhead of RAID 5.
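To make the parity idea concrete, here is a minimal Python sketch of single-parity reconstruction using XOR, the operation commonly used for RAID 5 parity. It is an illustration only, not how any particular controller is implemented; the helper name xor_blocks and the block contents are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks that would sit on three different disks (illustrative values).
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]

# The parity block is the XOR of all data blocks in the stripe.
parity = xor_blocks(data_blocks)

# Simulate losing the disk that held the second block, then rebuild it
# by XORing the surviving data blocks with the parity block.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data_blocks[1])
```

With only one parity block per stripe, this scheme can rebuild exactly one missing block. Losing two blocks from the same stripe leaves too little information, which is the gap that RAID 6 and erasure coding address.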

Erasure coding works similarly to RAID 5 or RAID 6, except that it allows the data storage administrator to choose the required level of protection. For example, an erasure-coded array might be designed to survive the simultaneous failure of eight disks. The administrator defines the number of disks to be used and the number of disks that can fail without bringing down the array. An algorithm determines the amount of redundant data to be stored on each disk to achieve the administrator's requirements.
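The trade-off the administrator is making can be sketched with some simple arithmetic. The Python function below is a hypothetical helper, not part of any product. It assumes an idealized code (a Reed-Solomon style scheme is one example) in which tolerating a given number of disk failures costs exactly that many disks' worth of capacity; real arrays may differ.

```python
def erasure_profile(total_disks, tolerated_failures):
    """Estimate capacity cost for an idealized erasure-coded array.

    Assumption: surviving `tolerated_failures` lost disks costs exactly
    that many disks' worth of redundant data, spread across the array.
    """
    if not 0 <= tolerated_failures < total_disks:
        raise ValueError("tolerated_failures must be >= 0 and < total_disks")
    data_disks = total_disks - tolerated_failures
    return {
        "data_disks": data_disks,
        "redundancy_disks": tolerated_failures,
        "usable_fraction": data_disks / total_disks,
        "raw_per_usable": total_disks / data_disks,
    }


# Example: a 24-disk array built to survive eight simultaneous disk failures.
profile = erasure_profile(24, 8)
print(profile["data_disks"])      # 16
print(profile["raw_per_usable"])  # 1.5
```

Under that assumption, the 24-disk example needs 1.5 units of raw capacity for every unit of usable data. For comparison, keeping enough full copies to survive eight failures would take nine copies of the data.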

Like RAID arrays, erasure-coded arrays are designed to provide operational fault tolerance. In other words, the array is designed to protect against disk failures, not act as a data backup. Even so, some believe that if a storage system is sufficiently redundant then traditional backups become unnecessary. After all, a backup is nothing more than a "redundant" copy of an organization's data.

Erasure codes vs. backup

Although erasure codes can create a very high degree of redundancy, several limitations keep erasure coding from replacing legacy backups, at least by itself.

For starters, erasure-coded arrays are not designed to provide point-in-time recovery capabilities. The array is designed to guard against disk failure, not to perform data recovery. Since organizations need the ability to recover lost or corrupted files, there needs to be a mechanism for point-in-time recovery.

One possible option would be to store production data in virtual hard disks on the erasure-coded array, and then use application-aware snapshots to provide point-in-time recovery. Creating a snapshot does not actually create a copy of the data, so a snapshot is only as safe as the storage beneath it. However, because an erasure-coded array can have a high level of built-in redundancy, snapshots could work. This is similar to how snapshots are used in conjunction with backup software today.
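The point that a snapshot is not a copy can be shown with a toy model. The Python class below is a simplified, hypothetical sketch of a redirect-on-write volume, not a description of how any hypervisor or array implements snapshots. Taking a snapshot freezes the map of which blocks make up the volume, while the blocks themselves stay in the same underlying store.

```python
class ToyVolume:
    """Toy redirect-on-write volume; all names here are illustrative."""

    def __init__(self):
        self.store = {}      # physical block store: block id -> bytes
        self.live = {}       # live view: logical address -> block id
        self.snapshots = {}  # snapshot name -> frozen address map
        self._next_id = 0

    def write(self, address, payload):
        # New data always lands in a fresh block, so blocks that a
        # snapshot still points to are never overwritten.
        self.store[self._next_id] = payload
        self.live[address] = self._next_id
        self._next_id += 1

    def snapshot(self, name):
        # Only the address map is copied; no data blocks are duplicated.
        self.snapshots[name] = dict(self.live)

    def read(self, address, snapshot=None):
        mapping = self.live if snapshot is None else self.snapshots[snapshot]
        return self.store[mapping[address]]


vol = ToyVolume()
vol.write(0, b"report v1")
vol.snapshot("monday")
vol.write(0, b"report v2 (corrupted)")

print(vol.read(0))                     # current, damaged version
print(vol.read(0, snapshot="monday"))  # point-in-time version
```

The snapshot restores the older version of the file, but both versions live in the same block store. If that store is lost, the snapshot goes with it, which is why the redundancy of the underlying array matters so much in this approach.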

Another problem with using erasure-coded arrays as a backup substitute is that the arrays are not fully immune to hardware failures. Sure, such an array can survive multiple, simultaneous disk failures, but what happens if the disk controller fails? Such a failure could corrupt the entire array.

If an organization is seriously considering using an erasure-coded array as a substitute for traditional backups, then it will also need a way to guard against the array becoming a single point of failure. One option is to replicate the array's contents to a secondary array. This allows the organization to have a copy of its data that is isolated from the primary storage array and that will not be impacted by an array-level failure.

About the author: 
Brien M. Posey, MCSE, has received Microsoft's MVP award for Exchange Server, Windows Server and Internet Information Server. Brien has served as CIO for a nationwide chain of hospitals and has been responsible for the Department of Information Management at Fort Knox.

Join the conversation

Have you tried to use an erasure-coded array as a substitute for traditional backups? If so, how did it work for you?
I rely on erasure coding as an alternative to traditional backup when dealing with massive data sets. Erasure coding can break data into segments, expand and encode it with redundant data pieces, and store it across a set of different locations. Those locations can be disks, storage nodes or geographic areas, which protects against data loss in the event of media failure.
If a user deletes a critical file then it is gone for good unless there is a backup of it somewhere. If there is a fire or flood in the server room then all the data is lost unless there is a backup of the data stored off-site. Redundant drives are never a substitute for off-site backups.
"some believe that if a storage system is sufficiently redundant then traditional backups become unnecessary."

That doesn't help you if something physical happens to the storage system, such as a fire, flood, or other natural disaster. You really need a backup in another physical location.
The primary issue with RAID is that it was designed when data storage and disks were much smaller. With greater capacity disks, RAID loses effectiveness.
One of the benefits of erasure coding over traditional RAID is that you can stretch it across a wide geographical area. That means that in addition to getting RAID-like data protection for local disk failures, you also get more comprehensive data protection for when a full node (or geo location) fails.