Toigo: Erasure coding as prevention for bit rot
Date: Jan 11, 2013
With the growth of data storage capacity, the problems that come from "bit rot" -- the phenomenon of single-bit errors appearing in data -- have also become a greater headache for storage professionals. In this Storage Decisions video, disaster recovery expert Jon Toigo, founder of Toigo Partners International, discusses erasure coding, which can help reduce the risk of data errors.
Toigo noted that although bit rot, also known as silent corruption, occurs infrequently, the chance of encountering an error rises as the amount of storage grows.
"[Bit rot] goes by a lot of names … things that corrupt data at the time that it is written, or at the location where it is sitting," said Toigo. "Now, you may not think it is a big deal, it only happens one in 10^16 -- that's the frequency in which bit rot occurs -- in a SAS or Fibre Channel drive. One in 90 SATA drives have silent corruption, which is one error in 67 TB. So depending on the amount of data or storage you have fielded, you may have several drives that are bad."
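To put those rates in perspective, here is a rough back-of-the-envelope sketch. It assumes the quoted figures can be read as "one corrupted bit per N bits read" (one in 10^16 bits for SAS or Fibre Channel, one error per 67 TB for SATA) and that every stored bit is read once. Both are simplifying assumptions for illustration, not something Toigo states, and the function name and capacities are hypothetical.

```python
def expected_bit_errors(capacity_tb: float, bits_per_error: float) -> float:
    """Expected corrupted bits when every bit of capacity_tb is read once.

    Assumes errors are independent and uniformly distributed, which is
    an illustrative simplification.
    """
    total_bits = capacity_tb * 1e12 * 8  # decimal terabytes -> bits
    return total_bits / bits_per_error


# Rates taken from the quote, interpreted as "one error per N bits" (assumption).
ENTERPRISE_BITS_PER_ERROR = 1e16        # SAS / Fibre Channel figure
SATA_BITS_PER_ERROR = 67 * 1e12 * 8     # "one error in 67 TB", converted to bits

if __name__ == "__main__":
    for capacity_tb in (100, 1_000, 10_000):
        enterprise = expected_bit_errors(capacity_tb, ENTERPRISE_BITS_PER_ERROR)
        sata = expected_bit_errors(capacity_tb, SATA_BITS_PER_ERROR)
        print(f"{capacity_tb:>6} TB: enterprise ~{enterprise:.2f}, SATA ~{sata:.2f} expected errors")
```

Under these assumptions the expected error count scales linearly with capacity, which is the point of the quote: a rate that is negligible for one drive becomes a routine event across a large storage estate.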
Toigo cited a study that he said suggested that between 5% and 10% of storage system failures that led to "non-recoverable events" were linked to bit rot. Current methods of data protection have their own drawbacks, however: Toigo said most array controllers do not support the data integrity field (DIF) standard, and other options, such as file-system-level checksum validation, can slow down the storage array.
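As a minimal sketch of how checksum validation catches silent corruption, the example below stores a CRC-32 alongside a block and recomputes it on read. It uses Python's standard `zlib.crc32`. The helper names are hypothetical, and real file systems use their own checksum algorithms and on-disk layouts, so this shows only the general idea.

```python
import zlib


def write_block(data: bytes) -> tuple[bytes, int]:
    """Return the block together with the checksum recorded at write time."""
    return data, zlib.crc32(data)


def verify_block(data: bytes, stored_crc: int) -> bool:
    """Recompute the checksum on read and compare it with the stored value."""
    return zlib.crc32(data) == stored_crc


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Simulate bit rot by inverting a single bit in the block."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


if __name__ == "__main__":
    block, crc = write_block(b"application data written to disk")
    rotted = flip_bit(block, 42)
    print("clean block verifies: ", verify_block(block, crc))
    print("rotted block verifies:", verify_block(rotted, crc))
```

The extra checksum computation on every read is also where the performance cost mentioned above comes from. Note that a checksum only detects the corruption; recovering the data needs redundancy of some kind.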
Toigo suggested erasure coding as an alternative. The technique breaks data into object fragments that are stored in multiple locations and can be used to reconstruct the data if needed.
"Erasure coding takes the data you're writing -- the application data -- and passes it through a parsing engine and creates objects. These are algorithmic representations of the data. And it breaks those objects into fragments and stores those fragments [in other locations]," said Toigo. "You can take one or two or three of these objects on whatever disk where they exist … expose them to a reconstruction algorithm, and you can rebuild what the data was that was in that object. Instead of using RAID, you're chopping the objects into multiple pieces and storing them all over the complex of storage that you have. It's sort of a shell game approach to protecting your data."
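The fragment-and-reconstruct idea can be illustrated with the simplest possible erasure code: k data fragments plus one XOR parity fragment, which survives the loss of any single fragment. This is a deliberately minimal sketch, not the scheme Toigo or any vendor describes. Production systems typically use stronger codes, such as Reed-Solomon, that tolerate several lost fragments. The function names and the choice of k are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal fragments and append one XOR parity fragment."""
    frag_len = -(-len(data) // k)                 # ceiling division
    padded = data.ljust(frag_len * k, b"\0")      # zero-pad the final fragment
    fragments = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = reduce(xor_bytes, fragments)
    return fragments + [parity]


def reconstruct(fragments: List[Optional[bytes]], original_len: int) -> bytes:
    """Rebuild the original data when at most one fragment (None) is missing."""
    missing = [i for i, frag in enumerate(fragments) if frag is None]
    if len(missing) > 1:
        raise ValueError("single-parity code can recover only one lost fragment")
    repaired = list(fragments)
    if missing:
        survivors = [frag for frag in fragments if frag is not None]
        # XOR of all surviving fragments equals the missing one.
        repaired[missing[0]] = reduce(xor_bytes, survivors)
    # Drop the parity fragment and the padding.
    return b"".join(repaired[:-1])[:original_len]


if __name__ == "__main__":
    data = b"erasure coding demo"
    stored = encode(data, k=4)      # imagine each fragment on a different disk
    stored[2] = None                # one disk returns nothing usable
    print(reconstruct(stored, len(data)))
```

In this sketch each of the five fragments would live on a different device, so the loss or corruption of any one location still leaves enough information to rebuild the object.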
He noted that Amplidata is currently using a variant of this technique in the company's BitSpread technology for use with cloud storage. "It's basically the ability to reconstruct your data from available bits, wherever they happen to reside in the cloud," said Toigo.