Data archiving storage can be disk-, cloud- or tape-based, or a combination of the three options. Choosing the right approach is important because effective data archiving can ease backup issues.
Many data archive systems use scale-out architecture. They also use data protection schemes such as erasure coding or replication, which offer faster drive-rebuild times than traditional RAID. This allows for the use of very high-capacity drives -- 8 TB or larger. These systems often feature an object storage file system whose performance is not affected by the number of files it stores. Despite these capabilities, a disk-only data archiving strategy can get expensive. As with scale-out NAS, the cost to power and cool these systems rises as they scale.
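Part of what drives the cost of a disk archive is how much raw capacity the protection scheme consumes. The sketch below compares replication with erasure coding using the standard overhead arithmetic. The function names and the example layouts (three copies, 8 data plus 4 parity fragments) are illustrative assumptions, not settings from any particular product.

```python
def raw_tb_replication(usable_tb, copies):
    """Raw capacity needed when every object is stored `copies` times."""
    return usable_tb * copies


def raw_tb_erasure(usable_tb, data_fragments, parity_fragments):
    """Raw capacity needed when each object is split into data fragments
    and protected by additional parity fragments."""
    total_fragments = data_fragments + parity_fragments
    return usable_tb * total_fragments / data_fragments


# Hypothetical archive of 100 TB usable capacity.
replicated = raw_tb_replication(100, copies=3)
erasure_coded = raw_tb_erasure(100, data_fragments=8, parity_fragments=4)
print(f"3-way replication: {replicated} TB raw")
print(f"8+4 erasure coding: {erasure_coded} TB raw")
```

Under these assumed layouts, erasure coding needs half the raw disk that three-way replication does, which also means fewer drives to power and cool.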
The cloud is another option for data archiving storage, and many data archiving products connect directly to a cloud provider. The advantage of this method is that it removes hardware, power, cooling and footprint expenses from the data center. The latency of the cloud is also not typically an issue with archived data. While the data may need to be recovered, it often does not need to be recovered instantly. Additionally, most data archiving products that use the cloud keep a local copy on disk for a period prior to making data "cloud only."
The challenge with the cloud is ongoing costs. An organization is essentially paying for the same capacity repeatedly, and the periodic cost rises incrementally because the data set continues to grow. When those periodic payments are added up, they can far exceed the cost of buying the storage and keeping it on site. For many environments -- those under 50 TB -- the periodic cost is not much of an issue. But for medium to large data centers, the total cost can be significant.
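The way recurring cloud charges accumulate can be worked through with simple arithmetic. The following is a minimal sketch: the prices, growth rate and on-site purchase cost are hypothetical placeholders rather than real provider pricing, and the on-site figure ignores power, cooling and refresh costs, so treat the output as a way to frame the comparison and not as a forecast.

```python
def cumulative_cloud_cost(start_tb, monthly_growth_tb, price_per_tb_month, months):
    """Total paid over `months`, billing each month for the capacity held."""
    total = 0.0
    capacity = start_tb
    for _ in range(months):
        total += capacity * price_per_tb_month
        capacity += monthly_growth_tb  # the archive keeps growing
    return total


def breakeven_month(start_tb, monthly_growth_tb, price_per_tb_month,
                    onsite_cost, max_months=240):
    """First month in which cumulative cloud spend exceeds a one-time
    on-site cost, or None if that never happens within `max_months`."""
    total = 0.0
    capacity = start_tb
    for month in range(1, max_months + 1):
        total += capacity * price_per_tb_month
        if total > onsite_cost:
            return month
        capacity += monthly_growth_tb
    return None


# Hypothetical inputs: 200 TB archive growing 5 TB a month at $10 per TB-month,
# compared with an assumed $150,000 one-time on-site purchase.
three_years = cumulative_cloud_cost(200, 5, 10.0, 36)
crossover = breakeven_month(200, 5, 10.0, 150_000)
print(f"36-month cloud spend: ${three_years:,.0f}")
print(f"Cloud spend passes the on-site cost in month: {crossover}")
```

Rerunning the sketch with a small starting capacity shows why the article treats sub-50 TB environments differently: the crossover point moves much further out.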
Tape is the least-expensive form of storage, especially when power and cooling are factored in. Data archiving software that is tape-aware has made significant progress in abstracting tape management. Today, tape can appear to be merely an extension of the archive disk system. Users may not even know or care that they are accessing files from tape storage. In many cases, the tapes created are interchangeable between systems thanks to the Linear Tape File System.
A combination of these technologies is typically the best design. Using tape with a disk or cloud archive limits the cost of these platforms, while still allowing relatively rapid access to data.
Choose the right data archiving software
Software is the most important part of archive design. The good news is that many data archiving products are well integrated with backup software products. Some backup software products have even added archiving functionality. If you are new to archiving data, this may be a good place to start.
The archive software should shield users and administrators from the management of the back-end devices, essentially making a tape library or the cloud an extension of disk. It should present storage as an addressable file system (CIFS or NFS) or an object store. It should also provide policy management capabilities that control the movement of data between the front-end disk cache and the tape or cloud repository. The archive software may also manage data protection by ensuring that multiple copies of data are created on two tapes, with two different cloud providers or a combination of both.
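The policy-driven movement described above can be sketched as a small planning function. This is an illustrative model only: the `ArchivedFile` and `Policy` types, the 90-day threshold and the target names are assumptions made for the example and do not reflect any specific archive product's interface. The function only plans actions; it does not move data.

```python
from dataclasses import dataclass


@dataclass
class ArchivedFile:
    name: str
    days_since_access: int
    copies: tuple = ()  # back-end targets that already hold a copy


@dataclass
class Policy:
    migrate_after_days: int  # age at which cached data becomes eligible
    targets: tuple           # back-end repositories that must each hold a copy


def plan_actions(files, policy):
    """Return (file name, action, target) tuples for files on the disk cache."""
    actions = []
    for f in files:
        if f.days_since_access < policy.migrate_after_days:
            continue  # still active, leave it on the disk cache
        missing = [t for t in policy.targets if t not in f.copies]
        for target in missing:
            actions.append((f.name, "copy", target))
        if not missing:
            # Every required copy exists, so the disk space can be reclaimed.
            actions.append((f.name, "release_from_disk", "disk-cache"))
    return actions


# Hypothetical policy: after 90 days, keep one copy on tape and one in a cloud.
policy = Policy(migrate_after_days=90, targets=("tape-a", "cloud-b"))
files = [
    ArchivedFile("report.pdf", 5),
    ArchivedFile("scan.tif", 120),
    ArchivedFile("old.mov", 200, copies=("tape-a", "cloud-b")),
]
for action in plan_actions(files, policy):
    print(action)
```

Releasing a file from disk only after every target holds a copy is one conservative design choice; a real product may order these steps differently.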
Beyond those core capabilities, some products may provide the ability to store user-defined segments of a file or object on disk, with the majority of it on near-line storage (tape or cloud). For formats that can support it, this allows for rapid response and maximum storage efficiencies.
An ideal example is video. A copy of the first 10 minutes of a video might always be cached on disk, but the entire file is stored on a secondary repository. This allows the initial playback of the video to occur while the rest of the video is loaded to the primary system. The user experiences instant access, but the data center achieves maximum data efficiency.
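As a rough sketch of how such partial caching might work, the code below sizes the on-disk prefix for the first 10 minutes of a video and splits a read request between the disk cache and the secondary repository. The bit rate, function names and byte-range model are assumptions for illustration; real products may segment files quite differently.

```python
def prefix_bytes(bitrate_bps, seconds):
    """Approximate size in bytes of the first `seconds` of a constant-bit-rate stream."""
    return bitrate_bps * seconds // 8


def split_read(offset, length, cached_prefix_bytes):
    """Split a read into (bytes served from disk cache, bytes from secondary storage)."""
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    end = offset + length
    # Only the part of the request that falls inside the cached prefix hits disk.
    from_cache = max(0, min(end, cached_prefix_bytes) - offset)
    return from_cache, length - from_cache


# Hypothetical 8 Mbps video with its first 10 minutes kept on disk.
cached = prefix_bytes(8_000_000, 10 * 60)
print(split_read(0, 1_000_000, cached))           # start of playback
print(split_read(cached - 500, 1_000, cached))    # read straddling the boundary
print(split_read(cached + 1_000, 1_000, cached))  # beyond the cached prefix
```

Reads that fall entirely within the prefix never touch tape or the cloud, which gives the archive time to stage the remainder of the file.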
The net effect of implementing an archive is a massive reduction in backup storage and complexity. The backup process not only has less data to store, it also has less data to process. In addition, data archiving storage options are typically far less expensive than backup storage options. The prerequisite work in creating a backup job is reduced, and the back-end work, such as metadata management, is greatly simplified.