New developments in tape technologies and applications will help breathe new life into this venerable, and still very useful, storage medium.
You may think tape has gone away -- and maybe some disk-backup vendors wish it were so -- but tape is actually thriving these days with steady advancements in bread-and-butter specifications like capacity and speed, plus new technologies that will expand tape into new applications.
The fundamental use cases and value propositions for magnetic tape haven’t changed much over the past five decades. Tape remains the primary medium for backup and recovery (B/R), offsite archive and, by extension, disaster recovery (DR). Despite the occasional claim to the contrary, tape still offers the cheapest method for storing data for long periods of time. Even spin-down disk drives can’t match tape’s low total cost of storage. Of course, tape can’t match the data access time of even the slowest disk, so IT organizations still need to use both in the same environment. Recent developments in tape technology make this easier and even more attractive than ever.
Tape’s fate tied to disk
There’s no denying that disk-based backup has had a huge impact on the tape market. Small- to medium-sized companies may find backing up to disk and subsequent transfer to an offsite cloud much simpler and more cost-effective than using tape as an offsite medium. Backup appliances that deduplicate and compress data can reduce the total amount of data replicated to a manageable size. In many cases, the recovery time is shortened as well.
Additionally, tape stackers and small robots have been problematic for remote-office backup for a long time. Having non-IT office staff manage tape changes and rotations has resulted in non-recoverable data far too often. Automatically transferring the data from the remote office to a professionally managed data center has allowed many organizations to eliminate remote-site tape altogether.
Larger organizations, in contrast, still use tape extensively. They may have implemented and expanded the use of disk-based backup to enable faster B/R times, but the sheer volume of data (think hundreds of TBs or even PBs) makes network transfers impractical, even with deduplication. Moreover, even with cloud storage prices as low as $0.10/GB per month, that’s still many times more than the one-time $0.03/GB to store data on tape (the cost to vault a tape is negligible). Cloud storage has its place, but tape isn’t about to yield its low-cost value proposition for the foreseeable future.
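To put the two prices quoted above side by side, here is a rough Python sketch of the arithmetic. It assumes flat pricing, decimal gigabytes, a vaulting cost of zero, and no allowance for drives, libraries or staff. The function name and the 100 TB, seven-year scenario are illustrative choices, not figures from any vendor.

```python
# Prices quoted in the article, kept in whole cents to avoid float rounding.
CLOUD_CENTS_PER_GB_MONTH = 10   # $0.10/GB per month
TAPE_CENTS_PER_GB = 3           # $0.03/GB, paid once

def storage_cost_dollars(gigabytes, months):
    """Return (cloud, tape) cost in dollars for keeping data `months` months."""
    cloud_cents = gigabytes * CLOUD_CENTS_PER_GB_MONTH * months
    tape_cents = gigabytes * TAPE_CENTS_PER_GB  # vaulting treated as free (assumption)
    return cloud_cents / 100, tape_cents / 100

# Illustrative scenario: 100 TB (100,000 GB) retained for seven years.
cloud, tape = storage_cost_dollars(gigabytes=100_000, months=84)
print(f"cloud ${cloud:,.0f} vs tape ${tape:,.0f} ({cloud / tape:.0f}x)")
```

Under these assumptions the gap widens every month the data is retained, because the cloud charge recurs while the media cost does not.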
Tape technology improvements
The areal density of disk continues to grow at a Moore’s Law-like rate, but tape is keeping pace. LTO-5, the current generation of the Ultrium LTO tape format, boasts 3 TB of capacity (compressed) per cartridge and a data transfer rate of 280 MBps (also compressed). Tape media reliability has improved 700x since 1999.
Although other tape technologies persist (e.g., Hewlett-Packard [HP] Co.’s DDS and Oracle Corp.’s StorageTek T10000), LTO is the dominant tape format for open systems computing environments. To date, more than 4 million LTO drives have been shipped and 100 million LTO tapes have been written. Many large organizations have tens of thousands of media elements under active management, giving tape a significant critical mass. And it’s growing: LTO experienced double-digit revenue growth in 2010.
The roadmap for LTO continues its historical improvement curve. LTO-6 is expected sometime in 2012 and has announced specs of 8 TB capacity (with 2.5:1 compression) and a 525 MBps transfer rate (compressed). The LTO roadmap currently extends out to generation 8 with planned specifications of 32 TB of capacity and a 1,180 MBps transfer rate.
For large organizations, automation products are as important as the tape technology itself; the drives are rarely used outside of an automated tape library. So advances in automation are equally important to a robust tape solution. In many ways, tape automation is following the same path as storage arrays: relatively commoditized components surrounded by sophisticated management and application software.
“We sometimes joke that our libraries are 90% software,” commented Molly Rector, vice president (VP) of marketing at tape library manufacturer Spectra Logic Corp. Given that most libraries today use LTO drives, other capabilities are needed to differentiate products to compete in the market. “Specsmanship around robot speed and arm movements are minor in the scheme of total job execution,” Rector explained. “A library’s ability to proactively detect errors, automatically fail over components and notify administrators of pending errors is far more important.” Spectra Logic libraries are also able to periodically verify the integrity of the managed media.
Overcoming tape’s limitations
Reliability improvements are a “must” if tape is going to continue to be a key recovery technology in the face of ever more stringent recovery time objectives (RTOs) and recovery point objectives (RPOs). The use cases for magnetic tape have remained constant for nearly 50 years largely because its limitations have also remained constant. To appreciate the magnitude of new tape technologies, it’s important to understand these limitations.
- Media degradation. Although tape media reliability has improved dramatically, best practice still dictates that media should be periodically tested and rewritten. That’s a daunting task when tens of thousands of media elements are archived. The failure of a critical data block, such as the index block, can render the entire tape unreadable.
- Drive compatibility. Although LTO maintains backwards-read compatibility, the archive requirement for the media may exceed the supported life of the drive and media. Occasionally, a tape written on one drive can’t be read on another drive. Recovering this data years later can be time consuming and costly.
- Lack of interoperability. LTO media can’t be read in non-LTO drives and vice versa. This has stymied many a data transfer effort.
- Proprietary tape formats. Although tar and cpio are industry-standard tape formats, they’re rarely used in their pure and interchangeable forms. B/R vendors use their own formats for efficiency reasons. Consequently, the tapes can only be read by that B/R application unless specifically written in tar or cpio.
- Backwards compatibility. Many IT users have a mix of media types in the vault due to technology and product generational changes. Being able to read all the various media types means maintaining not only legacy tape drives, but legacy servers, operating systems, drivers, interfaces and B/R versions. The possible permutations needed to read a seven-year-old tape make recovery problematic at best, and very expensive and time consuming if even possible.
In addition to the low cost of storage, tape has the advantage of very high transfer rates. With LTO-5 streaming at 280 MBps, or roughly 2.2 Gbps, a single 10-drive library requires multiple 10 Gigabit Ethernet (GbE) pipes to keep up. This may not be a big deal within a data center, but it’s a serious issue for rapid recovery of large data repositories over a wide-area network (WAN). When restoring terabytes of data, a local tape library is the hands-down choice over pulling the data across the typical 1 Gbps link, or even multiple Gbps links.
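The bandwidth comparison above comes down to simple unit conversion, sketched here in Python. The sketch assumes the compressed 280 MBps rate is sustained on every drive, that links run at full line rate with no protocol overhead, and that a terabyte is 10^12 bytes. The function names and the 10 TB restore are illustrative.

```python
import math

LTO5_MBYTES_PER_SEC = 280  # compressed streaming rate quoted in the article

def library_throughput_gbps(drives, mbytes_per_sec=LTO5_MBYTES_PER_SEC):
    """Aggregate streaming rate of `drives` drives, in gigabits per second."""
    return drives * mbytes_per_sec * 8 / 1000

def links_needed(drives, link_gbps=10):
    """Number of network links of `link_gbps` needed to absorb the library's output."""
    return math.ceil(library_throughput_gbps(drives) / link_gbps)

def transfer_hours(terabytes, gbps):
    """Hours to move `terabytes` at a sustained rate of `gbps` gigabits per second."""
    bits = terabytes * 8e12
    return bits / (gbps * 1e9) / 3600

print(f"one drive:  {library_throughput_gbps(1):.2f} Gbps")
print(f"ten drives: {library_throughput_gbps(10):.1f} Gbps, "
      f"{links_needed(10)} x 10 GbE links")
print(f"10 TB over 1 Gbps WAN: {transfer_hours(10, 1):.1f} hours")
print(f"10 TB from 10 drives:  {transfer_hours(10, library_throughput_gbps(10)):.1f} hours")
```

Real restores will be slower on both sides (tape mounts and seeks, WAN contention), so the numbers are best read as upper bounds on throughput rather than predictions.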
Linear Tape File System (LTFS)
You don’t often find the words “tape” and “exciting” in the same sentence. But if there were an occasion sufficient to run those two words together, it would be the advent of LTFS. LTFS, originally developed by IBM and adopted by the LTO Consortium, is a self-describing file system that makes files on tape directly host-readable. The file system metadata tracks the media element, location of the tape and data location on tape. LTFS-enabled apps can request a tape load from a library, provided the library supports LTFS.
LTFS is arguably the most exciting tape development since the advent of cartridges and robots. LTFS and related device drivers are available as free downloads from numerous vendors. Because it’s a file system, its directory structure is directly readable. Users are no longer dependent upon third-party software to read the tape. They can use standard file operations on the files even though they reside on tape. For example, HP offers both StoreOpen Standalone, for standalone tape drives in a Mac OS X environment, and StoreOpen Automation. StoreOpen Automation presents the tape library and cartridges as a collection of folders; media movement is handled automatically by the application.
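As a rough illustration of what “standard file operations” means in practice, the Python sketch below walks a directory tree and copies one file out of it using only the standard library. Nothing in it is LTFS-specific; it assumes the cartridge is already mounted by whatever LTFS driver is in use. The mount point /mnt/ltfs and the helper names are hypothetical.

```python
import os
import shutil

def list_files(mount_point):
    """Return sorted (relative_path, size_in_bytes) pairs for files under mount_point."""
    entries = []
    for folder, _subdirs, names in os.walk(mount_point):
        for name in names:
            full = os.path.join(folder, name)
            entries.append((os.path.relpath(full, mount_point), os.path.getsize(full)))
    return sorted(entries)

def retrieve(mount_point, relative_path, destination_dir):
    """Copy one file from the mounted volume to local disk and return the new path."""
    os.makedirs(destination_dir, exist_ok=True)
    return shutil.copy2(os.path.join(mount_point, relative_path), destination_dir)

if __name__ == "__main__":
    LTFS_MOUNT = "/mnt/ltfs"  # hypothetical mount point; adjust for your system
    for path, size in list_files(LTFS_MOUNT):
        print(f"{size:>14,d}  {path}")
```

The same code would run unchanged against a disk directory, which is the point: the application does not need to know the files live on tape, although reads will be far slower.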
LTFS is targeted primarily at unstructured data, especially files that are unlikely to change. Files on disk may be modified, even when a contiguous block isn’t available, simply by using pointers. The notion of pointers skipping from one tape block to another to retrieve a complete record is currently antithetical to tape. Even if one were able to span media elements with a single file (which can’t be done), loading multiple tapes to retrieve a single file might not yield acceptable performance.
Because files are host accessible, LTFS does provide nearline storage. Examples of ideal candidates for LTFS are medical images and video files. Medical images, in particular, are never modified. Storing these large files benefits from the low cost of tape, yet they can be found and accessed directly by users. The time needed to load the file from tape will be longer than with disk, but shorter than if the file were stored offline.
LTFS in action
The ecosystem around LTFS is growing rapidly, to the point where its adoption looks assured. One organization helping to enable this ecosystem is the Active Archive Alliance. This vendor consortium is dedicated to developing open standards that allow LTFS to be deployed across multiple storage tiers. In essence, it lets LTFS create a single logical volume across both disk and tape subsystems.
Of course, having a volume that spans media types isn’t enough. Applications are needed to place the data on the appropriate tier and to move it based on user policies or usage profiles. Organizations that make storage management applications using Active Archive methodologies include Atempo Inc., FileTek Inc., Grau Data AG and QStar Technologies.
Oracle also offers solutions based on LTFS. “The idea of a 5 TB thumb drive is pretty cool,” quipped Tom Wultich, Oracle’s director of product management for tape. He’s referring to the StorageTek T10000C tape drive that’s LTFS-enabled and has a 5 TB capacity. “Users can easily move a tape from one LTFS system to another. An example would be the need to move a large media file from one system to another for editing,” Wultich said. “Transferring a multiterabyte file over the network may not be practical. Instead, one user can simply drag and drop the file to the tape and give it to the other user, who can then mount it just like a share or thumb drive.”
Crossroads Systems Inc.’s StrongBox product is all-in when it comes to leveraging LTFS for long-term storage archive. Robert Sims, Crossroads’ president and CEO, describes StrongBox as “a NAS head for tape.” The StrongBox appliance uses disk as front-end cache and supports multivendor tape connectivity on the back end. The product supports both CIFS and NFS.
StrongBox features are designed to provide the reliability necessary to ensure data recoverability for the long haul. Sims characterizes StrongBox as self-healing. By that, he means it supports dual copy to two different tapes, replication of tapes and failing over to a secondary copy if the first can’t be read. StrongBox monitors both drive and media error rates to detect degrading media. In the event of a media hard error, StrongBox will initiate a tape copy to create a replacement.
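The dual-copy behavior described above can be sketched generically in Python. This is not StrongBox’s implementation, which isn’t public in this article; it only illustrates the idea of trying a primary copy, falling back to a secondary, and flagging the unreadable copy for replacement. Every name here (MediaReadError, read_with_failover, the tape labels) is hypothetical.

```python
class MediaReadError(Exception):
    """Raised by a reader callback when a copy cannot be read."""

def read_with_failover(copies, read_copy):
    """Try each copy in order.

    Returns (data, failed_copies), where failed_copies lists the copies that
    raised MediaReadError and are therefore candidates for re-copying to
    fresh media. Raises MediaReadError if no copy is readable.
    """
    failed = []
    for copy_id in copies:
        try:
            return read_copy(copy_id), failed
        except MediaReadError:
            failed.append(copy_id)
    raise MediaReadError(f"no readable copy among {list(copies)}")

# Simulated media: the primary tape has a hard error, the secondary is good.
fake_tapes = {"TAPE001": None, "TAPE002": b"archived file contents"}

def fake_reader(copy_id):
    data = fake_tapes[copy_id]
    if data is None:
        raise MediaReadError(copy_id)
    return data

data, needs_replacement = read_with_failover(["TAPE001", "TAPE002"], fake_reader)
print(data, needs_replacement)
```

Returning the list of failed copies alongside the data keeps the repair decision separate from the read path, so a caller could queue a replacement copy without delaying the user’s request.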
One issue for online file access using tape is the latency of a data read. CIFS and NFS will usually time out before the tape can be mounted and the data located and retrieved. StrongBox maintains a 512 KB buffer on disk to satisfy the latency while it retrieves the whole file from tape. StrongBox doesn’t presently support data modify or delete, although support for data delete is expected in 2012.
LTFS might also enable lower cost methods for common uses. For example, using tape as a write target in a dual-write scenario would effectively offer continuous data protection (CDP). Moreover, having two copies in different locations would assure data safety and a very granular RPO. It wouldn’t eliminate the need for B/R applications, which can facilitate point-in-time restores and the most recent version of the whole file system. However, for user self-service and the ability to retrieve specific file versions, LTFS may be just the solution.
R.I.P. tape? Not so fast
The death of tape has been declared in at least four of the last five decades. Although tape wasn’t really threatened as an archive solution, LTFS brings a new dynamic to the needs of long-term data archive and access. With Active Archive, it also makes tape a viable tier 4 repository in the data center. Seamless access and low cost should extend tape’s lease on life for at least another decade.
BIO: Phil Goodwin is a storage consultant and freelance writer.