Recently, I came across a blog post asking whether it was time to "de-tape" my archive. The writer argued that for companies to find hidden value in all the data they've amassed, including archival data, they must deploy random access media -- disk or flash -- rather than sequential or serial access media -- aka tape. He supported this view of tape's role by recounting a couple of anonymous use cases that required fast access to archived data, one for a test-dev effort and one to create custom sports video products.
Such use cases call the central thesis into question. One could argue that data referenced frequently is, technically speaking, no longer archival data. It has gone from cold -- to use IBM's terminology -- to warm or even hot. In other words, an archive platform, whether tape-based or not, isn't the proper platform for the data such a workload uses.
The question is not whether we should de-tape archives, but whether we should better define and classify which data is archival and which data is active.
The active archive?
This takes us to a modification of the original thesis, as the author introduced the idea of an active archive that conflates what used to be called secondary storage, using capacity disks, with tertiary storage, using tape and optical. Vendors began to blur the lines between secondary and tertiary in the backup storage market in the early 2000s, most notably with the introduction of disk arrays featuring deduplication algorithms designed to substitute disk for tape. Active archive, it seems, derives from the same sort of logic -- an effort to introduce another type of disk or flash platform that is somehow not quite primary or secondary storage, but also not quite as deep archive-ish or offline as the role of tape storage.
If you're like me, you aren't sure what that means. Like deduplicating virtual tape libraries, active archive platforms strike me as a contrivance for storage hardware vendors who don't have any tape products on their play cards. I can't see what niche they fill or whether there's demand for an overpriced product that stores archival data less cost-effectively than tape.
The author said Hadoop, Spark and Splunk are among the leading analytical tools for big data. They use object interfaces to access data, whether block or file. This is the basis of his argument for why object storage is the future. And with tape, as well as most disk-based NAS products, fast becoming less appropriate as a storage platform for analytical tools to access, we must have disk- or flash-based object storage for active archiving going forward.
Reinforcing this perspective is the notion that data access protocols used by clouds such as Amazon Simple Storage Service (S3) are also optimized for object storage. Clouds, of course, are inevitable because analysts say so. So, if companies will eventually use cloud-based object storage repositories for their archives, why would they want to place data into a file-based tape storage archive at all?
Cloud still uses tape
Last I checked, the industrial farmers of the clouds deploy tape in a big way, mainly because it is the only way to store a data deluge estimated to exceed 100 zettabytes by 2025 and, also, because limited bandwidth makes it hard to move data into the cloud, or to retrieve it once it's there, in a timely manner. Tape provides a great means of "cloud seeding," where data is dumped to tape and then shipped to the cloud storage service provider for inclusion in a massive archival library.
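To put rough numbers on the bandwidth problem, here is a back-of-the-envelope sketch. Every figure in it is an assumption chosen for illustration, not a number from any provider: a 1 PB archive, a dedicated 1 Gbps link running at 80% sustained utilization and cartridges holding 6 TB each (the native capacity of LTO-7). The function names are made up for this example.

```python
import math

SECONDS_PER_DAY = 86_400


def transfer_days(data_tb: float, link_mbps: float, utilization: float = 0.8) -> float:
    """Days needed to push data_tb terabytes over a link of link_mbps megabits/s.

    utilization is the assumed fraction of the link that is actually sustained.
    Decimal units are used throughout (1 TB = 10**12 bytes).
    """
    total_bits = data_tb * 1e12 * 8
    effective_bits_per_second = link_mbps * 1e6 * utilization
    return total_bits / effective_bits_per_second / SECONDS_PER_DAY


def cartridges_needed(data_tb: float, cartridge_tb: float) -> int:
    """Whole cartridges required to hold data_tb terabytes, ignoring compression."""
    return math.ceil(data_tb / cartridge_tb)


if __name__ == "__main__":
    archive_tb = 1000.0  # assumed 1 PB archive
    days = transfer_days(archive_tb, link_mbps=1000.0)
    tapes = cartridges_needed(archive_tb, cartridge_tb=6.0)
    print(f"Over the wire: about {days:.0f} days")
    print(f"On tape: {tapes} cartridges to ship")
```

Under those assumptions, the upload ties up the link for roughly 116 days, while the same archive fits on 167 cartridges that can travel by courier. Change the inputs to match your own link and media and the gap will move, but it rarely closes for petabyte-scale archives.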
As for tape being a poor medium for hosting objects, one could make the opposite case. Buckets of objects are actually great candidates for storage using tape in conjunction with the Linear Tape File System. LTFS is a robust way to record long block files such as video, human genome data, and oil and gas exploration telemetry, while smaller files aren't its forte. As Spectra Logic's BlackPearl Converged Storage System and other technologies demonstrate, object buckets offer a great way to store large collections of smaller objects on LTFS tape.
Another use case is on-the-fly video editing. Future innovations from companies such as StorageDNA promise to make tape data access far more efficient than current tape access metrics suggest, although the maximum 45-second time for seeking the start of a file after mounting the cartridge isn't half bad for current archival (LTO) media.
Tape's still kicking
This kind of thinking has been with us since the late 1980s, when first disks, then RAID arrays, then SANs and then clouds were supposed to deal tape's death card. It didn't happen then, and it won't anytime soon.
Ask StarWind Software. If any company is riding the wave of enthusiasm around software-defined storage and virtual SANs, it's StarWind. Yet the company freely admits it's gaining a ton of market traction with its virtual tape library (VTL). It is a software-defined -- read flash and disk -- storage appliance, either hardware or virtual machine (VM), that emulates the role of tape libraries. The VTL actually supports tape, too. That is, it writes its contents out to a tape cartridge if the customer wants that, so it can be shipped to the Azure or Amazon Web Services clouds. StarWind also provides its VM for use in popular public clouds for VTL-to-VTL transfers from your shop to the service provider's.
StarWind and other VTL vendors, such as CA Technologies, Cristalink and QUADStor Systems, recognize that your data, once sent to the cloud, will very likely end up on tape anyway. So let's accept that tape's not dead yet and recognize that an "all-of-the-above" strategy is what we're going to need for the next several years to deal with the data tsunami that is already heading our way.