Tape storage use is on the decline, just as it has been for years. However, tape media offers a number of unique qualities that make it ideal for certain applications, such as archiving and disaster recovery. It can be truly offline, requires no electricity, and is easily transportable off-site. And there have been a number of developments in the tape market in the past few years aimed at making tape more useful for archiving. In this 2012 Chicago Storage Decisions TechTalk interview, analyst Jon Toigo discusses tape storage today.
What role does tape media play in today's data center, and how has that role changed from a few years ago?
Jon Toigo: Well, I put everything into the context of what I'm calling "The Infrastruggle." It's basically a fight among different vendors offering different types of storage for your IT budget.
What we confront right now [is] the interesting [combination] of many trends. First, a proliferation of data like nothing we've seen before. Second, a complete mismanagement of the data or failure to manage data at all, which means we've got a huge junk drawer that contains data that's never referenced and data that gets referenced a lot. Third, we confront energy cost spikes, a 23.2% increase in the cost per kilowatt-hour for energy in the United States over the past two years. And in many areas of the country, particularly the New England Corridor, Northern and Southern California, a saturation of the distribution grid for power. So, I have clients -- pharmaceutical clients, for example -- that say they can't get another lick of electricity dropped into their data center. And now that they have a requirement to store all clinical test trial data in a near-online state, that's a big problem.
Basically, when you look at storage today, you look at two different sets of requirements. You've got performance storage, which needs to be optimized for fast I/O to enable concurrent access by applications and end users, and you've got storage for data that is hardly ever accessed, data you have to hold on to for a protracted period of time, whether it's for regulatory reasons or it's just business-critical data that isn't accessed very frequently. What you're looking for there is a solution that will optimize capacity per watt, focused not so much on speeds and feeds as on capacity.
I see that as a huge opportunity for tape, because you can build a tape library today -- one of the stellar libraries out there comes from Spectra Logic, for example -- [that offers] 196 PB of storage on a couple of raised floor tiles, consuming less power than three or four light bulbs. That is a lot of storage capacity. You front-end that with Linear Tape File System, or LTFS, and you basically end up with NAS on steroids to store all those rarely accessed files. I see that as a huge opportunity for tape to move back into a role that perhaps it hasn't played since the '70s, when it was the primary storage media behind a mainframe.
How does tape compare to disk and other media in terms of reliability and durability?
Toigo: The whole myth around tape is a statement that was supposedly said by Gartner, which has been disavowed by Gartner. They claim they don't even have a copy of the document where they said it. I'm reasonably sure they did say it, though. They said, "One in 10 tapes fails on restore." That's back when tape was primarily used as a backup medium. It was nonsense when they said it because the only way to have one in 10 tapes fail on restore is if someone is using your tapes as Frisbees when you're not looking. It just doesn't happen.
There has been what most analysts claim is somewhere around a 700% to 800% improvement in the resiliency of tape over the past 10 years. With proper environmental conditions, tape will last you up to 70 years, with the data that's stored on it. In less than favorable environmental conditions, you can be guaranteed about 30 years of operational life of tape.
Look at what's happening with disk. Disk, we used to say, we get five years of normal use. And now statistics have come in from Carnegie Mellon and Google suggesting that the reality of disk failure is about 1,500 times greater than what was advertised by the vendors. So, if anything, I think we need to readjust our whole perception of the viability of each medium. There are only two kinds of disk drives, those that have failed and those that are going to.
LTO is the dominant tape media format, with a huge market share. Could its dominance potentially stifle tape technology development?
Toigo: No, not at all. In fact, LTO is a cartridge specification. It says the tape will be X feet long, or meters long, and it will fit into a cartridge of X dimensions. That was a standard that was implemented by primary industry players, actually, as an alternative to DLT, which was the dominant tape technology back about 10 years ago, from Quantum.
The LTO tape cartridge is very viable. In fact, it's what's inside that counts and Fujifilm demonstrated with IBM, a year ago this January, a new coating called barium ferrite for tape. Generally, when you send a mild static charge through it, it stands all the bits up on end. So it's doing perpendicular magnetic recording. You might be familiar with that from SATA drives. That's what got us to terabyte-sized drives and 4 terabyte [TB] drives. Ultimately, I think that technology peaks out at about 8 TB. We're looking at a tape cartridge, within 18 months, that has 32 TB of capacity, in LTO tape. That's pretty good.
I think that tape innovation continues today. There are also enterprise formats that don't comply with the LTO cartridge format. The T10000 cartridge at Oracle is an example. [These tape formats are] very good technologies, they just happen to isolate your choices to one vendor's drive. I think the LTO format has gained as much market share as it has because it works with everybody's drive.
LTFS seems to be a key technology for expanding tape usage. Why is this?
Toigo: Linear Tape File System was an answer to a long-standing need to be able to apply a file system to tape-based data. Now, it's a rudimentary file system. I don't think most people understand [that] it doesn't give you all the bells and whistles of NTFS and a Windows environment or ZFS in a Linux environment. What it does do is it interfaces to the tape library and it leverages partitions on tapes, which were introduced in the LTO cartridge format in Generation Five of LTO. The partition is used basically to store metadata, data about the data that is stored there, including the start points and stop points of each file. [This means] you can rapidly scan through a tape, inventory what's on it, in terms of data, and record the start locations. Then, when you need to retrieve it, you can speed up the tape to get to that location, find the start point of that file, and send it out.
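The index idea described above can be illustrated with a minimal Python sketch. It is not the actual LTFS on-tape format: the `Extent` and `TapeIndex` names, the block numbering, and the file names are all hypothetical, chosen only to show how recording start and stop points lets a reader jump straight to a file instead of scanning the whole tape.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Extent:
    """Where one file sits on the data partition (hypothetical layout)."""
    start_block: int
    end_block: int  # inclusive


class TapeIndex:
    """Toy stand-in for the metadata kept in the index partition."""

    def __init__(self) -> None:
        self._entries: Dict[str, Extent] = {}
        self._next_block = 0

    def append(self, name: str, size_blocks: int) -> Extent:
        # Tape is written sequentially, so each file starts where the last one ended.
        extent = Extent(self._next_block, self._next_block + size_blocks - 1)
        self._entries[name] = extent
        self._next_block += size_blocks
        return extent

    def locate(self, name: str) -> Optional[Extent]:
        # A lookup in the index replaces a scan of the whole tape.
        return self._entries.get(name)


index = TapeIndex()
index.append("trial_0001.csv", 40)
index.append("trial_0002.csv", 25)
print(index.locate("trial_0002.csv"))
```

With the start block in hand, a drive can fast-forward to that position and begin reading, which is the retrieval path the answer describes.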
The way that LTFS inventories the files on tape is it creates file folders that have cryptic names on them that are identical to the barcode that's on the back of the tape. However, all the files are named by their real names. So, it's just like any other file system. It's just that the barcode thing tends to be user-unfriendly. That means you have to perform a search whenever you're looking for data. There is no intuitively obvious, "This is all accounting data from 1979," or "These are all the files that pertain to this lawsuit." You don't have that kind of organizational scheme. That's the reason why the file system vendors are now linking to open hooks that are now available in LTFS, in order to rationalize that system and to leverage what it's finding out about the tape and to make it more user-accessible. That's where tape NAS comes in because you've got a file system that's overlaid on top of LTFS.
Another approach to this is what the broadcast media industry is doing, which is, they're interested in tapes from the standpoint of "This is the home run that was hit in the ninth inning of the Chicago Cubs game, on this date." They have what they call "media asset management databases" that store all the different clips. The biggest inroads over the last two years have been the merger of these media asset managers and LTFS technology. That's already fully baked. That's ready to go because they were the early adopters of LTFS.
I was just at an IBM conference two weeks ago. Listening to their sessions, it appears pretty obvious they want to map LTFS to their GPFS, or General Parallel File System, that enables tiering. So all their hardware products, whether it's the disk arrays or the tape libraries, will be able to seamlessly exchange data, and you can migrate data from fast disk to slower disk to tape, or whatever methodology you want to use. That requires those additional hooks. You need a file system sitting on top of it.
Right now, I can put together a great LTFS solution. I can do it with a Red Hat server if I wanted. However, I've seen third-party products like the Crossroads Systems Strongbox, which already has done all the work for me. In fact, they've cobbled together the file system and LTFS. So, I can get a human-readable NAS head already created for me, and it also performs other functions, like read/verify on the tapes after you've written them.
This is all good. It's all pointing high and to the right for the future of tape.
LTO-6 can store up to 8 TB, and IBM and Oracle have tape media with similar capacity. Is there any danger to having such high tape capacities?
Toigo: Do you consider data at greater risk on a disk drive because the disk drive has higher capacity? Probably not. The rebuild time on a disk is directly proportional to how much data you store on it. You know, we've got disks that are coming down the pike based on Toshiba technology with bit pattern media. They finally figured out how to sputter-coat a drive surface and give it mesas and valleys at predictable intervals, so I can store even more data because the current generation SATA drives doing perpendicular magnetic recording top out at about 8 TB. That's about as much as you can get on a three-and-a-half-inch spindle.
There was just a demonstration of a drive using Toshiba's new sputter-coating technology for bit-pattern media that delivers a 2.5-inch drive with 40 TB of capacity. So, for a small company, you may need two drives, one to store all your data and the other one to take off-site. That's kind of cool. That's also the reason why you're going to need 32 TB tape capacities, and greater, because now these very high-capacity drives are going to be in widespread availability very soon.
What steps should admins take to optimize tape performance?
Toigo: Well, first of all, it depends on what you're using tape for. Tape, basically, has three different purposes. You can use it for backup and a lot of people have soured on backup to tape. They've decided to do disk everywhere. I think it's a huge mistake, but it's what they do and, usually, that's related to difficulties that they have delivering data to the tape device. If you just aggregate a whole bunch of targets, you know, "This one has a couple of gigabytes," "This one has a terabyte of data you want to back up," etc., and you just merge them all together, it creates what is known as a superstream, or braid. Then the shorter jobs complete and then the braid unravels and you're not operating tape at its rated speed anymore and it starts shoeshining and backhitching. That will create issues. That will make a backup that was supposed to take two hours take 20.
The best approach to dealing with that is to create the jobs as a series of jobs that are grouped together by the length of job. Then, you don't have this unraveling, and you operate the tape at its rated speed. People have lost the skills required to do effective backups. Instead, they're outsourcing their brain to brain-dead backup software. You got to do it right, and that's how you optimize tape.
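One way to picture grouping jobs by length is the short Python sketch below. It is an illustration under assumptions, not a feature of any particular backup product: the job names and sizes are made up, job size is used as a proxy for job length, `streams` is a hypothetical count of jobs multiplexed to one drive, and every size is assumed to be greater than zero.

```python
from typing import List, Tuple

Job = Tuple[str, float]  # (hypothetical job name, size in GB)


def group_by_length(jobs: List[Job], streams: int) -> List[List[Job]]:
    """Sort jobs by size, then cut the sorted list into batches of `streams` jobs."""
    if streams < 1:
        raise ValueError("streams must be at least 1")
    ordered = sorted(jobs, key=lambda job: job[1])
    return [ordered[i:i + streams] for i in range(0, len(ordered), streams)]


def spread(batch: List[Job]) -> float:
    """Ratio of largest to smallest job in a batch; near 1.0 means they end together."""
    sizes = [size for _, size in batch]
    return max(sizes) / min(sizes)


jobs = [("mail", 1000.0), ("hr", 2.0), ("erp", 900.0), ("web", 3.0)]
batches = group_by_length(jobs, streams=2)
for batch in batches:
    print([name for name, _ in batch], round(spread(batch), 2))
```

Pairing the 2 GB job with the 1,000 GB job would leave one stream idle almost immediately. Batching like with like keeps every stream busy for roughly the same time, so the drive is fed at a steadier rate.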
There are also differences in the speeds and feeds of things. There are differences in the interconnect speed that's coming off Fibre Channel disks versus the interconnect speed that's coming off of iSCSI disks, versus backups that you do off of a NAS device that's just connected with a standard Ethernet connection. All these things interfere with your ability to do efficacious backup with tape, but they're all solvable and we gave up on solving them because the disk array vendor said, "Oh, it's a lot easier to just do it on disk. OK?" You even have the people who are doing the deduplicating VTLs, and they bring in the virtual tape library, initially, to solve the problem of backup and then they decide, "Get rid of the tape altogether. We're just going to replicate between VTLs." That makes no sense to me at all.
I mean, one prominent VTL product that's out there right now charges you $410,000 for a box of 30 TB of SATA drives. They use the Aladdin sale, you know, from Disney's Aladdin: "Phenomenal cosmic power. Itty-bitty living space." You know, the idea that you're going to get a 70-to-1 reduction ratio out of this deduplicating appliance. Nobody ever gets that, but they lie and we believe it, and we buy it because it's a shiny new thing.
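The arithmetic behind that complaint is easy to sketch in Python. The price, raw capacity, and 70-to-1 ratio come from the figures quoted above; the 1-to-1 and 10-to-1 ratios are assumptions added only for comparison, and `cost_per_effective_tb` is a hypothetical helper name.

```python
def cost_per_effective_tb(price_usd: float, raw_tb: float, reduction_ratio: float) -> float:
    """Price divided by the capacity you effectively get after data reduction."""
    if raw_tb <= 0 or reduction_ratio <= 0:
        raise ValueError("raw_tb and reduction_ratio must be positive")
    return price_usd / (raw_tb * reduction_ratio)


PRICE_USD = 410_000  # quoted price of the appliance
RAW_TB = 30          # quoted raw SATA capacity

# 70:1 is the advertised ratio; 1:1 and 10:1 are illustrative assumptions.
for ratio in (1, 10, 70):
    cost = cost_per_effective_tb(PRICE_USD, RAW_TB, ratio)
    print(f"{ratio:>2}:1 -> ${cost:,.2f} per effective TB")
```

The spread between the rows shows how much of the value proposition rests on actually achieving the advertised ratio.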
But backup is only one thing. Tape can also be used as archive media and it can also be used as an active platform for tape NAS and those are new stories, for the most part, for tape. They're new old stories. This is the way we did it in mainframes when I first started in this business.
You know, you had tiers of storage … You started with memory in the system and that was a rare resource, so you had to get off of that and onto DASD, which were direct-access storage devices but, in those days … You remember Tom Hanks in Apollo 13? He points over his shoulder to the vehicle assembly building and says, "We can store over 8 K of data in that building." You had to build a gigantic building just to house these refrigerator-sized disk platforms, and then tape provided your mass storage media. And, we were very disciplined about how we moved data between those three tiers.
Today, we've completely lost any sense of discipline. I think it starts when kids are young and they're downloading the Internet to their computer without respecting the fact that Dad has to buy all the disks. They think that it's free. Most users are like that. You can abuse the storage, but you're going to pay high prices for it.
What are some best practices for tape encryption, and what options exist?
Toigo: OK. Well, tape encryption. Well, first of all, you scratch your head when you listen to the requirements that have come out and they're in state and federal law, in some cases, requiring the encryption of tape when [used] for backup and you're moving it off-site. I mean, to most of us, it would seem pretty self-evident why you want to protect the data that's on the tape but, at the same time, if you look at all the numbers on data disclosures that we've had over the last 10 years, look at the massive amount of data that is supposedly private but has been released in the wild, hardly any of that comes from backups that fall off the back of the truck. Most of the data that's been disclosed [has] been [from] direct hacks of operating systems or they've [come from] lost laptops that gave somebody the keys to the kingdom.
One company recently was quoted in Computerworld as saying that they discovered they have 30 virtual machines running around in their environment that they didn't put there, because once a hacker gets in, he can start using resources however he sees fit and, since there is really lousy management on VMware, you can't tell which VMs you own and which ones somebody else put up.
So, anyway, having said all that, we do need to protect data against disclosure, particularly private information, credit card information. Like in the healthcare industry, HIPAA requires the protection of the private information that is exchanged between you and your healthcare provider. So, doing encryption makes sense, to a certain degree. I think it's a best practice all unto itself. Now, the question is, how do you implement it? There's been an effort that actually was backed by the hardware guys, surprisingly, for a universal standard on key encryption, on the ability to exchange the keys between unlike platforms. It's very rare for the hardware components of this industry to work and play well together. They're competitors.
The people who won't let it happen are the encryption software guys. RSA does not want their product to be anything other than king of the hill. So, anything that's an open-source standard that normalizes them and puts them on a level playing field with their competitors they see as a threat to their market share. So, unfortunately, until we get that free exchange of keys -- a universal system for exchanging all your keys -- you're going to have to go with a proprietary solution. You're going to have to find something that manages the keys, that refreshes the keys, that changes the keys when you've done a disaster recovery test or anything else where you've actually applied the key and let it go out the door. You have to dispose of the key, afterwards, because it's already been used.
Fortunately, the erasure of tape is a little easier to manage than destroying data on a disk, which requires crushing the disk. There is no way to fully erase data off the disk that I'm aware of.
So, the encryption requirement is real. The management of key encryption is a big hassle. You need good key management software, and which one is best of breed varies depending upon the platform you're using. If anything, I like most of the CA Technologies products. In the key management space, they're the ones I use all the time. Symantec, I understand, has a pretty good product, but I don't see it in the wild as much as I see this CA stuff. I'm sure there are a number of other products, as well. You apply that before the tape ships out the door.
Also, be aware that compression and deduplication generally don't work very well on data that's encrypted. So, ask your vendor before you make heavy investments in compression and deduplication, "What happens if my data is encrypted?" You may have a bad story that makes you think again about the efficacy of those technologies.
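The effect is easy to reproduce with a small Python sketch using the standard `zlib` module. One assumption to flag: random bytes from `os.urandom` stand in for ciphertext here, on the grounds that the output of a sound cipher should be statistically hard to tell apart from random data. No real encryption is performed, and the sample record is made up.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Original size divided by compressed size; higher means better compression."""
    return len(data) / len(zlib.compress(data, 9))


# Repetitive plaintext, like many backup streams, compresses very well.
plaintext = b"patient_id,visit_date,result\n" * 4000

# Assumption: random bytes approximate what encrypted data looks like to a compressor.
ciphertext_like = os.urandom(len(plaintext))

print(f"plaintext ratio:       {compression_ratio(plaintext):.1f}")
print(f"ciphertext-like ratio: {compression_ratio(ciphertext_like):.3f}")
```

The second ratio lands at about 1.0, meaning the compressor found nothing to remove. That is why the order of operations matters: data generally has to be compressed or deduplicated before it is encrypted, not after.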
What does the future hold for tape media?
Toigo: The future is so bright you have to wear shades. I think tape becomes the dominant meme for the storage of at least 40% of the active data that is rarely re-referenced. It is already the medium of choice for storing up to 80% of all backups, worldwide. Most backup data is still stored on tape. In fact, even when people are doing disk-to-disk replication as a primary means for defending their data, they still make a safety copy to tape if they're smart.
The higher capacities and the new purposes that have been developed for it, in terms of tape NAS and those sorts of things, give it an enormous runway going forward. You're going to have beaucoup opportunities if you're in the tape business. I know all the tape vendors are loving hearing me say this. It's not that I love the tape guys, but I happen to think the medium is very robust and I think it's got a very good future ahead of it.