Eric Burgener, senior analyst and consultant at the Taneja Group, answers the questions he is hearing most frequently about data protection. He discusses developments in data protection technology, VTLs, the changing role of tape, CDP, WAN optimization, SaaS, cloud-based storage, dense storage platforms, and the difference between backup and archiving requirements. You can read his answers below or download an MP3 recording of this FAQ and listen at your convenience.
Table of contents:
>>How is the VTL market evolving?
>>How is tape's role changing?
>>Any unexpected developments in data protection?
>>Any other data protection surprises?
>>Why are SaaS vendors targeting secondary storage?
>>Why are storage grids becoming popular?
>>How do backup and archive requirements differ?
How will the market for virtual tape libraries change?
Virtual tape libraries (VTLs) have been the easiest way to integrate disk into existing data protection infrastructures. A few years ago, the Taneja Group predicted that deduplication technologies would rapidly penetrate the VTL space. We just completed a report this month that sized the VTL market and split it between products that use deduplication technologies and those that don't. The revenue actuals show 2007 as the last year in which standard VTL products without deduplication capabilities generated more revenue than those with them. Once the switchover occurs in 2008, we're going to see a rapid ramp of these deduplicating VTLs and a rapid decline of the part of the VTL industry that does not include deduplication.
Within a year or two, there will be very little revenue being generated from VTL products that do not have deduplicating technologies. By 2009, almost all of the vendors that matter will have this technology incorporated and will be selling most of their products based around it.
How will tape's role change?
The reports of tape's demise have always been greatly exaggerated. This is not a market that we see going away. We expect that the market will continue to shrink in size, and there will be fewer tape products bought. But there's a lot of it out there in the legacy infrastructure and one way vendors will be able to sell products going forward, especially disk-based backup products, will be by showing customers how they can blend those products with their existing tape infrastructure to achieve an optimum data protection strategy. It's ironic -- if you deploy disk-based technology judiciously, you can actually improve the performance and reliability of existing tape-based products.
There are certain things that disk is very good at: fast restores, getting initial backups onto disk very rapidly and reliability. Those are all advantages in the disk space.
Tape tends to excel in two areas: low-cost storage and streaming performance. What we're recommending to our customers is to front-end the backup infrastructure with disk. That's where they would handle most of the restore requests for files or messages, object-level restores like that.
As that data ages, they'll eventually migrate it to a less costly tier, which would be tape. If they're going to be running most of their server-level restores from that tier, they can actually get better performance out of tape than disk. The fact that you can keep those tape drives streaming for that kind of activity also means that the tapes will perform that much more reliably.
Are there any unexpected developments in the data protection market?
One is in the continuous data protection (CDP) space. When vendors started selling products in this space, they were primarily selling what we would call an enterprise CDP-type of solution. There would be an appliance sitting in a network collecting data off of a very large server. Those were targeted for use at large database and Exchange installations that required a very granular, rapid-restore capability.
What we've seen develop is an entirely different segment for CDP in the laptop and remote office backup space. It is leveraging CDP technology primarily because of its low bandwidth requirements. CDP is basically picking up every write as it occurs and storing it on some sort of a disk-based journal. The bandwidth requirements for something like that are actually quite small. This provides an excellent way to perform backup in the background all through the day so you can spread that bandwidth use out across a 24-hour period, as opposed to trying to shove all of that data through the pipe in a very short period of time by performing a periodic backup.
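To make the bandwidth argument concrete, here is a rough sketch comparing the average link rate needed to move the same amount of changed data in a short backup window versus spread across 24 hours. The 20 GB daily change and the 2-hour window are hypothetical values chosen for illustration, not Taneja Group figures, and the helper name `required_mbps` is made up for this example.

```python
def required_mbps(changed_gb: float, window_hours: float) -> float:
    """Average link rate (megabits/s) needed to move changed_gb within window_hours."""
    megabits = changed_gb * 8 * 1000  # decimal units: 1 GB = 8,000 megabits
    return megabits / (window_hours * 3600)

# Hypothetical remote office with 20 GB of changed data per day.
daily_change_gb = 20.0
nightly = required_mbps(daily_change_gb, window_hours=2)      # periodic backup window
continuous = required_mbps(daily_change_gb, window_hours=24)  # CDP-style trickle

print(f"2-hour window: {nightly:.1f} Mbit/s")
print(f"24-hour spread: {continuous:.1f} Mbit/s")
print(f"Peak rate reduction: {nightly / continuous:.0f}x")
```

Under these assumptions the periodic backup needs about 22 Mbit/s for two hours, while the continuous approach averages under 2 Mbit/s. The sketch assumes both approaches move the same total volume; because CDP captures every write, its real total may differ from a periodic backup's net change.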
What's surprising is that the amount of revenue being generated for laptop, remote office backup and other disconnected client-type backups using CDP is about as large as the market for enterprise CDP.
Are there any other surprises in the data protection arena?
Yes. Companies in the wide-area data services [WDS] space, like Riverbed, Cisco, Juniper Networks and Blue Coat, are using products to accelerate applications so they can run with LAN-like performance, even though they might be accessing that application across a WAN. We've seen these vendors now start to integrate local disk caches into their WAN acceleration appliances. You'll see all the major players in this space in the next 12 to 18 months integrate these local disk-based caches. That's going to allow people to use these appliances in a much more targeted way for backup purposes so they can actually back up to that WDS appliance in the local remote office and then do restores directly from that disk-based cache. This will put these vendors into much more direct competition with data deduplication vendors, such as Data Domain and Quantum, who are also selling small appliances that can potentially be located in the remote office.
WDS products are already leveraging data deduplication technologies to slim down the amount of data they send across the WAN. But there's no reason why they can't use those same integrated technologies against data at rest that would be sitting in those local disk caches that are now out in the remote sites.
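As a minimal illustration of how deduplication applies to data at rest, the sketch below splits data into fixed-size chunks, hashes each chunk, and stores only chunks it has not seen before. This is a toy under stated assumptions: commercial WDS and deduplication products typically use variable-size chunking and their own storage formats, and the `ChunkStore` class and its methods are hypothetical names, not any vendor's API.

```python
import hashlib

class ChunkStore:
    """Toy content-addressed store: each unique chunk is kept once, keyed by its SHA-256."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # hex digest -> chunk bytes

    def put(self, data):
        """Store data and return the list of chunk digests needed to rebuild it."""
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # only unseen chunks consume space
            recipe.append(digest)
        return recipe

    def get(self, recipe):
        """Rebuild the original bytes from a list of chunk digests."""
        return b"".join(self.chunks[d] for d in recipe)

    def stored_bytes(self):
        return sum(len(c) for c in self.chunks.values())


store = ChunkStore()
monday = b"A" * 8192 + b"B" * 4096   # three chunks, two of them identical
tuesday = b"A" * 8192 + b"C" * 4096  # only the last chunk is new
r1 = store.put(monday)
r2 = store.put(tuesday)

logical = len(monday) + len(tuesday)
print(f"logical bytes: {logical}, stored bytes: {store.stored_bytes()}")
```

Here two 12 KB backups occupy 12 KB of stored chunks instead of 24 KB, and the same chunk digests could also be used to avoid resending known chunks across the WAN.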
Why are storage as a service (SaaS) vendors targeting secondary storage applications?
If you look back to the 1999-2001 timeframe, companies like StorageNetworks were going after the primary storage for large commercial enterprises. The pitch then was, 'Let us manage your storage, that way you won't have to worry about it. It's easier for you, and we can do it less expensively than you can.' What they found was that companies were uncomfortable turning over their primary storage to an outside vendor. There were security and control issues, and the technology just didn't provide a good way to address these issues for those large end users.
But over the past five years, technology has provided much better ways to deal with security and control issues. There's been a culture change where people are a lot more comfortable now working with outsourced services. The success of companies like SAT and SalesForce.com, that are basically outsourcing applications services, has made people much more amenable to this kind of thing.
But the biggest reason why this new round of storage as a service is succeeding is that these vendors are not targeting primary storage. They're targeting secondary storage applications, like backup and archive. People are much more comfortable turning those types of applications over to outsourced companies. In the managed service provider space, there are literally hundreds, if not thousands, of companies that are selling online backup services and storage as a service, many of them based on rebranded technology from companies like Asigra and Roboback.
We also have seen companies come out with cloud-based storage offerings: Amazon with its S3 offering, EMC with its Mozy offering and Symantec with its Protection Network. Those tend to target smaller customers, though you've also got companies like Nirvanix that are clearly going after larger enterprises and touting enterprise benefits and scalability.
Why are dense storage platforms like storage grids becoming popular?
One of the givens in our industry is that storage capacities are continuing to explode. Data rates are going to grow 50% to 60% per year over the next five years and in excess of 85% of that growth will be unstructured data… in other words, file system-based data.
So now, companies have to realistically think about how they put platforms in place that can store petabytes of data over time. The older architectures, the more monolithic approaches, don't provide a good way to do that cost-effectively. So we're seeing the introduction of storage grids and scale-out approaches targeted at unstructured data -- things like Web 2.0 and then also secondary applications like backup and archive.
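For a sense of how quickly that growth compounds, here is a quick calculation applying the 50% and 60% annual rates quoted above over five years. The 100 TB starting capacity is a hypothetical figure for illustration.

```python
def projected_capacity(start_tb: float, annual_growth: float, years: int) -> float:
    """Capacity after compounding annual_growth (e.g. 0.5 for 50%) for the given years."""
    return start_tb * (1 + annual_growth) ** years

start_tb = 100.0  # hypothetical starting capacity
for rate in (0.50, 0.60):
    end_tb = projected_capacity(start_tb, rate, years=5)
    print(f"{rate:.0%} per year: {start_tb:.0f} TB -> {end_tb:.0f} TB after 5 years")
```

At those rates, a 100 TB environment grows to roughly 760 TB to 1,050 TB in five years, which is why petabyte-class planning becomes relevant even for mid-sized shops.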
These dense storage platforms use a very different approach from the monolithic architectures of the past. You can pay as you grow with them and you can add performance, I/O or capacity to these things independently, which gives you a lot of flexibility in building the configuration that best meets your performance requirements.
The key issues to look for are petabyte-class scalability and the ability to maintain high levels of I/O performance at that type of storage capacity. You're also going to look at data reliability. You'll also want to look for data deduplication technologies that are integrated into the platform so that you can use the deduplication multiplier to lower the overall cost of storage.
For example, if you're paying $8 a gigabyte for SATA-based storage, but you're achieving a 10:1 data reduction ratio using data deduplication technologies, now you're getting that down to under $1 a gigabyte. That's a key economic argument from the vendors playing in this space. If you're pitching these platforms for backup or archive use, they are clearly going up against tape, and tape is 20 cents to 30 cents per gigabyte.
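The arithmetic behind that argument can be sketched as follows. The $8, 10:1 and 20-to-30-cent figures come from the example above; the function name `effective_cost_per_gb` is a hypothetical helper, and the calculation deliberately ignores tape-side compression and operating costs.

```python
def effective_cost_per_gb(raw_cost_per_gb: float, reduction_ratio: float) -> float:
    """Cost per logical gigabyte once deduplication shrinks what is physically stored."""
    return raw_cost_per_gb / reduction_ratio

raw_disk = 8.00                      # $/GB for SATA-based storage
disk = effective_cost_per_gb(raw_disk, 10)  # 10:1 data reduction ratio
tape_low, tape_high = 0.20, 0.30     # tape cost range, $/GB

print(f"Deduplicated disk: ${disk:.2f}/GB")
print(f"Tape: ${tape_low:.2f} to ${tape_high:.2f}/GB")
print(f"Reduction ratio needed to match tape's high end: {raw_disk / tape_high:.0f}:1")
```

At 10:1 the disk lands at 80 cents per gigabyte, still above tape; on raw storage cost alone, it would take a ratio of roughly 27:1 to reach 30 cents.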
There's one final area you would need to evaluate these platforms on: whether or not they've got an integrated replication capability, which is really the way you solve backing up that data storage cell, providing a disaster recovery solution, etc. If we're talking about a data store that's 400 TB in size, there's no way you're going to be able to back that up with tape. There has to be another approach, and what's becoming the accepted way to address that these days is by replicating that to an off-site location, such as a mirror platform.
How would you differentiate backup from archive requirements?
The problems you are trying to solve with backup are backup window issues, and recovery point and recovery time objective issues. Do I have the flexibility to be able to recover from different points, and how quickly can I get that data back when I want to restore it? And then reliability.
In the archive arena, it's a lot more important to deal with things like the cost of storage over time. That cost has to be much lower because your archives are going to be storing a lot more data. If I archived something five years ago and then I go to get it, how do I know for sure that what I'm getting back actually is a reliable representation of what that data looked like when I first put it in the archive?
But there's a third requirement making companies start to consider active archival storage platforms -- in other words, disk-based platforms -- a lot more seriously, and that is the requirement for online searchability. A lot of companies are spending a lot more to run e-discovery against tape-based archives than they would against disk.
Many companies are assuming that each lawsuit is going to cost them $500,000 in e-discovery costs if they're running against tape-based archives. At $500,000 per lawsuit to handle e-discovery efforts against tape, it doesn't take very many lawsuits for disk to become competitive with tape.
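A rough break-even sketch shows how that plays out. Only the $500,000 per-lawsuit figure and the storage price points come from this discussion; the 1 PB archive size and the $100,000 disk-based e-discovery cost are hypothetical assumptions, as is the helper name `breakeven_lawsuits`.

```python
import math

def breakeven_lawsuits(archive_gb, disk_cost_per_gb, tape_cost_per_gb,
                       tape_ediscovery_cost, disk_ediscovery_cost):
    """Number of lawsuits at which disk's e-discovery savings cover its storage premium."""
    storage_premium = archive_gb * (disk_cost_per_gb - tape_cost_per_gb)
    saving_per_lawsuit = tape_ediscovery_cost - disk_ediscovery_cost
    return math.ceil(storage_premium / saving_per_lawsuit)

n = breakeven_lawsuits(
    archive_gb=1_000_000,          # hypothetical 1 PB archive
    disk_cost_per_gb=0.80,         # $8/GB SATA at a 10:1 deduplication ratio
    tape_cost_per_gb=0.25,         # midpoint of the 20-to-30-cent tape range
    tape_ediscovery_cost=500_000,  # per-lawsuit assumption cited above
    disk_ediscovery_cost=100_000,  # hypothetical; not from the source
)
print(f"Disk breaks even after {n} lawsuits")
```

With these inputs the $550,000 storage premium for disk is recovered by the second lawsuit; different archive sizes or e-discovery costs would shift that number.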
A lot of companies are still looking at backup and archive as the same thing, the only difference being that the archive tapes happen to be sitting at an offsite location. The e-discovery aspect needs to be taken into account.