Data backup has changed a lot over the past few years. In the old days, backups consisted of a backup server, backup software and tapes. That was simple, but often slow and unreliable. Today's backup options offer far more choice, but that choice brings added complexity.
The biggest change in data backup hardware solutions over the past few years was the introduction of disk as a medium that rivals -- and in many cases surpasses -- tape. But backup hardware choices are more complex than choosing disk, tape or a mixture of both. Disk itself takes multiple forms for backup, including virtual tape and various types of disk-as-disk targets. Data backup hardware and software tools have been updated to reduce data through deduplication and compression, and provide more granular recovery points and multi-site replication.
This four-part series explores the latest advances in backup technology. The first three parts cover new developments in data backup hardware solutions, data backup software, and the cloud.
Looking at data backup hardware solutions: Is tape dead yet?
When Beth Israel Deaconess Medical Center storage architect Michael Passe became disenchanted with tape management, he saw no point in emulating tape either. Earlier this year, Passe deployed a Data Domain DD580 with a NAS interface for 10 TB to 20 TB of nightly backups. Data Domain offers a virtual tape library (VTL) interface with its data deduplication boxes, but most of its customers choose NAS.
"Our goal by the end of 2009 is to have all of our tape silos gone," Passe said.
Beth Israel, which has about half a petabyte of storage on the floor and backs up around 100 TB per week, also considered switching to a disk-to-disk-to-tape scheme. "But then we thought, why do all of that extra work to manage an aging technology?" Passe said.
There was no cost benefit of disk-to-disk-to-tape (D2D2T) either, Passe says. "We ran the numbers and replicating data offsite rather than paying people in trucks was equivalent or less than the cost of new [Sun] STK tape silos," he said. "The days for [tape] technology are numbered. The economics both on the purchase side and in not needing as much staff to manage disk are going to make it tough."
Many believe tape will live on for long-term retention because tape cartridges can be stored on a shelf without power and cooling, and individual tape drives aren't subject to the same replication-based "rolling disaster" as some disk-based backup systems. In a rolling disaster, data corruption takes place over time rather than instantaneously.
However, Passe says dedupe provides density, power and cooling benefits, and "a rolling disaster should not happen with our Data Domain boxes." The boxes perform a checksum before receiving replicated data to make sure it's correct, he said. "We also periodically snapshot the whole device, so we could theoretically reconstruct it rolling back," he added.
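The checksum-before-accept idea Passe describes can be sketched in a few lines of Python. This is only an illustration of the general technique, not Data Domain's actual implementation: the function names, the in-memory store and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib


def checksum(block: bytes) -> str:
    """Return a hex digest for a block of backup data (SHA-256 is an assumed choice)."""
    return hashlib.sha256(block).hexdigest()


def accept_replica(block: bytes, expected_digest: str, store: dict) -> bool:
    """Store a replicated block only if its digest matches the one the sender reported."""
    actual = checksum(block)
    if actual != expected_digest:
        # A mismatch means the block was damaged in transit or at the source,
        # so it is rejected instead of being written to the replica.
        return False
    store[actual] = block
    return True


# Hypothetical replica target: a dict keyed by digest stands in for the appliance.
replica_store = {}

good_block = b"nightly backup segment"
sender_digest = checksum(good_block)

good_accepted = accept_replica(good_block, sender_digest, replica_store)

# Simulate corruption: one byte differs from what the sender checksummed.
corrupted_block = b"nightly backup segmenT"
corrupt_accepted = accept_replica(corrupted_block, sender_digest, replica_store)
```

In this sketch the corrupted block never reaches the replica, which is the property that keeps corruption from propagating from one site to the next.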
NAS or VTL?
But even administrators who agree with Passe that disk is the better data backup hardware solution aren't sure that the NAS interface is the best choice. Larger enterprises tend to choose VTL because it uses Fibre Channel and doesn't require shops to change their backup process.
"Small companies prefer the NAS interface for simplicity," says W. Curtis Preston, VP of data protection services for GlassHouse Technologies. "But large companies need the performance of Fibre Channel, while NAS communicates over Ethernet, and sharing and provisioning NAS systems becomes much more complex in a bigger environment."
VTL was the first interface designed for disk-based backup, because backup software at the time was still developed to write to tape. It is still favored by larger shops that are more likely to have a heavy investment in tape.
Yahoo Inc.'s manager of data protection Marcellus Tabor says his company stores petabytes of data on tape silos and hundreds of terabytes on VTLs from NetApp, EMC and Sepaton.
Tabor says he prefers VTL over disk-as-disk "because it's better for media management -- it lines up nicely with most of our backup policies set for tape." By writing data to multiple virtual drives at once, VTL also offers the parallelization for backups that a shop with petabytes of data needs to maintain backup windows.
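The effect of parallel streams on the backup window comes down to simple arithmetic. The Python sketch below is a rough illustration: the function name and the figures passed to it are hypothetical, and it assumes throughput scales linearly with the number of virtual drives, which real systems only approximate.

```python
def backup_window_hours(data_tb: float, stream_mb_per_s: float, streams: int) -> float:
    """Estimate the hours needed to back up data_tb terabytes.

    Assumes every stream sustains stream_mb_per_s and that streams scale
    linearly, which is an idealization.
    """
    if streams < 1 or stream_mb_per_s <= 0:
        raise ValueError("streams must be >= 1 and throughput must be positive")
    total_mb = data_tb * 1_000_000  # decimal terabytes to megabytes
    seconds = total_mb / (stream_mb_per_s * streams)
    return seconds / 3600


# Hypothetical figures: 100 TB of data, 100 MB/s per virtual drive.
single_stream = backup_window_hours(100, 100, 1)
parallel_32 = backup_window_hours(100, 100, 32)
```

Under these assumptions, 32 virtual drives cut the window to one thirty-second of the single-stream figure. In practice the network, the backup servers and the back-end disk set the ceiling well before the drive count does.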
Disk backup isn't perfect
Other large shops have run into problems with disk-as-disk environments. David Ping, data center storage team lead for Pacific Gas and Electric Company (PG&E), says the trend in recent years toward consolidating and centralizing storage had implications for backup that weren't well understood.
PG&E used a SATA partition on Hitachi Data Systems' AMS 1000 system for disk-based backup, but ran into performance bottlenecks trying to do too much on one system. "Companies want to put everything on centralized storage but they don't think about how it's going to be backed up or what kind of performance issues might arise if you're backing up to drives within the same device you have data on," Ping said.
PG&E is upgrading four of its six IBM Tivoli Storage Manager (TSM) servers to new IBM p570 hardware while adding two more TSM servers to the environment, "which hopefully will allow us to shrink our backup window," Ping said.
Ping said he's considering a VTL to offload disk backup. "There has also been some discussion between operations and engineering about whether it makes sense to dedicate a mid-tier AMS box so backups and the production data aren't using the same storage sub-system," he said.
How to keep up with data growth?
Perhaps the only sure thing about backup today is that the amount of data involved will continue to increase.
Yahoo's Tabor says the relentless data growth "is what keeps me up at night. Once we have petabytes on the desktop or a petabyte iPhone -- not necessarily that far-fetched -- what do you do with all the media? It's going to cost a lot of money no matter what you're using." He also worries that media capacities will not keep pace. "We need LTO-6 right now, in a big way," he says.
Most large shops and the consultants who advise them say that once data grows past a petabyte, a much bigger shift than a move to disk-based backup will need to take place.
Check out part two of this feature for a look at the state of data protection applications.