In part one of our series on the state of data backup, we looked at trends in data backup hardware solutions. In this part of the series, learn about the new features that have been added to data protection applications -- such as data deduplication and snapshot technology -- and how backing up virtual servers introduces new challenges for backup administrators.
New features have been developed for data protection applications in recent years to take better advantage of disk-based backups. As a result, backup administrators find greater efficiencies in their backup and data protection applications but also must make more decisions in the buying process.
"Absolutely, backup offers more choices than ever before," said W. Curtis Preston, VP of data protection services for GlassHouse Technologies Inc. "If you go back just seven or eight years, it was basically, which vendor's backup software and tape drive do I buy? Now it's, do I encrypt my tape drives or not? Do I do disk and tape, or disk to disk? What about data deduplication? CDP or near-CDP? Full backup or synthetic?"
What it all adds up to, Preston says, is that "what used to be a foregone conclusion is now a series of decisions the customer has to make."
New snapshot options speed recoveries
Cyberlink, a Taiwanese company specializing in DVD playback software for PCs, takes a software-based approach to shrinking its recovery time objectives (RTOs) and recovery point objectives (RPOs), rather than swapping out all of its tape hardware for disk.
Most of Cyberlink's data requires retention periods of more than a year, and there are no plans to move away from energy-efficient and portable tape, IT director Joe Wang says. Wang did take a different look at backup after the company's Exchange server crashed during an update two years ago, and his team had to reinstall the operating system and all the data to get Exchange running again.
"It was terrible," he said. "Everyone relies on email for daily messaging and scheduling, and you can't afford any downtime."
The incident prompted Cyberlink to deploy FalconStor's CDP software for its four Exchange servers. Cyberlink now runs FalconStor's software on a Dell 2950 server with 1.2 TB of local storage, backing up some 60 GB to 80 GB from each server every three days. After being retained on the CDP server for three days, data is overwritten. While FalconStor's product has the ability to copy every write, Cyberlink sets it to take a snapshot of the Exchange servers every hour.
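The rotation Wang describes -- one snapshot per Exchange server every hour, retained for three days, then overwritten -- can be sketched as a simple pruning policy. This is an illustration only; the function name and constants below are assumptions, not FalconStor's API:

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=3)   # snapshots older than this are overwritten
INTERVAL = timedelta(hours=1)   # one snapshot per hour, as in Cyberlink's setup

def prune(snapshots, now):
    """Keep only snapshot timestamps inside the retention window.

    `snapshots` is a list of datetimes; anything older than RETENTION
    relative to `now` is dropped, i.e., eligible to be overwritten.
    """
    return [t for t in snapshots if now - t <= RETENTION]

# With hourly snapshots and a three-day window, the CDP store holds at
# most this many snapshots per server at any one time:
print(int(RETENTION / INTERVAL))  # 72
```

The point of the hourly setting is the tradeoff Wang is making: copying every write gives a near-zero RPO but consumes more of the 1.2 TB local store, while hourly snapshots cap the store at 72 recovery points per server.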
Cyberlink might expand its CDP deployment to other applications. "It depends on the application," Wang said. "For file systems, traditional tape backup is much easier to restore than multiple snapshots and the data loss period could be longer."
Replication eases backup workload
Alice Jones, technical manager for the University of Illinois, manages two data centers. The university's Chicago data center has two five-year-old EMC Corp. Clariion Disk Libraries (CDL) and a Quantum/ADIC tape library with six drives. Another data center in Decatur holds one EMC CDL and another six-drive Quantum/ADIC library.
The university stores more than 320 TB of data, and is redesigning its backup scheme. That involves an upgrade from Symantec Corp. NetBackup 5.1 to version 6.5 for Exchange 2007 support. "The goal is to cross replicate after deduplication and eventually eliminate physical tape," Jones said. "We own dark fiber, and should have the bandwidth between locations once data is deduplicated."
Currently, the school can transfer only 20% of the data it backs up across the wire.
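The arithmetic behind Jones' plan is straightforward: if the link can carry only 20% of the backup volume, deduplication has to shrink the stream by at least 5:1 before cross-site replication becomes viable. A minimal sketch, with illustrative numbers (the 100 TB figure is hypothetical, not the university's actual nightly volume):

```python
def required_dedup_ratio(backup_volume_tb, link_capacity_tb):
    """Minimum deduplication ratio needed to fit a backup set onto a
    bandwidth-limited replication link (ignores compression and overhead)."""
    return backup_volume_tb / link_capacity_tb

# A link that can carry only 20% of the volume implies a 5:1 ratio.
print(required_dedup_ratio(100, 20))  # 5.0
```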
"I want everything online, and zipping across the network," Jones said. "I don't want to mess with physical tape anymore. Our tape drives have been very reliable, but they're five years old and now a bottleneck."
The combination of data deduplication and replication is becoming a staple of disk-based backup, but it's not for everybody. Marcellus Tabor, Yahoo Inc.'s manager of data protection, says he has yet to find a dedupe system that's a good fit because his company's data set is highly dynamic.
"Deduplication is great for certain specific problems -- like if you want to maintain lots of point-in-time copies online with relatively small changes," he said. "But if you're talking about 50% changes, then you're at the mercy of the different vendors' algorithms." With lots of multimedia files at Yahoo, dedupe systems would also have to get much better with precompressed formats such as JPEG for his shop to derive big benefits.
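Tabor's objection can be made concrete with a back-of-the-envelope model. If each backup cycle changes a fraction of the data, the dedupe ratio across many retained copies collapses as that fraction grows. The formula and numbers below are an illustrative simplification, not any vendor's algorithm:

```python
def dedup_ratio(n_backups, change_rate):
    """Rough dedupe ratio for n retained full backups when a fraction
    `change_rate` of the data is new each cycle (ignores compression
    and vendor-specific chunking details)."""
    logical = n_backups                          # n full copies, in units of one full
    stored = 1 + (n_backups - 1) * change_rate   # one baseline plus the deltas
    return logical / stored

# Small daily change: dedupe shines across 30 retained copies.
print(round(dedup_ratio(30, 0.02), 1))  # 19.0
# At Tabor's 50% change rate, the ratio collapses toward 2:1.
print(round(dedup_ratio(30, 0.5), 1))   # 1.9
```

The same logic explains the multimedia problem: precompressed formats such as JPEG already look like random data to a chunking engine, so their effective change rate is high even when files are merely re-encoded.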
The virtualization equation
Backing up virtual servers remains a struggle for many organizations, despite recent updates to backup software and the release of VMware's Consolidated Backup (VCB) API, designed to simplify backing up virtual machines. Depending on their backup software, some companies must choose between running an agent on every guest or forgoing granular backups. They also must decide whether to continue using their legacy backup software, go with VMware's VMFS (virtual machine file system) snapshots, or use specialized tools such as Vizioncore's ESX Ranger (now vRanger Pro).
Food distributor Schwan's was among the early adopters of server virtualization, and senior manager of IS infrastructure Cory Miller soon found that converting physical servers to virtual servers called for new approaches to backup.
After evaluating backup, snapshot and replication products "from the tier one backup vendors," Miller picked vRanger Pro because it was designed specifically to back up VMware virtual machines.
vRanger Pro performs hot backups of running virtual machines. Unlike many other snapshot products, it doesn't need to pause or quiesce the application to ensure a consistent state before performing backups. This matters in Miller's shop because applications like Microsoft's BizTalk process data from a queue. "If we have to bring [the host] back up and the queue information is lost, we have to do manual scripts to get it restored, which involves a lot of work and time," he said.
vRanger Pro backs up the entire virtual machine, including system state information, patches, operating system, permissions and queues. Miller says it takes minutes to make sure everything on a domain is correct and get a server back up and running. The application also lets Schwan's use one tool to handle backup and disaster recovery for close to 600 virtual machines. "The virtual machine thinks it's shut off and brought back up in a different spot -- the copies don't realize they're copies," he said. Schwan's then uses a distance replication utility (Miller has used Robocopy and homegrown scripts, and is considering Double-Take) to send these snapshot copies to a SunGard colocation facility. There, the copies are brought back up on a physical host and transferred to a SAN at the secondary site.
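The distance-replication step Miller describes -- copying whole-VM snapshot files to the colocation site, whether with Robocopy or homegrown scripts -- amounts to mirroring a directory tree, copying only files that are missing or newer at the destination. A minimal Python sketch of that pattern; the paths, function name, and "newer wins" rule are assumptions for illustration, not Schwan's actual scripts:

```python
import shutil
from pathlib import Path

def replicate_snapshots(src, dst):
    """Mirror completed VM snapshot files from `src` to the DR target `dst`.

    A stand-in for tools like Robocopy or Double-Take: copy each file that
    is missing at the destination or newer at the source, and return the
    list of paths copied.
    """
    src_dir, dst_dir = Path(src), Path(dst)
    copied = []
    for f in src_dir.rglob("*"):
        if not f.is_file():
            continue
        target = dst_dir / f.relative_to(src_dir)
        if not target.exists() or f.stat().st_mtime > target.stat().st_mtime:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, target)  # copy2 preserves timestamps
            copied.append(str(target))
    return copied
```

Because only changed files move on each run, repeated runs against an unchanged snapshot directory copy nothing -- which is what makes scheduled replication to a remote site practical over a WAN link.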
Miller says the process is similar to transporting tapes from one site to the other, and restoring their files to the secondary server, but much more cost-effective and quicker. A host-based snapshot utility for virtual servers is also cheaper in acquisition and maintenance costs than a traditional linear backup system for these purposes.
Despite the advances in data protection applications, some organizations have sidestepped the problem by moving backup out of their IT environment entirely. Part three of this series looks at companies relying on cloud computing for backup.