Published: 12 May 2003
|Case Study: Mixing it up with remote backup|
Initially, Charlie Roberts had a modest goal for improving his organization's disaster recovery capability: He wanted to replace a patchwork quilt of third-party backup applications with a single solution in the same location.
"We were just looking for a more elegant backup solution than the one we had," says Roberts, VP of IT at the Travis Credit Union in Vacaville, CA. "The applications were all different and they required a lot of human intervention."
Since then, Roberts has streamlined things at his 51-year-old institution, which serves 113,000 members in nine Northern California counties. His team is gradually weeding out those old labor-intensive programs, replacing them with an online solution that automatically seeks, copies and stores new data daily.
The biggest change: The new system no longer nestles next to the core systems it's designed to protect.
For now, Travis--with $1.2 billion in assets ranking among the nation's largest credit unions--relies on a remote backup site just across town. But that's just the first step. Later this year, the institution--located near Travis Air Force Base--will begin backing up all of its data to a permanent disaster-recovery center in Merced, CA, more than 150 miles away.
Based in earthquake country north of San Francisco, Roberts had been investigating ways to better protect his organization's critical financial data even before Sept. 11. After the terrorist attack, he stepped up his quest. "It showed everyone how potentially vulnerable we are," Roberts says. "If I had a catastrophic failure, I wasn't necessarily convinced that I could restore everything." In addition, the old setup made it tough to find and retrieve individual records from the hundreds of gigabytes of stored information.
Today, Roberts has what he describes as "a mixed storage environment." He uses Adaptec 160 SCSI adapters to connect to six RAID arrays from Nexsan Technologies Inc. of Woodland Hills, CA. The setup includes two Nexsan ATAboy and two Nexsan ATAbaby devices onsite and two more ATAboys offsite. He still backs up transactional data on tape cartridges, and less critical information--such as employee files--gets copied to local servers, magnetic media or CD-ROMs at both locations. But he relies primarily on the EVault Inc. InfoStage online backup and recovery suite and an existing T1 network for taking snapshots of his networks several times daily.
"The copies we're making are good, clean and robust, and we don't need to physically send anything" to the remote facility, Roberts says. "We just do an electronic copy and send it online." He can also restore specific pieces of information quickly right from the network--a capability he used even before the system went live.
"While we were testing, our executive VP couldn't find an e-mail she needed," Roberts recalls. "We had just done the seed load--the initial snapshot of the system--and it turned out the message she needed was in that load." He retrieved the message instantly, solidifying at least one executive's support for the effort.
The switch hasn't been painless. "Initially, we weren't able to restore as quickly and painlessly as we'd seen in the demos," Roberts says. But Walnut Creek, CA-based EVault has consistently responded promptly, Roberts says, once sending two representatives to help speed things up.
The new system has freed up one team member previously dedicated to manual backups nearly full time. Beyond that, Roberts declines to discuss the system's cost or its potential ROI. "I put it in the category of the cost of doing business," he says. "You don't know how much you need it until you need it. The first time I have to use it will pay for itself many, many times over. When you don't have to tell customers 'I don't have your transactional data, I don't know how much you have in your account'--how do you put a price tag on that?" --Anne Stuart
As the Sept. 11 terrorist attacks on the United States proved, it's no longer good enough to stash backup tapes on a different floor from the data center, or even down the block. And in some cases, the government is mandating tougher disaster recovery procedures for certain industries. For example, the Health Insurance Portability and Accountability Act (HIPAA), as part of establishing standard data formats and content for all health insurance providers, requires safeguards including technology-based contingency planning and disaster recovery to ensure the safety of patients' records. These safeguards include "periodic tape backups of data" and the ability to continue operations "in case of an emergency," according to HIPAA rules.
All told, "companies are thinking more regionally now," says Dianne McAdam, a senior analyst at Data Mobility Group in Nashua, NH. "We have more reasons to replicate data over longer distances," she says.
There are already numerous ways to send data afar. Methods include synchronous replication products available for quite some time, primarily from the storage hardware vendors--IBM's Peer-to-Peer Remote Copy (PPRC) and EMC's Symmetrix Remote Data Facility (SRDF) fall into this camp. But there are distance limitations, or else the products can replicate only onto like hardware boxes. They typically don't work in a multivendor storage environment.
More important, the traditional synchronous mode of replication won't work well over really long distances--anything over 12 miles or so. In synchronous mode, system A sends data to system B, and then waits for confirmation that system B received the information before sending any more. This usually means a delay of 1 ms per 25 circuit miles. (Keep in mind that a circuit mile isn't the same thing as a regular mile. New York and Boston might be only 200 distance miles apart, but a phone call between them can pass through 400 miles of circuits or more. The 1 ms delay applies to circuit miles.)
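As a rough worked example of the rule of thumb above, the following Python sketch estimates the delay added to each synchronous write. The function name is illustrative, and the 400-mile figure is simply the New York-to-Boston circuit distance mentioned in the text, not a measured value.

```python
MS_PER_25_CIRCUIT_MILES = 1.0  # rule of thumb quoted in the article


def sync_delay_ms(circuit_miles: float) -> float:
    """Estimated delay per synchronous write, in milliseconds."""
    if circuit_miles < 0:
        raise ValueError("circuit_miles must be non-negative")
    return circuit_miles / 25.0 * MS_PER_25_CIRCUIT_MILES


# New York to Boston: about 200 distance miles, but 400 circuit miles or more.
print(sync_delay_ms(400))  # 16.0
```

At 16 ms per acknowledged write, an application that issues its writes one after another could complete only about 60 of them per second, which is why synchronous replication is usually kept to short distances.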
That leaves asynchronous replication--where the first system doesn't worry about a response from the second. This avoids the performance delay, although it doesn't address network latency. Still, it leads to problems of its own. First, if there's a failure on either side, it's not clear how far behind the backup volume is from the primary copy. It's certain that they're not exact duplicates of each other, and it will take some human time and effort to sort out what's missing and how to recapture it.
Second, some backup systems batch I/Os together instead of shipping them in the exact order they came in. So, the backup copy might not be usable because the data is out of order.
One new product that attempts to solve these problems is SANSafe from Topio Inc., Santa Clara, CA. It essentially timestamps each piece of data before it's sent, then sends the I/O to the remote site via an IP network and then sorts each transaction in order. The remote volumes are updated on a periodic basis, so there's a copy that consistently matches the local volume. Topio promises that, despite its name, SANSafe works with direct-attached storage (DAS) as well as data residing in SANs. Topio also claims that SANSafe can work with multiple hosts sending to multiple remote locations.
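The general idea of stamping each write and re-sorting at the remote site can be sketched in a few lines of Python. This is only an illustration of the technique, not Topio's implementation; the `StampedWrite` class, the integer timestamps and the dictionary standing in for a remote volume are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class StampedWrite:
    timestamp: int  # assigned at the primary site when the write is issued
    block: int      # block address being written (hypothetical layout)
    data: bytes


def apply_in_order(received, volume):
    """Apply writes to `volume` (a dict of block -> data) in timestamp order."""
    for write in sorted(received, key=lambda w: w.timestamp):
        volume[write.block] = write.data
    return volume


# Writes arrive over the IP network out of order.
arrived = [
    StampedWrite(3, block=7, data=b"C"),
    StampedWrite(1, block=7, data=b"A"),
    StampedWrite(2, block=9, data=b"B"),
]
remote = apply_in_order(arrived, {})
print(remote)  # {7: b'C', 9: b'B'}
```

Without the sort, block 7 would end up holding the stale value `b"A"`, which is the out-of-order problem described above.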
Backup over IP
Backup over IP seems to be one popular way of approaching disaster recovery these days. Products are popping up all over in this space, ranging from supporting iSCSI to sending data to tape devices via IP. Mike Karp, a senior analyst with Enterprise Management Associates in Boulder, CO, is a big fan of iSCSI in particular (see "Getting real about iSCSI").
"It's known to work, it's relatively easy to implement and it's a proven technology because it combines things we've been doing for 15 years," he says. He maintains there are no more inherent security risks in moving data over long distances than there were in moving data between two systems located a half-mile apart.
"There is nothing in iSCSI that makes it more vulnerable as a system than any other technology," Karp says.
One notion that's seeing quite a bit of action these days is providing central access to distributed data--in other words, recentralizing all corporate assets into a data center and then backing them up to tape or by some other means. Distributed users are provided access to the centralized information via IP or another type of WAN.
Here's how this works. In each remote office, companies place a low- or no-maintenance caching appliance. A full-fledged server goes into the data center. The appliance and bundled software scan for the portion(s) of any file that has changed since it was last written or requested. Once everything is centralized in the data center, it can be backed up to tape or to a secondary data center or storage vaulting provider.
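One common way to find the changed portion of a file is to compare per-block checksums against those recorded the last time the file was sent. The Python sketch below shows that general approach; it isn't taken from any of the products named in this article, and the block size and function names are assumptions chosen for illustration.

```python
import hashlib

BLOCK_SIZE = 4  # tiny for illustration; a real appliance would use far larger blocks


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Return one SHA-256 digest per fixed-size block of `data`."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Indexes of blocks in `new` that differ from, or don't exist in, `old`."""
    old_h = block_hashes(old, block_size)
    new_h = block_hashes(new, block_size)
    return [i for i, h in enumerate(new_h) if i >= len(old_h) or h != old_h[i]]


old = b"AAAABBBBCCCC"
new = b"AAAAXXXXCCCCDD"
print(changed_blocks(old, new))  # [1, 3]
```

Only blocks 1 and 3 would need to cross the WAN; the unchanged blocks stay put. This sketch doesn't handle files that shrink or data that shifts position, which real products would have to address.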
|Case Study: Disaster recovery on a tight budget|
As the nation's largest airline fights for its life, Apolonio Maranion's backup and recovery systems could play a big role in the carrier's survival.
Maranion is a systems administrator for UAL Loyalty Services Inc. (ULS), the e-commerce wing of beleaguered United Airlines. The Chicago-based airline's parent company, UAL Corp., filed for Chapter 11 bankruptcy protection from creditors in December 2002.
Company executives and industry analysts both call United's vaunted customer outreach and rewards programs crucial for keeping the company aloft as it struggles to restructure in a sagging economy.
That's where Maranion comes in. In addition to supporting the airline's main Web sites--United.com, UALCargo.com and local sites in 23 countries--his ULS IT team handles all online transactions for the airline's rewards programs, including Mileage Plus (for frequent travelers), Silver Wings Plus (for senior citizens) and MyPoints.com (for frequent visitors to participating stores and restaurants). Together, the loyalty sites serve nearly 1.5 million unique users monthly--thousands more use them to connect to United's online booking engine.
As United continues downsizing, keeping those sites running smoothly and securely is more important than ever. Says Maranion: "When you're reducing the number of city ticket offices and you're reducing the number of people at the [airport] counters, you need to make [the customer's] online experience as robust as possible." That's easier said than done, considering that ULS remains under stringent cost-cutting mandates, with little hope for increasing IT headcount anytime soon.
Before the downturn, ULS outsourced all its Web operations--including backup and recovery--to managed hosting companies such as Laurel, MD-based Digex Inc. But as airline travel plummeted following the 2001 terrorist attacks and the continuing economic slump, United moved all e-commerce activity in-house.
Today, the ULS IT group operates from a 4,000-square-foot data center in the basement of a United-owned building in Elk Grove, IL, about 20 miles west of Chicago. Its backup facility is in nearby Schaumburg, IL. Although the centers are less than six miles apart, Maranion calls the distance sufficient for protecting ULS' 20TB of data because the Chicago area is unlikely to suffer from a disaster--such as an earthquake--that would cause widespread damage.
However, because the two data centers are so close, ULS does exercise unusually stringent security measures, such as limited access (biometric palm-print readers for IT staff, escorts for everyone else), locked storage racks and cameras throughout both facilities.
The ULS setup consists of a storage area network (SAN) using three Hitachi Data Systems Lightning 9960 subsystems, two in the main data center and one in the backup facility. Initially, Maranion says, ULS used EMC products, but switched to Hitachi to save money. The Hitachis connect to 200 Windows and Unix servers over five Brocade SilkWorm port switches. Maranion says he can easily add new clients. A Gigabit Ethernet network links the two centers. A suite of Veritas software products, including NetBackup 4.5, handles storage and duplication of thousands of member transactions daily.
ULS also uses a collection of Quantum DLT 7000 and 8000 tape drives for storing archival tapes in libraries in each facility, but Maranion says he doesn't use tape drives for routine backups because he doesn't consider them cost effective.
In a true spirit of efficiency, ULS uses the backup servers in Schaumburg for staging whenever they're not in use for disaster recovery. If the Elk Grove center fails, everything switches to the backup facility, automatically shutting down staging activity until normal business resumes. Meanwhile, says Maranion, "there's no single point of failure within the United environment."--Anne Stuart
New names here include DiskSites in Santa Clara, CA; Tacit Networks Inc. in South Plainfield, NJ; and Actona Technologies Inc., formerly called VersEdge, in Los Gatos, CA. Theirs is a slightly different take on existing replication vendors including NSI (maker of Double-Take software), but the fundamental concept is much the same.
Some companies are attempting to boost the speed of iSCSI-based devices. Alacritech Inc. in San Jose, CA, makes a line of storage accelerators for Windows and Linux. They move both block- and file-level traffic, and offload data movement via a custom adapter.
EVault Inc., in Walnut Creek, CA, is an older player with a new tale. Formerly a software services provider, it now sells the bulk of its service as a software suite. Its InfoStage family includes agents to scan the server for new data, dedicated hardware to send the data out to remote locations and management and monitoring tools to make sure it all happens correctly.
For its part, SANgate Systems, Southborough, MA, is working on the next generation of the SANblaster--a data migration appliance--which will be an in-line device that's part and parcel of the storage area network (SAN) infrastructure, says Tom Grave, SANgate's product marketing manager. Where today's device is an appliance that's used only when needed, tomorrow's will be a permanent part of the SAN architecture, used for data replication in applications including disaster recovery.
Finally, there are some new twists to the old standby--tape backup. Computer Network Technology (CNT) recently announced a way to stream data to tape over IP. The technique, called tape pipelining, runs on CNT's UltraNet Edge Storage Router and uses buffering and error recovery to do for Fibre Channel (FC) what's been available for some time for ESCON. It can go "thousands of miles," the vendor promises.
State Street, a financial services company, is one happy user. "By deploying CNT's remote tape pipelining solution, we were able to use the same storage infrastructure for both disk mirroring and tape backup," says Mark Sontag, assistant vice president, manager of technical services at State Street's Kansas City office. "Additionally, the UltraNet Edge's tape emulation and compression capabilities helped us increase the throughput over our available IP bandwidth. The full backup of approximately 1TB of data was decreased from 18 to 24 hours to an amazing four-and-a-half hours."
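To put those backup windows in perspective, the short Python calculation below converts them to effective throughput. It assumes "approximately 1TB" means 10^12 bytes and that the quoted times cover the whole job; the results are back-of-the-envelope figures, not numbers supplied by State Street or CNT.

```python
TB = 10**12  # assumption: 1TB taken as 10^12 bytes


def throughput_mb_per_s(total_bytes: float, hours: float) -> float:
    """Effective throughput in megabytes per second for a backup window."""
    return total_bytes / (hours * 3600) / 10**6


for hours in (24, 18, 4.5):
    print(f"{hours} h: {throughput_mb_per_s(TB, hours):.1f} MB/s")
# 24 h: 11.6 MB/s
# 18 h: 15.4 MB/s
# 4.5 h: 61.7 MB/s
```

Under those assumptions, the new window works out to a speedup of roughly four to five times. Because compression is involved, this is effective throughput rather than the raw rate on the wire.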
Veritas has updated NetBackup 4.5 to make disaster recovery easier and announced a new version of its Global Data Manager software, which includes a tool that manages and monitors backup and recovery processes for both Veritas NetBackup and Backup Exec. Global Data Manager gives administrators a dashboard view of multiple data protection processes that may be spread across an enterprise.
Bob Maness, Veritas' senior director of product marketing, says that administrators can use Global Data Manager to efficiently manage remote office backup with Backup Exec and multiplatform enterprise data protection with NetBackup by controlling the processes and policies from a single point. Says Maness: "It makes the road to consolidation of backup and restore systems easier."
Which product for you?
Finding a backup product is the easy part. Figuring out which one might work best in your storage environment is the difficult part. Most customers--particularly large ones--need to think about a multitiered approach to backup and recovery, experts say. One product isn't going to work for all applications. IT and business users need to sit down together and figure out what constitutes the company's mission-critical applications, and how many minutes or hours those can afford to be down. Then they need to do the same for their second-tier and third-tier applications--things like payroll or human resources, which probably won't need to be restored right away.
Once the pecking order is established, figure out what you're willing to spend. The most expensive types of solutions involve disk mirroring, the next most expensive are disk-to-disk and the least expensive is tape backup. You'll probably want to throw the best backup at your most critical applications.
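The tiering exercise can be as simple as a table that maps each application's tolerable downtime to a class of backup. The Python sketch below illustrates the idea; the one-hour and 24-hour cutoffs and the sample downtime figures are hypothetical placeholders that IT and business users would replace with their own numbers.

```python
def recommend(max_downtime_hours: float) -> str:
    """Map tolerable downtime to a backup approach (cutoffs are hypothetical)."""
    if max_downtime_hours <= 1:
        return "disk mirroring"   # most expensive, fastest recovery
    if max_downtime_hours <= 24:
        return "disk-to-disk"     # middle tier
    return "tape backup"          # least expensive, slowest recovery


# Hypothetical applications and the downtime (in hours) each can tolerate.
apps = {"member transactions": 0.25, "e-mail": 8, "payroll": 72}

for name, hours in sorted(apps.items(), key=lambda item: item[1]):
    print(f"{name}: {recommend(hours)}")
# member transactions: disk mirroring
# e-mail: disk-to-disk
# payroll: tape backup
```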
Whatever you decide, know that the remote-backup product train is just beginning to roll. New companies are springing up that promise to make remote backup easier. Older, established storage companies are beefing up their disaster recovery options--sometimes by upgrading established products or coming out with feature packs or modules that work side-by-side with their backup offerings. Many other vendors will be jumping on this bandwagon, and it will be a matter of deciding which approach makes most sense for your particular situation.