LAS VEGAS -- In a presentation at Symantec Corp.'s Vision conference on Tuesday afternoon, two blue-chip customers said Symantec's NetBackup PureDisk data deduplication product allowed them to centralize management of remote office backups and save on WAN bandwidth, but also said there were some scalability kinks in the product that have yet to be worked out.
According to Tony Elzinga, director of storage strategy for JPMorgan Chase & Co., the company decided to deploy PureDisk at 250 remote sites to eliminate tape and centralize management of remote office data.
The PureDisk clients are connected to one of two "core" data centers, one in the southern region of the country (Elzinga didn't specify where) and one in the north. Each of those cores is handling data from 125 remote clients, and each has 2 terabytes (TB) of Fibre Channel disk attached, though so far Elzinga said the company has only had to use about half of that capacity at each core.
However, at many of the remote sites, bandwidth is extremely limited. "We have some links that are as small as 384 Kbps," he said. A remote office replication product would have required him to replace many of those links, a cost that would run well into the millions.
"The big difference here is that the [data deduplication] isn't just done on one bit -- the product looks to see if it has the same 128 [kilobyte] segment over any of 125 clients," Elzinga said. That allows the company to keep 60 days' worth of backups on a total of 3.5 TB of disk between the two core sites.
Meanwhile, Jeff Krueger, data protection manager for Qualcomm Inc., said the PureDisk product was a more appealing approach to dealing with small remote sites than adding a NetBackup instance at each one, at a cost Krueger estimated at $50,000 each for servers, tape libraries and software.
Qualcomm is just beginning to roll out PureDisk with 40 clients attached to one central repository, which Symantec calls a Storage Pool, at the company's headquarters in San Diego. The sites are distributed throughout North America, Europe and Asia, including locations in Taiwan, the U.K. and the U.S. So far those sites have accumulated about 5 TB of backup data on 1.5 TB of disk.
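For readers who want to turn Qualcomm's figures into a reduction ratio, the arithmetic is simple. The sketch below uses only the two numbers Krueger cited (about 5 TB of backup data held on 1.5 TB of disk); the function names are illustrative and are not part of any Symantec tool.

```python
def dedup_ratio(logical_tb, physical_tb):
    """Logical backup data divided by the disk it actually occupies."""
    return logical_tb / physical_tb


def space_savings(logical_tb, physical_tb):
    """Fraction of disk saved compared with storing every backup in full."""
    return 1 - physical_tb / logical_tb


# Figures quoted by Krueger: about 5 TB of backups on 1.5 TB of disk.
ratio = dedup_ratio(5.0, 1.5)
savings = space_savings(5.0, 1.5)
print(f"{ratio:.1f}:1 reduction, {savings:.0%} less disk")
```

On those figures the reduction works out to roughly 3.3 to 1, or about 70 percent less disk than the logical backup size.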
Krueger said that another appealing feature of the product was its upcoming integration with NetBackup 6.5. "We are planning to export data to tape through NetBackup Enterprise and expire older backups off disk," Krueger said. The company will also be adding another Storage Pool node in London, in order to speed up backup and recovery times, and because London acts as a kind of "secondary headquarters" to European operations.
Scalability still an issue
Both companies have already identified one limitation in version 6.1 of PureDisk -- the fact that it caps each Storage Pool node at 50 million files. "We're already at 45 million at one of our core sites," Elzinga said.
That limitation is also the reason JPMorgan chose to go with two Storage Pools in the first place. "We're hoping to see this limitation moved up in future releases," Elzinga said, though he added that having two Storage Pools to manage "is a big improvement over the 250 [separate remote sites] we were managing before."
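The file cap lends itself to a quick capacity check. The sketch below uses the 50 million-file limit and the 45 million-file count Elzinga cited; the 90 million total in the last line is a made-up figure for illustration only, since neither company gave a total file count.

```python
import math

FILE_CAP = 50_000_000  # per-node limit in PureDisk 6.1, as reported


def headroom(current_files, cap=FILE_CAP):
    """Files that can still be added before a node hits the cap."""
    return cap - current_files


def pools_needed(total_files, cap=FILE_CAP):
    """Minimum number of Storage Pool nodes for a given file count."""
    return math.ceil(total_files / cap)


core_files = 45_000_000  # the count Elzinga gave for one core site
print(headroom(core_files))          # files left before the cap
print(core_files / FILE_CAP)         # fraction of the cap in use
print(pools_needed(90_000_000))      # hypothetical total, not from the article
```

At 45 million files, the core site Elzinga described is at 90 percent of the cap, with room for 5 million more.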
Elzinga also said that on smaller WAN links, he found that more than 250 GB of data took too long to back up because of latency on the network -- up to a week in some cases. For those environments, Elzinga said the company is still using Network Appliance Inc.'s (NetApp) SnapVault.
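A back-of-the-envelope calculation shows why link speed matters so much at this scale. The sketch below makes several assumptions that are not from the presentation: the slowest link runs at 384 kilobits per second, units are decimal (1 GB is 10^9 bytes), the link is fully available to backup traffic, and protocol overhead and latency are ignored. Elzinga did not say which links produced the week-long backups, so the output is illustrative only.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(data_bytes, link_bits_per_second):
    """Days needed to push data_bytes over a link, ignoring overhead."""
    seconds = data_bytes * 8 / link_bits_per_second
    return seconds / SECONDS_PER_DAY


def bytes_in_window(link_bits_per_second, days):
    """Most data the link can carry in the given number of days."""
    return link_bits_per_second * days * SECONDS_PER_DAY / 8


LINK_BPS = 384_000          # assumed: 384 kilobits per second
SITE_BYTES = 250 * 10**9    # 250 GB, decimal units

print(f"{transfer_days(SITE_BYTES, LINK_BPS):.1f} days to send all 250 GB")
print(f"{bytes_in_window(LINK_BPS, 7) / 10**9:.1f} GB fits in one week")
```

Under those assumptions, sending all 250 GB undeduplicated would take about two months, and only about 29 GB would fit through the link in a week.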
"At our larger remote sites we are able to use our NetApp filers as primary storage and for backup, but sites smaller than 250 GB didn't justify the cost of a filer anyway," he said.
Krueger said that he's been backing up remote sites of over 1 TB without the same problems, due to larger WAN links. However, he said the 50 million-file database limit is a concern, though "adding more nodes to boost capacity or performance is typical of data deduplication products. We're already used to managing over 30 instances of NetBackup," he said. "It's a philosophical choice for us. We see the appeal of managing fewer boxes, but don't like to put all our eggs into just one basket anyway."
Qualcomm did find, however, that it had to "pre-seed" the data on initial backups, sending the initial backup data on tape to the main data center, restoring it to a network share onsite and performing the initial full backup over the local LAN rather than via the WAN. "There's still some time involved with each remote site having to sync its catalog when you do that, but it saved a lot of time with the initial backup," Krueger said.
According to Wim De Wispelaere, senior product manager for PureDisk at Symantec, the next release of PureDisk, version 6.2, is due to become generally available in July and will raise the database limit to 100 million files.
As for the bandwidth issues, "There's a balance users have to reach between bandwidth and capacity," De Wispelaere said, but added that Symantec is working on reducing the number of times remote sites have to ping the central storage pool to detect duplicate data segments. "We're planning major improvements there for version 6.5, which will be available by the end of the year," he said.
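The trade-off De Wispelaere describes can be illustrated with a toy model of segment-level deduplication. This is a minimal sketch, not Symantec's implementation: it assumes fixed 128 KB segments (the size Elzinga quoted), SHA-256 fingerprints, and a client that asks the central pool which fingerprints it lacks before sending anything. The class, function and parameter names, including `batch_size`, are hypothetical.

```python
import hashlib

SEGMENT_SIZE = 128 * 1024  # 128 KB, the segment size quoted in the talk


def fingerprint(segment):
    """Fingerprint a segment; SHA-256 is an assumption of this sketch."""
    return hashlib.sha256(segment).hexdigest()


class CentralPool:
    """Toy stand-in for a central store shared by all remote clients."""

    def __init__(self):
        self.segments = {}  # fingerprint -> segment bytes
        self.lookups = 0    # simulated round trips from remote sites

    def missing(self, fingerprints):
        """One round trip: report which fingerprints the pool lacks."""
        self.lookups += 1
        return {fp for fp in fingerprints if fp not in self.segments}

    def store(self, fp, segment):
        self.segments[fp] = segment


def backup(pool, data, batch_size=1):
    """Back up data and return the bytes sent over the simulated WAN."""
    segments = [data[i:i + SEGMENT_SIZE]
                for i in range(0, len(data), SEGMENT_SIZE)]
    sent = 0
    for start in range(0, len(segments), batch_size):
        batch = segments[start:start + batch_size]
        by_fp = {fingerprint(s): s for s in batch}
        # Only segments the pool has never seen cross the WAN.
        for fp in pool.missing(list(by_fp)):
            pool.store(fp, by_fp[fp])
            sent += len(by_fp[fp])
    return sent


pool = CentralPool()
office_a = b"A" * SEGMENT_SIZE + b"B" * SEGMENT_SIZE
office_b = b"A" * SEGMENT_SIZE + b"C" * SEGMENT_SIZE
print(backup(pool, office_a), backup(pool, office_b), pool.lookups)
```

In this model the second office sends only the one segment the pool has not already received from the first. Raising `batch_size` cuts the number of lookups without changing the bytes sent, which is one conceivable way to reduce round trips; the article does not say what approach Symantec plans to take.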