NEW YORK, NY -- Data deduplication is a hot topic at this year's Storage Decisions conference, with users saying they're gung-ho about deploying the technology. However, those with large storage environments say they've had trouble finding a product that fits their requirements.
Brian Greenberg, director of data protection services for a large Chicago-based financial company, called data deduplication the "Holy Grail" of disk-based backup during a presentation on the topic Wednesday.
Still, Greenberg's company, which he declined to name, is sticking with tape backup for now while it waits for deduplication to become more useful for disaster recovery.
So why isn't he using it? Greenberg said he will not deploy a data deduplication appliance until he finds one that can copy both its deduplicated data store and its index to tape for disaster recovery. He could get data from most deduplication systems onto tape by "rehydrating" it (restoring it to its original, undeduplicated form) and backing it up as a separate copy, but Greenberg said he wants to save space on tape, too. "Being able to back up the catalog is a standard feature of a tape backup environment," he said. "Many of the vendors have asked me why I'd want to do tape backup when I can replicate between systems, but what if there's a rolling disaster that corrupts both?"
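For readers unfamiliar with the mechanics, here is a simplified Python sketch of the idea, not any vendor's implementation. A deduplicated store keeps each unique chunk of data once, while a separate index records the order of chunks needed to rebuild each backup. The fixed chunk size, fixed-size chunking and SHA-256 fingerprints are assumptions chosen for illustration; commercial products generally use larger, often variable-size chunks. The sketch shows why Greenberg wants both pieces on tape: the chunk store is useless without its index, and "rehydrating" a backup writes every duplicate chunk out in full again.

```python
import hashlib

# Hypothetical, tiny chunk size for illustration only.
CHUNK_SIZE = 4

def dedupe(files):
    """Split each file into fixed-size chunks and store each unique chunk once.

    Returns (store, index): store maps fingerprint -> chunk bytes;
    index maps file name -> ordered list of fingerprints (the "recipe").
    """
    store = {}
    index = {}
    for name, data in files.items():
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            fp = hashlib.sha256(chunk).hexdigest()
            store.setdefault(fp, chunk)  # a repeated chunk is stored only once
            recipe.append(fp)
        index[name] = recipe
    return store, index

def rehydrate(store, index, name):
    """Rebuild a file's original bytes; this needs both the store and the index."""
    return b"".join(store[fp] for fp in index[name])

# Two nightly backups that share most of their content.
files = {"mon.bak": b"AAAABBBBCCCC", "tue.bak": b"AAAABBBBDDDD"}
store, index = dedupe(files)
raw_bytes = sum(len(d) for d in files.values())
stored_bytes = sum(len(c) for c in store.values())
print(raw_bytes, stored_bytes)  # 24 16
print(rehydrate(store, index, "tue.bak"))  # b'AAAABBBBDDDD'
```

Copying only `store` to tape would save space but leave no way to reassemble the backups, while copying rehydrated files restores the full 24 bytes. Keeping the deduplicated store and its index together is the space-saving option Greenberg describes.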
Pete Fischer, storage administrator for a large paper and packaging manufacturing company, said his company is desperate to find a product that can reduce the 400 TB of data it must protect every 24 hours. The company uses IBM's Tivoli Storage Manager (TSM) to send data from EMC Clariion CX500, 600 and 700 systems with a total of 27 TB usable capacity to Clariion Disk Library (CDL) virtual tape library (VTL) systems.
"We have barely enough room to keep our incremental backup data in the disk pool," Fischer said. Any overflow goes directly to the CDLs, which are also trying to back up data from the disk pool, causing bottlenecks. Fischer also said he's running out of capacity in his tape libraries, estimating that a fully populated Sun StorageTek SL8500 has only about 30 percent of the drives he needs.
Fischer's company has brought in a Data Domain box for testing. He's also evaluating Diligent Technologies, but favors Data Domain because Diligent's product is strictly a VTL. "We're leery of VTL and tape in general at this point," Fischer said. His firm is putting Data Domain DD560 systems through rigorous performance testing, and Fischer said he's not satisfied with the product's scalability. The DD560s hold just over 1 TB of disk apiece, so he will need to deploy at least eight boxes and silo his data by application. "What I want is to have the boxes be aware of each other, and to be able to get even more data reduction across applications," he said.
Mark Glazerman, storage and backup admin for a plastics manufacturing company, is happily running Data Domain DD560 and DD430 boxes to back up 25 TB. Glazerman said his most recent monitoring reports from his Data Domain systems show an average throughput of 10 MBps over 24 hours. That satisfies Glazerman, but won't work for everybody. [Update: Following publication of this article, Glazerman contacted SearchStorage.com to clarify that the 10 MBps throughput rate reported by the system is per drive, rather than for his entire system. At 15 drives, the entire system is getting an average throughput of 130 MBps, Glazerman said.]
Jannes Kleveberg, solution area manager for ATEA, a consulting firm that manages storage at a large automobile manufacturer's facilities in Europe, has considered deduplication for his client's 600 TB shop. After hearing Glazerman's per-drive performance numbers with Data Domain, he said "that kind of performance won't do in a large environment."
Kleveberg said he's concerned about post-process systems causing contention with the servers they draw data from after the backup window is over. "For us it always comes back to the performance issue," Kleveberg said.
Ed Reidenbach, Data Domain's director of product management, said users may blame deduplication for poor performance simply because it's an unfamiliar technology. "We spend a lot of time debugging customer networks to resolve the issue, but since we're the new player in the environment [users] think we're the problem," he said. According to Beth White, Data Domain's vice president of marketing, the company is working on letting individual boxes connect through a global namespace so the product scales better. "We're still pushing the upper limits of our product," she said. "All of us [vendors] in this market are still working our way up the food chain to those megascale data center environments."