Data deduplication technology has added a new argument to the long-running disk vs. tape data backup and recovery debate. Though data deduplication tools are still in their infancy, deduplication opens a path toward eliminating tape entirely from many enterprise data storage infrastructures. But tape fans say that the medium's portability and long-term data storage capability mean the message is "tape will always have its place" rather than "tape is dead."
Even Shane Jackson, director of product marketing at Data Domain, agrees. "We've never gone to the extreme of 'tape is dead,'" he said. "As an archive medium, keeping data for seven years for HIPAA compliance in a box on a shelf is still a reasonable thing to do." Still, he said, "One of the primary drivers that leads prospects to Data Domain is that they're having challenges with tape. They can't meet backup windows -- they're having reliability problems and can't get their data back."
"SOX or HIPAA and other legislation are driving the need for customers to keep data for five, seven, ten years or even the life of the patient. It's not cost-efficient to store on disk for that long," said Eric Bassier, product manager for tape automation at Quantum Corp., which offers dedupe capabilities and also is a tape vendor and a member with other tape vendors of the LTO Consortium.
Data deduplication best practices
Quantum's recommended best practice, said Bassier, is to take advantage of deduplication to "eliminate tape where it makes sense in your environment." That might be remote offices or any locations where a company doesn't have trained IT personnel. He said that a strategy of removing tape "at the edge" makes sense, while keeping tape around for long-term data storage, retention and data recovery. "Make sure that, at the core, your deduplication can be well integrated with your tape environment," said Bassier. He suggested using a tool like Symantec Corp.'s OST "to move data seamlessly down to tape, and then from tape move it offsite somewhere."
Though data deduplication tools have made a splash in the market, the technology still faces an uphill battle to reach anywhere near a majority of enterprise data storage users. According to the Fall 2009 edition of Storage magazine's Purchasing Plans survey, 21% of respondents are using deduplication, the highest number recorded, and 26% plan to adopt it this year. Furthermore, 38% planned to increase their dedupe spending, while only 6% anticipated reducing their dedupe budgets.
"Dedupe has a long way to go, but the engines have started to move very rapidly now," said Arun Taneja, founder of the Taneja Group. "I would say that on average, maybe 10% of all data protection that needs to happen is by the use of disk. Approximately 90% of data protection today is still on tape."
Some users are moving ahead on getting rid of tape completely. "We're targeting total tape elimination by this year," said Michael Passe, storage architect at Beth Israel Deaconess Medical Center in Boston. He said that considering the dropping costs of disk now that they're deduping, "it's in the cents per gig range which is approaching tape costs." And "tapes break, tapes don't work. With the tape issues it kind of negates the cost difference between tape and Data Domain," said Passe. "It's kind of an intangible amount of dollars of not having data exposed, not having to truck tapes around."
Despite the possibility of replacing all tape with a dedupe implementation, tape is tried and tested for many users, such as the New York City-based American Institute of Certified Public Accountants (AICPA). "Everything goes to tape once a month," said Stan Noel, senior systems engineer. He said that for many in the organization, "it comes down to security, and the fact that it's written on media, carried out of the building and locked away." He doesn't see that changing, even after the AICPA started deduplicating its data.
Tape still a mainstay for long-term data retention
Long-term data retention is still a sweet spot for tape. What's considered long-term differs at each company, but Marc Crespi, vice president of product management at dedupe provider Exagrid Systems, said Exagrid customers average 12 weeks' worth of retention on disk. More than half still use tape, he said, while the rest, a little more than 40%, replicate to another Exagrid box in an offsite location. The majority of those users have completely replaced tape, he said. "Dedupe itself is more of an enabling technology," he added. "It's enabling disk, which is a much more reliable, high-performance scalable way to replace what was the only economical way to store backups for a long time."
Combining the two technologies -- deduping directly to tape -- is something that user Noel finds interesting. "It's fine if you want to do it, keep a set of data you've deduped on one tape instead of 30," said Data Domain's Jackson. "You just have to be cognizant, as soon as you move to tape media you have the same problems with performance and reliability." Quantum's Bassier said that it's an idea they've been exploring, but "there are a number of technical challenges involved in that." Currently, only CommVault Systems Inc. dedupes to tape.
Crespi of Exagrid warns that "getting data back off of tape is nontrivial." When deduping to disk, he said, all the information needed to recreate a file might be spread across many disks. "When I dedupe to tape," he said, "no one tape has all the answers. I'd have to get multiple tapes in just to get one file back. It might store data, but if you had to actually recover it would not be a simple process."
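Crespi's concern can be illustrated with a minimal sketch. It assumes a hypothetical deduper that splits each backup into fixed-size chunks, writes every unique chunk once to whichever tape is currently loaded, and starts a new tape when the current one is full. The class name, chunk size and tape capacity are invented for illustration and do not describe any vendor's product.

```python
import hashlib

CHUNK_SIZE = 4      # bytes per chunk (tiny, for illustration only)
TAPE_CAPACITY = 3   # unique chunks per tape (hypothetical)


def chunks(data: bytes):
    """Split data into fixed-size chunks."""
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]


class NaiveTapeDedupe:
    """Hypothetical global dedupe that writes unique chunks straight to tape."""

    def __init__(self):
        self.tapes = [[]]    # each tape is a list of chunk hashes
        self.location = {}   # chunk hash -> index of the tape holding it
        self.recipes = {}    # backup name -> ordered list of chunk hashes

    def backup(self, name: str, data: bytes) -> None:
        recipe = []
        for chunk in chunks(data):
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.location:
                # New chunk: write it to the current tape, or a fresh one if full.
                if len(self.tapes[-1]) >= TAPE_CAPACITY:
                    self.tapes.append([])
                self.tapes[-1].append(digest)
                self.location[digest] = len(self.tapes) - 1
            recipe.append(digest)
        self.recipes[name] = recipe

    def tapes_needed(self, name: str):
        """Return the sorted tape indexes required to restore one backup."""
        return sorted({self.location[d] for d in self.recipes[name]})


store = NaiveTapeDedupe()
store.backup("week1.dat", b"AAAABBBBCCCC")
store.backup("week2.dat", b"DDDDEEEEFFFF")
store.backup("week3.dat", b"AAAAEEEEGGGG")  # reuses chunks from weeks 1 and 2
print(store.tapes_needed("week3.dat"))      # [0, 1, 2]
```

In this toy run, the third backup adds only one new chunk, yet restoring it means mounting all three tapes, which is the "no one tape has all the answers" problem in miniature.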
CommVault's senior product marketing manager, Dipesh Patel, acknowledges that recovery can be tricky. "One of the main challenges with tape-based dedupe," he wrote in an email, "is how to keep the data clustered together so that for recovery you're not required to bring back a huge number of tapes." He explained that CommVault customers deduping to tape choose a recovery time span -- say a 30-day window -- and the software divides that time span into individual volumes. As volumes fill up, they are then copied to tape. The data for each time span is clustered on a particular set of tapes, so recovering it requires only that set. "All the backups and archives contained on that set of tapes is self-referencing," wrote Patel. "All the blocks required to restore and recover data is contained within that set." The software determines which volumes are required for restore.
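The general idea Patel describes, limiting deduplication to a time window so that each window's tape set is self-referencing, might be sketched as follows. This is not CommVault's implementation. The class, the 30-day constant, the chunk size and the one-volume-per-window simplification are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict

WINDOW_DAYS = 30   # hypothetical recovery time span
CHUNK_SIZE = 4     # bytes per chunk (tiny, for illustration only)


def chunk_hashes(data: bytes):
    """Return the ordered SHA-256 digests of fixed-size chunks of data."""
    return [hashlib.sha256(data[i:i + CHUNK_SIZE]).hexdigest()
            for i in range(0, len(data), CHUNK_SIZE)]


class WindowedDedupe:
    """Hypothetical dedupe that only shares chunks within one time window."""

    def __init__(self):
        self.volumes = defaultdict(set)  # window index -> chunk hashes stored
        self.recipes = {}                # backup name -> (window, chunk hashes)

    def backup(self, name: str, day: int, data: bytes) -> None:
        window = day // WINDOW_DAYS
        recipe = chunk_hashes(data)
        # Dedupe only against this window's volume; a chunk already held by
        # another window is stored again so this volume stays self-contained.
        self.volumes[window].update(recipe)
        self.recipes[name] = (window, recipe)

    def restore_set(self, name: str) -> int:
        """Return the single window (tape set) needed to restore a backup."""
        window, recipe = self.recipes[name]
        if any(d not in self.volumes[window] for d in recipe):
            raise RuntimeError("volume is not self-referencing")
        return window

    def stored_chunks(self) -> int:
        """Total chunks written across all windows."""
        return sum(len(v) for v in self.volumes.values())


store = WindowedDedupe()
store.backup("day0", 0, b"AAAABBBB")
store.backup("day10", 10, b"AAAACCCC")   # same window: AAAA is stored once
store.backup("day40", 40, b"AAAADDDD")   # new window: AAAA is stored again
print(store.restore_set("day40"))        # 1
print(store.stored_chunks())             # 5
```

Under these assumptions the trade-off is visible: the four distinct chunks occupy five slots because one is repeated across windows, but any restore touches exactly one tape set.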
But, "dedupe to tape is very much focused on long-term data retention," according to Patel. "In most cases, customers don't do mission-critical restore from deduped data on tape. That kind of data is usually kept, deduped, on disk." Instead, something like an e-discovery request might prompt a recovery from tape.
Deduplication is still in the early stages for Simsbury, Conn.-based aerospace and defense provider EBA&D, said Rich Stewart, project technical lead. "We're doing a pilot on one server, but we don't currently do it in production," he said. In testing, "it's been beautiful." Even after adding dedupe, though, their long-term plan is to move data from disk to disk, then offload weekly to tape. "We restore from our tapes, and we're not sure yet how to change that process if we go solid disk to disk," said Stewart. "For backups, we'll probably only use tape. We're familiar with it, and we know it works."
Christine Cignoli is a Boston-based technology writer. Visit her at www.christinecignoli.com.