Complete guide to backup deduplication
Data deduplication -- the technology that gave disk a leg up on tape for backups -- has been on the market for 10 years. And while it hasn't fulfilled its mission of killing tape, data dedupe has become a mainstream technology, even though its adoption rate is still climbing.
The data protection technology first showed up in 2003 in disk-based hardware from Data Domain and software from Avamar, although the term didn't really catch on for a few years. A decade later, dedupe is practically a requirement for all backup products, although most companies say they still aren't using it for data protection.
Neither Data Domain nor Avamar discussed deduplication when they launched their first products. They emphasized the advantages of disk over tape, while mentioning the ability to shrink the data footprint and fit more on a disk.
Virtual tape libraries (VTLs) were coming into vogue in 2003, with a bunch of startups looking to capitalize on tape's shortcomings. But disk wasn't always cost-effective for backups unless the data could be reduced. Data Domain's reduction capability made it the fastest growing of the new disk backup vendors. It expanded from one paying customer in 2003 to 500 in 2006, then to 1,200 the following year.
EMC -- which bought Data Domain for $2.1 billion in 2009 -- now claims that more than 36,000 Data Domain appliances have been sold in 10 years, protecting more than 26 exabytes of data.
Data Domain's success prompted others to try the new data reduction technique. In 2006, ADIC acquired the patents of Australian startup Rocksoft. Quantum then bought ADIC, and the Rocksoft software became the engine for Quantum's DXi dedupe appliances. Also in 2006, Veritas (now part of Symantec) acquired Data Center Technologies (DCT) for its data reduction technology. That dedupe technology went into Symantec's PureDisk product, and ultimately into its NetBackup and Backup Exec software and appliances.
EMC acquired Avamar for $165 million later in 2006, and IBM acquired Diligent Technologies -- another early dedupe player -- for $200 million in 2008, before EMC won a bidding war against rival NetApp to grab Data Domain the following year. With the Data Domain buy, EMC owned both source-based dedupe technology (from Avamar) and target-based dedupe technology (from Data Domain).
With these acquisitions, the term "data deduplication" began gaining in popularity -- and so did the technology. By 2010, the enterprise dedupe roster included software from Asigra, CA, CommVault, EMC, FalconStor, IBM and Symantec and hardware from EMC, ExaGrid, IBM, NEC, Quantum and Sepaton. The technology is now common even among SMB storage software vendors.
Stragglers included Dell and Hewlett-Packard. HP, after a few false starts, re-launched its internally developed StoreOnce dedupe software and appliances in late 2011. Dell turned its acquisition of dedupe startup Ocarina into a disk backup appliance in January 2012.
But dedupe is not yet a ubiquitous backup technology. According to the Storage Magazine/SearchStorage Purchasing Intentions survey conducted in late 2012, 42% of the 702 IT professionals who responded said they have dedupe or would install it in 2012. Another 31% said they would evaluate it. That still leaves a lot of room for growth for this mainstream technology.
Market research firm IDC forecasts that the disk backup appliance market will grow to $5.9 billion in 2016, up from $2.4 billion in 2011. IDC put the market at $860 million in the fourth quarter of 2012. EMC led with 66% market share, followed by Symantec, whose NetBackup and Backup Exec appliances held 12% of the market; IBM, HP and Quantum rounded out the top five.
That market is driven by dedupe technology, according to IDC analyst Robert Amatruda. "Deduplication has been a disruptive technology, making the use of disk for data protection more economically feasible," Amatruda wrote in a market analysis report on purpose-built backup appliances in April 2012.
It was the desire to eliminate tape that drove early users to dedupe, even if they didn't know the term at the time.
Lisa Hazen, director of IT at biotechnology tools company Labcyte, said she learned of Data Domain in 2006 from a friend who worked for the startup. Hazen had recently moved to a low-cost NAS backup system to get off tape, but quickly found her backups were taking up too much disk space.
"I purchased a NAS box that didn't have a whole lot of smarts, it was just for the purpose of getting more drives," she said. "That was my first foray into D2D [disk to disk] backup. I bought a terabyte and it quickly got eaten up."
Hazen said she was doing weekly full backups and daily incrementals and keeping four weeks online. That meant she needed enough disk for five or six times her main storage. "And that's a big expense," she said. "So the NAS solution was good to get me away from a tape-only mindset and into the D2D mentality, but then I knew I needed something better. Dedupe solved that problem."
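Hazen's estimate of five or six times her primary storage follows from simple arithmetic. The sketch below is a rough model, not a sizing tool: the weekly fulls, daily incrementals and four weeks of retention come from her description, while six incrementals per week and a 5% daily change rate are hypothetical values chosen for illustration.

```python
# Hypothetical capacity estimate for a weekly-full / daily-incremental scheme
# stored on plain disk with no deduplication. The incrementals-per-week and
# daily change rate defaults are illustrative assumptions, not article figures.


def raw_backup_capacity(primary_tb: float,
                        weeks_retained: int = 4,
                        incrementals_per_week: int = 6,
                        daily_change_rate: float = 0.05) -> float:
    """Return the disk (in TB) needed to keep every full and incremental online."""
    fulls = weeks_retained * primary_tb                 # one full copy per week
    incrementals = (weeks_retained * incrementals_per_week
                    * daily_change_rate * primary_tb)   # changed data each day
    return fulls + incrementals


if __name__ == "__main__":
    need = raw_backup_capacity(primary_tb=1.0)
    print(f"{need:.1f} TB of backup disk per 1 TB of primary data")
```

Under these assumptions the four fulls alone account for 4 TB per terabyte of primary data, and the incrementals push the total past 5 TB, which is why a terabyte of NAS "quickly got eaten up."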
Hazen attended a Data Domain presentation, where a rep from the startup explained how it reduced data (her notes from that meeting don't include the word deduplication, and she doesn't remember if it was used). She was a little confused by the new technology at first.
"It was a leap of faith," she said. "They weren't a household name and this dedupe thing was new. They explained that any individual block of data will only be on disk once and any references to that will simply be references. I couldn't help but ask, 'What if the block goes bad?' They said, 'Well that doesn't happen.' They assured me the guys writing the algorithm were the best in the industry and they had patents on the algorithm work."
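The idea Data Domain described to Hazen, that each block is stored once and everything else is a reference, can be sketched in a few lines of Python. This is a toy model under stated assumptions: fixed-size chunks, SHA-256 fingerprints and an in-memory dictionary are choices made for illustration, and `DedupeStore` is a hypothetical class, not Data Domain's algorithm, which the article does not describe.

```python
from __future__ import annotations

import hashlib


class DedupeStore:
    """Toy block-level deduplicating store (hypothetical, for illustration).

    Uses fixed-size chunks and SHA-256 fingerprints. Commercial products,
    Data Domain included, use their own segmentation schemes, not this one.
    """

    def __init__(self, chunk_size: int = 4096) -> None:
        self.chunk_size = chunk_size
        self.chunks: dict[str, bytes] = {}  # fingerprint -> the one stored copy
        self.logical_bytes = 0              # total bytes written by all backups

    def write(self, data: bytes) -> list[str]:
        """Store data and return its recipe: an ordered list of chunk references."""
        recipe = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            fp = hashlib.sha256(chunk).hexdigest()
            # Only chunks never seen before are kept; repeats become references.
            self.chunks.setdefault(fp, chunk)
            recipe.append(fp)
        self.logical_bytes += len(data)
        return recipe

    def read(self, recipe: list[str]) -> bytes:
        """Rebuild the original data by following the chunk references."""
        return b"".join(self.chunks[fp] for fp in recipe)

    def physical_bytes(self) -> int:
        """Bytes actually occupied by unique chunks."""
        return sum(len(c) for c in self.chunks.values())

    def dedupe_ratio(self) -> float:
        """Logical bytes written divided by physical bytes stored."""
        physical = self.physical_bytes()
        return self.logical_bytes / physical if physical else 1.0


if __name__ == "__main__":
    store = DedupeStore(chunk_size=4)
    backup = b"AAAABBBBAAAACCCC"
    first = store.write(backup)   # first full backup
    second = store.write(backup)  # an unchanged second full backup
    assert store.read(second) == backup
    print(f"{store.dedupe_ratio():.2f}:1")  # 32 logical bytes / 12 stored bytes
```

The same structure shows why Hazen's question was a fair one: each unique chunk exists exactly once, so if a stored chunk were corrupted, every backup whose recipe references it would read back bad data.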
She was also swayed by a conversation with another Data Domain customer, from a large company with several systems, who said he had no problems. Hazen bought a DD430 -- one of Data Domain's smaller boxes -- and she said she hasn't had any problems since. She currently has a DD630 with 9 TB of usable capacity and gets about a 20-1 dedupe ratio.
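A dedupe ratio is commonly read as logical backup data divided by the physical disk it occupies. Assuming that definition, and assuming Hazen's 20-1 ratio held as the box filled, her figures translate into capacity and savings roughly as follows; the function names are hypothetical.

```python
# Back-of-the-envelope dedupe arithmetic. Assumes ratio = logical / physical
# and that the ratio stays constant as the appliance fills up.


def logical_capacity_tb(usable_tb: float, dedupe_ratio: float) -> float:
    """Backup data (before dedupe) that fits in usable_tb at the given ratio."""
    if dedupe_ratio <= 0:
        raise ValueError("dedupe ratio must be positive")
    return usable_tb * dedupe_ratio


def space_savings(dedupe_ratio: float) -> float:
    """Fraction of disk saved versus storing every backup byte as-is."""
    if dedupe_ratio <= 0:
        raise ValueError("dedupe ratio must be positive")
    return 1.0 - 1.0 / dedupe_ratio


if __name__ == "__main__":
    # Figures from the article: a DD630 with 9 TB usable at about 20-1.
    print(f"{logical_capacity_tb(9, 20):.0f} TB of backups fit on the box")
    print(f"{space_savings(20):.0%} less disk than without dedupe")
```

Note the diminishing returns: going from 10-1 to 20-1 only moves savings from 90% to 95%, which is why vendors tend to quote ratios rather than percentages.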
Hazen said she is happy with EMC's management of Data Domain, even though, when she evaluated an EMC SAN array, she "ran screaming away from them. They were very proprietary, and their product was more expensive than [she] could afford at the time."
But she said the acquisition has made little difference. "It doesn't mean a hill of beans to me that EMC is on the splash screen when I start up Data Domain," she said. "I've seen no difference in the quality of the product."
John Thomas, former senior enterprise systems manager at Richmond, Va.-based law firm Troutman Sanders and another early Data Domain customer, has a better relationship with EMC. He now works for EMC partner Sovereign Systems, selling backup solutions as Sovereign's practice manager for backup and recovery.
But, in 2006, he was still at Troutman Sanders and looking to get the international firm off tape. A sales rep he had worked with on another product moved to Data Domain and prompted Thomas to take a look. He set up a test unit in his firm's Hong Kong office and replicated data back to an Atlanta data center.
"It was a very intriguing solution at the time, so I did my research and we purchased on a conditional PO," he said. "It worked very well and the rest is history."
That conditional order turned into a three-month pilot. At the end, Troutman Sanders purchased 25 Data Domain appliances for its Atlanta and Richmond data centers and 18 remote sites. It replicated data from remote sites to the data centers and did the same between the two data centers.
"It was cutting edge at the time," Thomas said. "It made the cost of storing large volumes of data on disk cost effective versus the traditional disk model. It also deduped data so it was viable to replicate off site, which was the key to our strategy.
"It led to the complete elimination of tape for us, and going tapeless in 2006 was a big thing. It took away the risk of us having to ship tapes, maybe losing data and potentially having to notify clients. That was not a risk we wanted to take."
Thomas left Troutman Sanders in April 2010, but the law firm still has Data Domain appliances.
Not all dedupe customers are using Data Domain, and they certainly don't all go back as far as Labcyte and Troutman Sanders. But the driving factor is the same for more recent converts.
Patrick Yonke, senior systems administrator at Kish Health System in Illinois, said his two-hospital company installed HP StoreOnce 4324 systems in early 2012 as part of a swap-out of EMC SAN gear for HP storage to save money. Yonke said that before moving to HP, he was still backing up to tape.
"We didn't want to deal with tape anymore, that's old technology," he said.
Kish replicates data between StoreOnce systems at its main data center and its DR site. Yonke said he gets about an 11.3-1 dedupe ratio on average, but that figure includes some applications whose data is already compressed before it is backed up, and compressed data dedupes poorly. He said that with dedupe technology he now backs up much more data, and does it faster and more reliably.
He said that, with disk and dedupe, "Now I know we're backing everything up. There's no question that every server we have is backed up 100 percent. And, the other side is, restores are now lightning fast. Before, when a restore request came in, we would have to go into software, figure out what tape that data was stored on and send a request to have the tape returned. It's another day when the tape gets here, if it's the right tape. With D2D, I restore a file by clicking restore and the file is back in less than five minutes."