Users evaluating data deduplication products in an already crowded marketplace will now have yet another basis for comparing products: not just how different vendors perform data deduplication, but how they also incorporate data compression as a means of data reduction.
This week, virtual tape library (VTL) vendor Sepaton Inc. joined competitor FalconStor Software Inc. in announcing a partnership with Hifn Inc., maker of hardware-based compression chips. (FalconStor announced the partnership last fall.) And the numbers being thrown around by Sepaton -- 50-to-1 data reduction without a significant performance hit when both data deduplication and Hifn's data compression are turned on in its S2100-ES2 500 Series VTLs -- show the potential impact of a relatively small system component.
The terms "deduplication" and "compression" can get confusing, as both processes perform data reduction, and both do so by eliminating redundant bits. However, data deduplication does comparisons to previously stored data; compression eliminates patterns within one file.
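To make the distinction concrete, here is a minimal Python sketch, not any vendor's actual method. It assumes a simple fixed-size block scheme with SHA-256 fingerprints; the block size and the names `dedupe_blocks` and `stored_bytes` are hypothetical. Deduplication avoids storing a second backup because its blocks match ones already stored, while compression shrinks repeated patterns inside a single stream.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # hypothetical fixed block size, chosen for illustration


def dedupe_blocks(data: bytes, store: dict) -> list:
    """Store each unique block once; return the fingerprints that rebuild data."""
    refs = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # keep only the first copy of a block
        refs.append(digest)
    return refs


def stored_bytes(store: dict) -> int:
    """Total bytes physically held in the block store."""
    return sum(len(block) for block in store.values())


# Four distinct 4 KB blocks, each made of a repeating 32-byte pattern.
backup = b"".join(hashlib.sha256(str(i).encode()).digest() * 128 for i in range(4))

store = {}
first_refs = dedupe_blocks(backup, store)
size_after_first = stored_bytes(store)

# A second, identical backup is compared with previously stored data
# and adds no new blocks.
second_refs = dedupe_blocks(backup, store)
size_after_second = stored_bytes(store)

# Compression works within one stream, removing the repeated patterns.
compressed = zlib.compress(backup)

print(size_after_first, size_after_second, len(compressed) < len(backup))
```

In this sketch the second backup costs no additional block storage, and the compressed copy of a single backup is smaller than the original because each block repeats a short pattern.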
Sepaton claims that once data is deduplicated at the block level at what it said is a typical reduction ratio, 25 to 1, that data can then be compressed using the Hifn chip to cut its storage capacity requirement in half again, boosting the typical data reduction ratio to 50 to 1.
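The arithmetic behind that claim can be sketched in a few lines of Python. The function names and the 100 TB example figure are illustrative assumptions; only the 25-to-1 and 2-to-1 ratios come from Sepaton's claim. The sketch assumes the two stages are independent, so their ratios multiply.

```python
def combined_ratio(dedupe_ratio: float, compression_ratio: float) -> float:
    """Overall reduction when compression is applied to already deduplicated data."""
    return dedupe_ratio * compression_ratio


def stored_capacity(logical_capacity: float, ratio: float) -> float:
    """Physical capacity needed to hold logical_capacity at a given reduction ratio."""
    return logical_capacity / ratio


overall = combined_ratio(25, 2)                    # 25-to-1 dedupe, then 2-to-1 compression
after_dedupe = stored_capacity(100, 25)            # hypothetical 100 TB of backup data
after_compression = stored_capacity(100, overall)

print(overall, after_dedupe, after_compression)
```

Under these assumptions, 100 TB of backup data would occupy 4 TB after deduplication and 2 TB once compression is also applied. The second stage doubles the ratio, but the absolute saving it adds is 2 TB out of the original 100 TB.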
Sepaton doesn't have many publicly announced customers to back up these claims yet, but Linda Mentzer, vice president of marketing for Sepaton, said the 25-to-1 ratio is an average taken from testing at between 15 and 20 customer sites using "typical" data sets -- not repeated full backups in a lab, which can inflate dedupe numbers.
"In some environments using NetBackup with files, Exchange data and SQL data, we've seen as much as a 56-to-1 deduplication ratio, which would make the ratio with compression 100 to 1." But, she said, the company chose to go with the "most typical" number.
It's a number analysts said is reasonable given the data at hand, which unfortunately isn't much. According to Arun Taneja, founder and consulting analyst with the Taneja Group, the figure is plausible because of the way Sepaton does data deduplication: it removes individual files from tape archive "wrappers" made by backup applications and then deduplicates them according to preset "awareness" of how much deduplication is possible within each file type, a process it calls "content-aware deduplication."
"Technically, they could squeeze out duplication there at the byte level," Taneja said, and a 25-to-1 ratio is in line with what most data deduplication vendors on the market claim. A 2-to-1 compression ratio for LZ compression, which has been around for years, is a generally accepted figure in the industry.
However, Taneja pointed out, for all the squabbling and infighting this space has seen so far this year, there has yet to be a definitive bake-off between products performed by a third party.
"They're all claiming around 20-to-1 or 25-to-1 ratios, but it's very muddy how they arrive at the numbers from one company to another," Taneja said. "Nobody has hard data on this."
According to W. Curtis Preston, vice president of data protection services for GlassHouse Technologies Inc., the bottom line is that users should test products carefully using an accurate sampling of the data in their environment before buying. "Each of these systems, because they have different approaches to deduplication, will consistently do better with different types of data," he said. "Don't believe any of these numbers on their face -- it's not that vendors are lying, but it's like miles-per-gallon figures for cars. Your mileage will vary based on driving conditions."
Sepaton is now shipping the Hifn chip with all 500 Series VTLs, regardless of whether they have the company's DeltaStor data deduplication option. Users can choose to turn on the data compression feature with a software license that costs $16,000. It seems a hefty price for a feature that's standard on most tape drives, but Mentzer pointed out, "Replacement disk trays can cost between $35,000 and $40,000 -- avoiding having to buy it by turning on data compression is the more attractive option for most users."
While FalconStor and Sepaton are partnering with Hifn for hardware-based compression, Diligent Technologies Corp. is sticking to a proprietary software module to perform compression. According to chief technology officer Neville Yates, this is because Diligent's deduplication is an inline process that writes only changes to disk, meaning the throughput benefit it would realize from compressing data in hardware would be negligible.
"If I operate at 400 megabytes per second (MBps) and only 40 MBps of that is data I'm going to compress, hardware-assisted compression in that instance [would be] overkill," Yates said.
Another competitor, Data Domain, is also sticking with software. "If you do fast dedupe inline and compress as a final step for only unique sequences, the CPU impact of local compression is nominal," wrote Beth White, vice president of marketing, in an email to SearchStorage.com. "Most vendors don't have this capability. Throwing hardware at it might be all they can do."