Data deduplication is an increasingly popular option for SMBs because it saves time and money out of the box.
Dedupe reduces the amount of data to be stored by identifying repetitive data and replacing it with a small pointer to the previously stored copy. The cost savings result from using less storage capacity, which allows the organization to delay purchasing additional storage capacity.
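The pointer mechanism described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not how any vendor's product works: the `DedupeStore` class, its method names, and the fixed chunk size are hypothetical, and commercial systems often use variable-size chunking and persistent indexes instead.

```python
import hashlib


class DedupeStore:
    """Toy fixed-size-chunk deduplicating store (hypothetical, for illustration)."""

    def __init__(self, chunk_size=4096):
        # chunk_size is an assumed value; real products choose their own.
        self.chunk_size = chunk_size
        self.chunks = {}  # SHA-256 digest -> chunk bytes, each unique chunk kept once

    def write(self, data: bytes) -> list:
        """Store data and return the list of pointers (digests) that describe it."""
        pointers = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # A chunk seen before is not stored again; only its pointer is recorded.
            self.chunks.setdefault(digest, chunk)
            pointers.append(digest)
        return pointers

    def read(self, pointers: list) -> bytes:
        """Rebuild the original data by following the pointers."""
        return b"".join(self.chunks[p] for p in pointers)

    def stored_bytes(self) -> int:
        """Physical bytes held after deduplication."""
        return sum(len(chunk) for chunk in self.chunks.values())


# Two "backups" that share most of their content.
store = DedupeStore(chunk_size=4)
monday = store.write(b"AAAABBBBCCCC")
tuesday = store.write(b"AAAABBBBDDDD")
logical_bytes = 24  # 12 bytes written twice
print(logical_bytes, store.stored_bytes())
```

In this toy run the two backups share two of their three chunks, so the store holds four unique chunks instead of six. The more the backups overlap, the larger the saving.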
"Disk-to-disk [D2D] and backup-to-disk storage remain the sweet spots for dedupe," said Greg Schulz, senior analyst, StorageIO Group, Stillwater, MN. D2D usually focuses on disk-to-disk replication and snapshots; backup to disk usually involves backup software for conventional backup purposes. With data deduplication, an organization can keep more days of backups in a given amount of storage space.
But dedupe doesn't just focus on conventional data backup. Transplace LP, a shipping and logistics provider, virtualized almost 95% of its servers, including 100 production servers. Transplace also dedupes its virtual servers to eliminate unnecessary copies of the OS and related DLLs. "We can eliminate 20 GB from every virtual server," said Vincent Biddlecombe, CTO. The company uses dedupe built into its NetApp Inc. storage arrays. With hundreds of virtual servers, he calculates freeing up more than 3 TB of space.
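As a rough check on that estimate, the arithmetic can be written out. The sketch below assumes decimal units (1 TB = 1,000 GB) and uses a hypothetical server count, since the article gives only "hundreds" of virtual servers; the function names are illustrative.

```python
import math

GB_PER_TB = 1000  # assumption: decimal units


def reclaimed_tb(servers: int, gb_saved_per_server: float = 20) -> float:
    """Capacity freed, in TB, when each server sheds the same amount of data."""
    return servers * gb_saved_per_server / GB_PER_TB


def servers_needed(target_tb: float, gb_saved_per_server: float = 20) -> int:
    """Smallest number of servers that reaches the target saving."""
    return math.ceil(target_tb * GB_PER_TB / gb_saved_per_server)


print(servers_needed(3))   # servers required to free 3 TB at 20 GB each
print(reclaimed_tb(200))   # hypothetical fleet of 200 virtual servers
```

At 20 GB per server, the 3 TB mark is reached at 150 virtual servers, which is consistent with the "hundreds of virtual servers" cited above.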
Although backup is the primary use for data deduplication, it can also be deployed as part of a remote-office environment to reduce the volume of data sent back to the central data center. Vendors like NetApp, Ocarina Networks and Storwize have introduced products that can deduplicate or compress non-backup data, but to date they've mostly been used for lower tiers or nearline storage. Deduplication is not yet being deployed on primary storage, but vendors say they are working on it.
Almost every storage vendor is bringing out deduplication offerings or incorporating the technology into existing products. These often take the form of appliances, but dedupe is increasingly being added to array firmware or backup software. Hewlett-Packard (HP) Co., for example, built dedupe into its StorageWorks D2D and virtual tape library (VTL) products. CA is adding dedupe to its ARCserve backup software.
Moving from tape to dedupe
Before turning to data dedupe with HP's D2D backup, the Eaton School District in Colorado could save only a week of backups to disk. "After seven days we would have to write over," said John Baker, technology director for the district. That limited the organization's ability to quickly retrieve data from older backups. With dedupe and some additional disk capacity, the K-12 school system can now keep a year's worth of backups on disk.
Intent on eliminating tape altogether, Baker initially opted for HP because of its attractive price. He then discovered that the appliance also had impressive document restore capabilities. "We can find a document fast and can restore a file to its original location in about 30 seconds," he reported.
A data deduplication market leader, especially among midsized organizations, is Data Domain Inc. "The thing with Data Domain is that it will attach to any vendor's storage," said Schulz.
Roger Williams Medical Center, a 220-bed hospital in Providence, R.I., put one Data Domain appliance in its data center and another at a backup facility. The hospital uses the appliances for both data deduplication and remote replication, reported Andy Fuss, lead technical engineer.
Previously, the hospital relied on conventional backup to tape and moved the tapes offsite. A year ago the hospital realized tape was no longer feasible. "Taking 26 hours for an incremental backup isn't acceptable," said Fuss. After looking at Avamar (now part of EMC Corp.) and ExaGrid Systems Inc., the hospital opted for Data Domain. Since it was installed, the appliance has run for 380 consecutive days without downtime or rebooting.
"I don't want to buy more hardware," said Edward Eades, a senior system integrator at a midsized financial services firm. Yet most deduplication vendors require adding another device.
Eades uses CA ARCserve to back up data stored on his firm's Compellent array. Rather than add another device, he is beta-testing the new dedupe capabilities being added to ARCserve. "It lets us go at light speed," said Eades.
Data deduplication isn't free, unless it's bundled with backup software. Dedupe appliances typically offer a range of additional features, such as replication, and sometimes include disk capacity. Prices don't start much below $40,000 for appliances, noted Schulz.
For midsized companies, data dedupe options are growing. Expect to see dedupe built into more storage and networking products.
About this author: Alan Radding is a frequent contributor to SearchSMBStorage.com.