Data dedupe technology helps curb virtual server sprawl

One of data dedupe technology's latest benefits is that it can help curb virtual server sprawl, although fine-tuning the actual processes may take more time, experts say.

Virtual servers and data deduplication technology have both kept the data storage industry buzzing in the past few years. But how virtualization and data dedupe work together is something vendors and users are still fine-tuning.

"We saw the tipping point last year when the number of virtual servers exceeded the number of physical servers," said Steve Scully, research manager, continuity, disaster recovery and storage orchestration at IDC. "The biggest challenge is around backup of those virtual machines."

"Virtualization has caused server sprawl," said Eric Pitcher, VP of technology strategy at CA. "People say virtual machines get thrown away, but the reality is it doesn't happen. Typically you just keep creating them." Data dedupe is a way to battle virtual server proliferation, he said.

With a traditional, non-virtualized backup scheme, said Scully, a company buys a license for each server, runs the backup app on each server, backs up all files and sends them to disk or tape. But when it comes to virtual servers, "if you do that times 50 or 100, you're paying a lot for those licenses and not getting the potential advantage of dedupe technologies," said Scully. "It's identical processes running without any knowledge of what the other guy is running." Virtual machines are often backed up as complete images rather than as sets of individual files. Some backup apps can dedupe across multiple VM images, said Scully. But "you don't get the granularity of file-level backup," he said. "You have to recover the entire virtual machine."

Virtual server backup more complicated than traditional backup and recovery

A common challenge of virtualized servers is that all machines are sharing physical CPU, bandwidth and disk, according to Rob Emsley, senior director of product marketing, EMC backup recovery systems division. "You have to make the physical resources more efficient, which becomes a challenge for doing traditional backup and recovery," he said.

Backing up virtual servers is more complicated than other backups, said Pitcher. "You take a snapshot of the server, move to a temporary location and do backup from that location," he said. CA's strategy for improving and deduping virtual backups included cutting out the temporary storage location to back up the virtual space directly from the virtual machine.

"Server virtualization and dedupe is the same concept: consolidation, optimizing storage, reducing power and cooling and retaining data for longer periods of time," said Mike DiMeglio, product marketing manager at FalconStor, which offers data dedupe with its DiskSafe and FileSafe product features. FalconStor currently uses a proxy server to run source deduplication for virtual machines, supporting various backup software options, but DiMeglio said snapshots are a big part of the company's roadmap for virtual machines and data dedupe. "Then you can back up from a snapshot, and dedupe applies to that," he said.

EMC's Avamar software dedupes at the source and from within the virtual environment with tight VMware integration, said Emsley. When deduping virtual machines with an appliance, using target dedupe, like EMC's Data Domain product, virtual servers are seen as just another workload, said Shane Jackson, senior director of product marketing, Data Domain and Disk Library at EMC. Dedupe rates can be very high for virtual machines because the level of redundancy is so high, said Jackson.
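
Why redundancy across virtual machines translates into high dedupe rates can be illustrated with a toy model. The sketch below is not how Avamar, Data Domain or any other product named here works internally; it assumes fixed 4 KB blocks and SHA-256 fingerprints, and the three "VM images" are made-up byte strings that share an operating system region.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size, chosen only for illustration


def split_blocks(image: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut an image into fixed-size blocks (the last block may be shorter)."""
    return [image[i:i + block_size] for i in range(0, len(image), block_size)]


def dedupe_ratio(images: list[bytes], block_size: int = BLOCK_SIZE) -> float:
    """Return logical blocks written divided by unique blocks actually stored."""
    store: dict[str, bytes] = {}  # fingerprint -> block, kept once
    total = 0
    for image in images:
        for block in split_blocks(image, block_size):
            total += 1
            store.setdefault(hashlib.sha256(block).hexdigest(), block)
    return total / len(store)


# Hypothetical data: every VM carries the same 8-block "OS" region
# plus 2 blocks that are unique to that VM.
base_os = b"".join(bytes([n]) * BLOCK_SIZE for n in range(8))
vm_images = [
    base_os
    + bytes([100 + 2 * v]) * BLOCK_SIZE
    + bytes([101 + 2 * v]) * BLOCK_SIZE
    for v in range(3)
]

ratio = dedupe_ratio(vm_images)
print(f"dedupe ratio: {ratio:.2f}:1")
```

In this made-up data set, 30 logical blocks reduce to 14 stored blocks, and the ratio keeps climbing as more VMs built from the same base are added.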

Target deduplication vs. source deduplication

There are advantages to both source deduplication and target dedupe with virtual machines, said IDC's Scully. One thing to consider is whether the backup application is doing incremental or full backups of individual VMs. "I guarantee the image is going to look different every time," said Scully. "If some file on that entire image has changed, that entire image as a set of files is going to be different. So it isn't seen as an incremental because the whole file has changed." In that case, it might make sense to dedupe that entire image at the source. But others might want to get data out of the production environment without an extra load on the servers or analysis on the source side. In a true disaster recovery situation, image-level backups of VMs can be "very powerful" for getting systems up and running, said Scully.
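
Scully's point about images looking different every time can be sketched in a few lines. This is a simplified illustration, assuming a 16-block image and a fixed 4 KB block size; the helper names are hypothetical and no real backup product's logic is implied. A single changed byte makes the whole image compare as new to a file-level incremental, while a block-level comparison finds only one block to move.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def image_digest(image: bytes) -> str:
    """Fingerprint of the whole image, as a file-level backup would see it."""
    return hashlib.sha256(image).hexdigest()


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Indexes of the blocks that differ between two versions of an image."""
    length = max(len(old), len(new))
    changed = []
    for index, offset in enumerate(range(0, length, block_size)):
        if old[offset:offset + block_size] != new[offset:offset + block_size]:
            changed.append(index)
    return changed


# Hypothetical VM image on two consecutive days: one byte inside block 5 changes.
monday = bytes(16 * BLOCK_SIZE)
edited = bytearray(monday)
edited[5 * BLOCK_SIZE + 10] = 0xFF
tuesday = bytes(edited)

whole_image_changed = image_digest(monday) != image_digest(tuesday)
blocks_to_send = changed_blocks(monday, tuesday)

print("image treated as changed:", whole_image_changed)
print("blocks that actually differ:", blocks_to_send)
```

Here the image-level check flags all 16 blocks for backup, whereas the block-level comparison identifies block 5 alone, which is the gap that deduping the image is meant to close.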

Deduping backup data at the source can transfer data off VMs quickly, said Mathew Lodge, senior director of product marketing at Symantec. "It's important to move data off virtual servers," said Lodge, as virtual servers can consume a lot of CPU. Symantec's recently released new versions of NetBackup and Backup Exec do granular recovery of virtual machines and allow for dedupe in several locations throughout the backup process, including at the source. Symantec recommends dedupe within each virtual machine if there are bandwidth constraints or in a data center using Microsoft Hyper-V. Otherwise, VMware users should use the vStorage API to send entire VMware images to a NetBackup media server for deduplication, said Lodge.

Other data dedupe options for virtual servers

Other creative options for deduping virtual servers are out there. Bluelock LLC, a cloud computing provider, has approached deduplicating virtual server data from a different angle, said Pat O'Day, chief technology officer. Bluelock uses VMware linked clones to reduce duplicate data. The company creates a server in VMware as a template, puts it in the cloud and lets users provision servers from that template. When the user renames that server, just the one block changes.

"The linked clone only tracks the differences between the machine the user spun up and the original template," said O'Day. "It's essentially dedupe." The downside is that in the long term, "it never reconciles changes like a dedupe solution would," he said. Though O'Day said Bluelock is looking at dedupe options, the company hopes to incorporate both technologies. "I don't think linked clones are going to go away in favor of deduplication."
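
The behavior O'Day describes can be modeled as a copy-on-write overlay on a shared template. The sketch below is a toy model, not VMware's implementation; the `LinkedClone` class, block contents and host names are all hypothetical. It also shows the limitation he mentions: when two clones write identical data, each keeps its own copy, whereas a dedupe store would keep one.

```python
class LinkedClone:
    """Toy copy-on-write clone: reads fall through to a shared template."""

    def __init__(self, template: list[bytes]):
        self._template = template          # shared and never modified
        self._delta: dict[int, bytes] = {}  # only the blocks this clone changed

    def read(self, index: int) -> bytes:
        return self._delta.get(index, self._template[index])

    def write(self, index: int, data: bytes) -> None:
        if not 0 <= index < len(self._template):
            raise IndexError("block index outside the template")
        self._delta[index] = data

    def private_blocks(self) -> list[bytes]:
        return list(self._delta.values())


# Hypothetical 100-block template that two users provision servers from.
template = [b"os-block-%d" % n for n in range(100)]
clone_a = LinkedClone(template)
clone_b = LinkedClone(template)

# Renaming each server touches one block per clone.
clone_a.write(3, b"hostname=web01")
clone_b.write(3, b"hostname=web02")

# Both clones later apply the same patch; the overlays do not share it.
clone_a.write(7, b"security-patch")
clone_b.write(7, b"security-patch")

all_private = clone_a.private_blocks() + clone_b.private_blocks()
total_private = len(all_private)        # blocks the clones store
unique_private = len(set(all_private))  # blocks a dedupe store would keep

print(total_private, "private blocks stored,", unique_private, "unique")
```

The two clones store four private blocks on top of the 100-block template, but only three of those blocks are distinct, which is the kind of drift that linked clones alone do not reconcile.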

As virtual server use has moved beyond testing and development, the related backup and recovery continues to mature. "There's a lot to get done," said EMC's Jackson. "Even now we're pushing to a greater degree of server virtualization in the data center, getting to 80% virtualization." Data dedupe is letting companies get there, he said, as part of a greater "backup redesign."

When choosing how to dedupe virtual server data, "it really comes down to understanding what your needs are, what you want to recover at the file level and what you want to recover at the image level," said IDC's Scully. "There are various knobs and dials you can tune of what you want to recover and what level of granularity."

Christine Cignoli is a Boston-based technology writer. Visit her at
