Source deduplication decreases backup data, bandwidth needs for remote offices

Source deduplication works particularly well in reducing backup data for remote offices/branch offices (ROBOs) and laptops.

The best way to tell that data deduplication has come into its own is the variety of flavors now available. Source deduplication -- also called client-side dedupe -- skips deduping data at the media server or appliance level in favor of deduping at the backup client level. This approach works particularly well with remote offices or branch offices (ROBOs) and laptops.
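To make the idea concrete, here is a minimal Python sketch of client-side dedupe. It is an illustration only, not how Avamar or any other product named in this article works: the fixed 4 KB chunk size, the in-memory `BackupServer` class and the function names are all assumptions made for the example, and commercial products typically use variable-length chunking and their own wire protocols. The point it shows is that the client hashes each chunk and sends only the chunks the server has not already stored.

```python
import hashlib

# Hypothetical fixed chunk size; real products often use variable-length chunks.
CHUNK_SIZE = 4096


class BackupServer:
    """Stand-in for the central backup store, keyed by chunk hash."""

    def __init__(self):
        self.chunks = {}

    def has_chunk(self, digest):
        return digest in self.chunks

    def store_chunk(self, digest, data):
        self.chunks[digest] = data


def backup_from_client(data, server):
    """Hash chunks on the client and send only chunks the server lacks.

    Returns (recipe, bytes_sent): the ordered chunk hashes needed to rebuild
    the data, and the number of payload bytes that crossed the wire.
    """
    recipe = []
    bytes_sent = 0
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        if not server.has_chunk(digest):
            # Only previously unseen chunks are transferred.
            server.store_chunk(digest, chunk)
            bytes_sent += len(chunk)
        recipe.append(digest)
    return recipe, bytes_sent


def restore(recipe, server):
    """Rebuild the original data from its ordered list of chunk hashes."""
    return b"".join(server.chunks[digest] for digest in recipe)


if __name__ == "__main__":
    server = BackupServer()
    monday = b"A" * 8192 + b"B" * 4096
    tuesday = b"A" * 8192 + b"C" * 4096  # only the last chunk changed
    for name, data in (("monday", monday), ("tuesday", tuesday)):
        recipe, sent = backup_from_client(data, server)
        print(name, "sent", sent, "of", len(data), "bytes")
```

In this toy run the second backup sends only the one chunk that changed, which is the effect Fait describes as doing the work "locally before it went over the wire."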

"If it wasn't for dedupe at the client, before anything else happens, the solution wouldn't work," said Gregory Fait, associate principal and director of IT infrastructure at architecture firm Perkins & Will. The firm's decentralized management setup and remote offices all over North America led them to source deduplication with EMC Corp.'s Avamar as part of a tape replacement project. "When you're talking about two to three terabytes at a remote site and looking at our bandwidth and the pipes we had," said Fait, "there's no choice but to do it locally before it went over the wire."

What is source deduplication?

Source deduplication "is a natural evolution," said Lauren Whitehouse, senior analyst at Milford, MA-based Enterprise Strategy Group. "As people get comfortable with the technology and the technology improves, it's not having as much impact on production environments as everyone thinks," she said. Source deduplication is all about efficiency. "The closer you are to the source of the data," said Whitehouse, "the more efficient you're going to be in moving data around."

Easing remote backup with source deduplication 

Network engineer Andrew Harkin was also looking to ease remote backups with Avamar at Avera McKennan Health, a healthcare organization with small clinics and hospitals across the upper Midwest, comprising 56 remote servers at 21 sites. Harkin needed to cut down his data backup windows. "Traditional tape backups weren't working," said Harkin. Deduping data at the source has had "a huge impact on us," he said. "Overnight it went from 25 or 26 hours to two to three hours to get them done."

Software-based data backup and recovery vendors are in the driver's seat in the source dedupe market, said Whitehouse: "The target-side [dedupe] vendors don't have the same opportunity as the software vendors to capture data at the source." Vendors like Asigra and Robobak incorporate source dedupe into their backup software. Storage-centered vendors are entering the source dedupe market as well; Whitehouse names IBM Corp. and EMC as best positioned to follow Symantec Corp. in offering dedupe at multiple points in the backup process. Symantec recently announced source deduplication with PureDisk in its newest Backup Exec 2010 and NetBackup 7 releases, adding to previously available media server and appliance options (using the OST plug-in technology).

Source dedupe for laptops and desktops is an emerging area. "We see the demand for dedupe in the data center and remote offices because there is so much duplicated information," said Mathew Lodge, senior director of product marketing at Symantec. Symantec isn't supporting laptop dedupe yet, he said. "It's more a question of timing. Laptops and desktops are interesting, but it's not the most urgent problem."

There's a high level of duplication among desktop and laptop data, said Rob Emsley, senior director, product marketing, EMC Backup Recovery Systems division. "Even more than remote offices, the amount of duplicate data that exists within those computers in the same firm is very high," said Emsley.

Though source dedupe can eliminate bandwidth headaches with regular backups, there's still the matter of that first full backup to consider. "The first one is the first one, no way around it," said Ron Roberts, president and CEO of Robobak. Most vendors recommend "seeding" to establish the first backup and avoid a huge bandwidth hit. Seeding might entail backing up files common to each user, such as the Windows operating system, which everyone then dedupes against. "Before the rollout, do a backup of the typical applications your company has," said Eran Farajun, executive vice president at Asigra. "Back up files in the data center, and now users won't have to re-backup files."
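The effect of seeding can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not any vendor's implementation: the dedupe index is a plain set of SHA-256 chunk hashes, chunks are a fixed 4 KB, and the function names and sample data are invented for the illustration. It compares how many bytes a laptop's first backup sends when the index is empty versus when common files were backed up in the data center beforehand.

```python
import hashlib

# Hypothetical fixed chunk size, chosen only for the illustration.
CHUNK_SIZE = 4096


def iter_chunks(data):
    """Yield fixed-size chunks of the given bytes."""
    for offset in range(0, len(data), CHUNK_SIZE):
        yield data[offset:offset + CHUNK_SIZE]


def seed_index(common_files):
    """Back up shared files once in the data center and return the
    set of chunk hashes the backup server now knows about."""
    index = set()
    for data in common_files:
        for chunk in iter_chunks(data):
            index.add(hashlib.sha256(chunk).hexdigest())
    return index


def first_backup_bytes(client_data, index):
    """Return the bytes a client must send on its first backup.

    Chunks whose hashes are already in the index are skipped; new
    hashes are added to the index as they are sent.
    """
    sent = 0
    for chunk in iter_chunks(client_data):
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in index:
            index.add(digest)
            sent += len(chunk)
    return sent


if __name__ == "__main__":
    # Ten distinct chunks standing in for operating system files.
    os_image = b"".join(bytes([i]) * CHUNK_SIZE for i in range(10))
    # Two chunks unique to one user.
    user_docs = bytes([200]) * CHUNK_SIZE + bytes([201]) * CHUNK_SIZE
    laptop = os_image + user_docs

    print("unseeded:", first_backup_bytes(laptop, set()), "bytes")
    print("seeded:  ", first_backup_bytes(laptop, seed_index([os_image])), "bytes")
```

With the shared files already in the index, only the user's own two chunks travel over the network on the first backup. How close a real deployment gets to that depends on how much of each machine's data actually matches the seeded files.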

Emsley said that EMC users might seed the system by moving the Avamar backup server to the remote location, doing the initial backup of all machines at that location, then moving the server to the central data center where it will reside. Or "simply deploy the backup agent in that remote office and back up over your network connection," he said. "There are various ways of deploying it."

Perkins & Will's Fait decided to use large external USB drives to seed their remote offices with a copy of the data. "It cut roughly two to three weeks" off the setup process, Fait said. Avera McKennan's Harkin used a virtual machine to seed his system. "I built a virtual machine off a template, a very basic system with nothing on it, and I used that for the first one," he said. "After seeing it, I could have used a production server with no real problem."

But not everyone wants to use the processing power of a production server and backup application agent for dedupe. "We deduplicate at the server level for our systems," said Al Schipani, manager of server engineering at Westchester Medical Center in Valhalla, NY. He and his team have been beta-testing Symantec's NetBackup 7, and they still prefer media server dedupe for their data, especially for virtual machines. "Since we are 24/7 we do not want to add the additional workload of deduplication on the source," he said.

Fait said next on his dedupe wish list is laptop backups, as well as some cloud-based options -- "a kind of end user, self-service type of backup and restore," he said, "whether inside a corporate network or public Internet."

Harkin wants to see more integration from EMC across its backup offerings (which have come together through acquisitions). He'd also like user interfaces made a priority. "I think as an industry, people are used to ugly interfaces and commands," said Harkin. "In a real-life scenario, you don't have enough time to get things done, and having a nice, well-designed interface is paramount to getting things done."

Christine Cignoli is a Boston-based technology writer. Visit her at
