EMC Data Domain unveils Global Deduplication Array

Data Domain's first release of global data deduplication, a two-node cluster of DD880s, will require Symantec's NetBackup OST, with NetWorker integration to follow.

EMC Data Domain today said it will ship a Global Deduplication Array (GDA) later this quarter, which allows for data deduplication across two DD880 disk array controllers.

The Global Deduplication Array consists of two DD880 arrays updated with additional disk shelves. Each DD880 array can now hold 142.5 usable TB, up from 71 TB previously. That means the Global Deduplication Array will scale to 280 usable TB, deliver up to 3.5 GBps throughput, and support up to 270 concurrent write streams.

The GDA requires Symantec Corp.'s OpenStorage (OST) API and either Symantec NetBackup or Backup Exec to control the placement of data across multiple controllers.
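Placement of this kind can be pictured as a routing decision: send each backup to the controller that already holds the most matching data. The sketch below is illustrative only and is not the OST plugin's actual logic. The fixed-size chunking, the SHA-1 fingerprints, the tie-breaking rule and every name in it (`fingerprints`, `pick_controller`, `write_backup`, `node_a`, `node_b`) are assumptions made for the example.

```python
import hashlib

CHUNK_SIZE = 8  # bytes; unrealistically small, chosen only to keep the demo readable


def fingerprints(data: bytes, chunk_size: int = CHUNK_SIZE) -> set:
    """Split data into fixed-size chunks and return the set of chunk hashes."""
    return {
        hashlib.sha1(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    }


def pick_controller(data: bytes, controllers: dict) -> str:
    """Pick the controller whose stored fingerprints overlap most with the data.

    Ties go to the controller holding fewer fingerprints (a crude load balance).
    """
    incoming = fingerprints(data)
    return max(
        controllers,
        key=lambda name: (len(incoming & controllers[name]), -len(controllers[name])),
    )


def write_backup(data: bytes, controllers: dict) -> str:
    """Route the data to a controller and record its fingerprints there."""
    target = pick_controller(data, controllers)
    controllers[target] |= fingerprints(data)
    return target


# Hypothetical two-node cluster: each controller starts with no stored chunks.
cluster = {"node_a": set(), "node_b": set()}
first = write_backup(b"AAAAAAAABBBBBBBB", cluster)   # nothing matches yet
second = write_backup(b"CCCCCCCCDDDDDDDD", cluster)  # no overlap, goes to the emptier node
third = write_backup(b"AAAAAAAAEEEEEEEE", cluster)   # shares a chunk with the first backup
print(first, second, third)
```

In this toy model the third backup follows the first one because they share a chunk, which is the effect a dedupe-aware placement step is after.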

"The OST plugin scans data before it's sent to the back end to assess which controller will get the best data deduplication ratio," said Brian Biles, vice president of product management of EMC's Backup and Recovery Systems (BRS) Division.

The relationship between Symantec and Data Domain goes back to before EMC acquired Data Domain last July for $2.1 billion. Support for global deduplication with EMC NetWorker backup software is planned for later this year, Biles said.

Data Domain is also rolling out a new operating system. DDOS 4.8 will add encryption for data at rest (applied post-deduplication); one-to-many replication among DD devices (previously only many-to-one and cascading replication were supported); and a delta differencing algorithm in DD Replicator that further reduces replicated data (by up to 2 times, EMC claims) over low-bandwidth connections.
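The general idea behind delta differencing is to send only the bytes that differ from data the destination already holds. The following is a minimal sketch of that idea using Python's standard `difflib`. It is not DD Replicator's algorithm, and the function names (`make_delta`, `apply_delta`, `literal_bytes`) and sample data are hypothetical.

```python
from difflib import SequenceMatcher


def make_delta(base: bytes, new: bytes) -> list:
    """Describe `new` as copies from `base` plus literal bytes that must be sent."""
    matcher = SequenceMatcher(None, base, new, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # The destination already has these bytes; send only the offsets.
            delta.append(("copy", i1, i2))
        elif j2 > j1:
            # Replaced or inserted bytes have to travel over the wire.
            delta.append(("data", new[j1:j2]))
    return delta


def apply_delta(base: bytes, delta: list) -> bytes:
    """Rebuild the new data at the destination from its copy of `base`."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            out += base[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


def literal_bytes(delta: list) -> int:
    """Count the payload bytes that actually need to be transmitted."""
    return sum(len(op[1]) for op in delta if op[0] == "data")


base = b"backup block 0001: " + b"x" * 200
new = b"backup block 0002: " + b"x" * 200
delta = make_delta(base, new)
assert apply_delta(base, delta) == new
print(f"{literal_bytes(delta)} literal bytes sent for a {len(new)}-byte block")
```

The saving depends entirely on how similar the new data is to what the destination already stores, which is why vendors quote such gains as "up to" figures.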

EMC NetWorker integration to follow

EMC BRS president Frank Slootman said earlier this year that EMC Data Domain would add an OST-like integration with NetWorker, but Biles said that integration is still a ways off.

"Until the Data Domain acquisition, EMC didn't really have a program like OST, and we're catching them up," Biles said.

Biles also said there are no plans to add global data deduplication to Data Domain's smaller arrays. "We decided to start with something enterprise-ready and very controllable," he said. As for using OST to cluster more than two controllers, Biles said, "In the first release, we wanted to do something very complete and optimized."

The OST requirement excludes customers of other data backup products, such as CommVault Simpana, Hewlett-Packard (HP) Co.'s Data Protector or IBM Corp.'s Tivoli Storage Manager (TSM), from using GDA, but Biles pointed out that NetBackup has the lion's share of the enterprise backup market. "Once we integrate NetWorker, that covers more than 50% of enterprise backup customers," he said.

A question of demand downmarket

While Data Domain is the acknowledged market leader in data deduplication, a status reflected in the bidding war between EMC and NetApp over Data Domain last summer, its competitors have pointed to the lack of global dedupe as its major shortcoming. Hardware target devices such as ExaGrid EX Series, HP VLS, IBM ProtecTIER, NEC HydraStor and Sepaton DeltaStor, and software such as Asigra Cloud Backup, CommVault Simpana, EMC Avamar, FalconStor FDS and Symantec NetBackup PureDisk, all perform global dedupe.

Biles said Data Domain is seeing the most demand for global deduplication at the high end of the market, and Global Deduplication Array's ability to scale up to multiple logical petabytes of capacity should meet the needs of even the biggest customer. "If there's demand on the low end, there's technology to enable that, but it's not our priority right now," he said.

However, part of the appeal of global data deduplication for some customers is the ability to start small and grow big without having to migrate data or manage multiple separate disk arrays. For example, Orange County Sheriff's Department backup and email administrator Douglas Blackburn, a DD560 user, said last year that he was thinking of upgrading to a DD690 gateway so he could choose his own storage and scaling strategy on the back end.

Another midmarket customer, MultiCare Health System, a group of hospitals and health clinics in Tacoma, Wash., said last year it went with Sepaton Inc.'s S2100-ES2 VTL after first trying Data Domain Inc. for backing up Windows data, because the Sepaton product scaled better with a global namespace. Other midmarket and small- to medium-sized business (SMB) competitors of Data Domain, including ExaGrid, offer global deduplication on scale-out NAS systems.

TechTarget executive editor and independent backup expert W. Curtis Preston said he's worked with customers who are tiring of managing multiple Data Domain boxes after outgrowing their first purchase.

"Some of them could've bought a DD880, but fully configured, it's a million-dollar box. So they buy a $100,000 box, they like it, it grows, and then what?" he said. The current choice in that situation is a forklift upgrade with DD arrays smaller than the DD690, and a head swap in the case of that array.

"OST could add just another layer of complexity to backup software, which is already pretty complex," said Beth Israel Deaconess Medical Center (BIDMC) storage architect Michael Passe. BIDMC is already creating its own global namespace that encompasses Data Domain's arrays using F5's ARX file virtualization switch, but Passe said he's still hoping to see Data Domain offer n-way native clusterability.

"We viewed OST as very proprietary," Passe said. "That's also why we deployed Data Domain as NAS rather than a VTL [virtual tape library], because we wanted to use native, well-understood protocols."

Still, Preston said "this is a good first step. I think this is a walk before you run thing. DD880s tend to be big enterprise customers that can deal with a few hiccups. Trying to release this across all their arrays might have led to a support nightmare on Day 1."

"Symantec integration with this new product ahead of internal EMC Networker stuff [is] a good sign that EMC is allowing the Data Domain guys to do what they need to do and not messing with their business model," added founder and analyst David Vellante in an email to

List pricing starts at $800,150 for a Global Deduplication Array with 23 TB of usable capacity, providing 46 TB to 1.1 PB logical capacity.
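The logical-capacity range follows from the deduplication ratio applied to usable capacity. As a quick check, assuming decimal units (1 PB = 1,000 TB) and a hypothetical helper name, the quoted figures imply the following ratios:

```python
def implied_dedupe_ratio(logical_tb: float, usable_tb: float) -> float:
    """Ratio of logical (pre-dedupe) capacity to usable (physical) capacity."""
    return logical_tb / usable_tb


usable_tb = 23
low = implied_dedupe_ratio(46, usable_tb)     # 46 TB logical
high = implied_dedupe_ratio(1100, usable_tb)  # 1.1 PB logical, taken as 1,100 TB
print(f"{low:.0f}x to {high:.0f}x")  # prints "2x to 48x"
```

In other words, the quoted range spans roughly 2:1 to 48:1 data reduction; where a given customer lands depends on its data and retention policy.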
