Ten years after Data Domain began selling backup appliances performing data deduplication, the product line has helped EMC dominate the backup hardware market. According to IDC, EMC owned a 66% market share and $572 million in revenue in the fourth quarter of 2012 from backup appliances -- consisting mostly of Data Domain's target dedupe with some Avamar source-based dedupe. We spoke with Guy Churchward, who became president of the EMC backup and recovery systems (BRS) division last October, about the future of Data Domain and EMC's long-term plans for data dedupe.
There is plenty of competition out there today, so why has Data Domain been able to stay on top of the dedupe market by such a large margin?
Guy Churchward: The market has gotten busier, but I don't think there's a single product out there that has a comparable offering. If you have a generic use case for VTLs [virtual tape libraries] and want something good enough at small scale, there are plenty of opportunities for you. We've moved up to a much larger scale -- the DD990 is knocking on the door of three-quarters of a petabyte. We have a data invulnerability architecture -- a method of checking on the integrity of data -- and nobody else has that. As a cheap and good-enough tape replacement, there's plenty of stuff out there. But as a protection storage architecture, nobody is heading down the same route with the same level of innovation.
Where do you expect to take the platform in future releases?
Churchward: Sometimes the worst position you can be in is having [the] market dominance we're in. You get complacent. I'll continue to talk to our engineering team about not being Nokia when the iPhone came out. You need to continue to look at the market and understand what the disruption forces are, and work hard to be part of that and add value to customers. Customers want highly distributed scale, and reduced cost and complexity. Those are the areas we have to keep chipping away at. We want to make sure it's easier to use the systems. Not just a single box, but multiple boxes in multiple locations, and a hybrid environment where you're on premises and in the cloud. As long as we have that level of paranoia and we're not complacent, we'll carry on with the market space dominance we have.
Adding to that, we're developing a much closer affinity to our brethren on the primary storage side. There is a school of thought saying, 'Well, why is there such a thing as backup? If you can have a tapeless backup, why can't you have a backup-less backup?' It would be interesting if we could split directly from a VMAX [enterprise storage array] into a Data Domain system, and not have to worry about it. Similar to what we did with DD Boost, and what VMware is doing with its VDP [vSphere Data Protection] product. That's using Avamar, so in essence, a VMware administrator can do backups directly from VMware. He's not touching anything that feels or smells like a backup application.
On the deduplication side, we're working on not just deduplicating blocks, but things like deduplicating images. Is there a way of taking images and figuring out if you can do the same thing we've done the last 10 years but on an image-based side as well? So, right across the board we have projects we're working on to make customers' lives interesting.
We're also looking at different use types -- archiving is a good example. And the DD Boost-type scenario is more prolific where we're taking dedupe out of just inline and bringing it to end points where you can reduce the bandwidth and improve system performance.
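As a rough illustration of dedupe at the end point, the sketch below fingerprints data on the client and sends only the chunks the backup target has not already stored. It is not DD Boost's or Data Domain's actual algorithm: the fixed-size chunking, the SHA-256 fingerprints, the in-memory `store` dictionary and the function names are all assumptions made for the example.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size, chosen only for this sketch


def chunk_fingerprints(data, chunk_size=CHUNK_SIZE):
    """Yield (fingerprint, chunk) pairs for fixed-size chunks of data."""
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        yield hashlib.sha256(chunk).hexdigest(), chunk


def backup(data, store):
    """Send only chunks missing from the store.

    Returns (recipe, bytes_sent): the ordered list of fingerprints needed
    to rebuild the data, and the number of payload bytes transferred.
    """
    recipe = []
    bytes_sent = 0
    for fingerprint, chunk in chunk_fingerprints(data):
        if fingerprint not in store:
            # Only previously unseen chunks cross the (simulated) network.
            store[fingerprint] = chunk
            bytes_sent += len(chunk)
        recipe.append(fingerprint)
    return recipe, bytes_sent


def restore(recipe, store):
    """Rebuild the original data from its recipe of fingerprints."""
    return b"".join(store[fingerprint] for fingerprint in recipe)
```

In this sketch, backing up the same data a second time transfers no chunk payload at all, which is where the bandwidth saving comes from.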
Will we see any fundamental changes to the deduplication algorithm?
Churchward: It will carry on doing what it's doing. We need to move dedupe to a distributed model as well. One challenge around inline dedupe is the more streams you have coming in, the harder it is to perform these functions. The bigger the box we have and the more scale we have on the systems, the more we have to innovate around the way in which we do deduplication.
Even with simple things like moving to 3 TB drives, the number of IOPS you have to play with gets reduced. When it goes from 2 TB to 3 TB, it takes one-third of [the] IOPS [per terabyte] out. If you try to do garbage collection across nearly a petabyte system, you really have to innovate quite significantly around the way the operating system works, the way the dedupe works, the way its garbage collection works, and the way you do memory allocation. So that's the type of [innovation] our engineers are working on.
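The one-third figure can be checked with a few lines of arithmetic. The sketch assumes that a drive delivers roughly the same number of IOPS whatever its capacity, so a larger drive has fewer IOPS available per terabyte stored; that assumption and the function name are ours, not something stated in the interview.

```python
def iops_density_reduction(old_tb, new_tb):
    """Fractional drop in IOPS per terabyte when drive capacity grows.

    Assumes per-drive IOPS is unchanged, so IOPS per TB scales as 1 / capacity.
    """
    old_density = 1.0 / old_tb  # IOPS per TB, in units of one drive's IOPS
    new_density = 1.0 / new_tb
    return (old_density - new_density) / old_density


# Moving from 2 TB to 3 TB drives: (1/2 - 1/3) / (1/2) = 1/3
reduction = iops_density_reduction(2.0, 3.0)
print(f"IOPS per TB falls by {reduction:.1%}")
```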
Dedupe is considered an important feature for flash storage on the primary side. Do you see flash playing a big role in EMC backup?
Churchward: We have projects we're working on around full flash arrays to see if it works in our favor or not. You have this magic equation -- if you bring flash in, you bring high cost in. Does it add enough value to be a critical part of your platform?
I do think flash is going to play a part in backup, and I think flash is going to get much closer to primary as well. That's another initiative we have moving forward.
Can Data Domain algorithms be used for primary dedupe?
Churchward: As far as primary dedupe goes, that would be a different team [at EMC]. We do work closely with other primary parts, like the VNX team and VMAX team and the Isilon team. We add value as much as we can. Part of the premise of a deduplicated appliance is the ubiquity you provide with it.
We have Data Domain on Vblock [integrated stack] and Vspex [reference architecture]. You'll see more tight cohesion with VMAX and VNX with Data Domain. You'll buy protection storage with components that will work as one with the primary storage.
Dedupe also would play a big role in moving backup data to the cloud. Do you see cloud backup playing a big role in the enterprise anytime soon?
Churchward: I think it will, but I don't think it's going to happen over the next three years or so. What we're seeing more is the hybrid approach -- imagine a lightweight Data Domain box sitting on a customer site, where it collects [data], deduplicates it, keeps a synthetic full and then pushes off deeper archive versions into the cloud. So, [customers] still have [a] control point and fast recovery, but they're using the cloud as tape replacement. So, I can see that coming pretty quick. The idea of a customer saying, 'I don't want to run my own data center, it's going wholesale into the cloud,' I think that's a fair ways off.
Now, saying that, we have brought Mozy into the portfolio of BRS. Mozy does backup as a service, and that's definitely moving upstream. We're doing a lot of work integrating Mozy tighter with Data Domain and doing dedupe on the back end. It will get there; it's a question of how long it will take. From the core market space we sell into, I think recovery time objectives and security objectives are such that they still need a dedicated backup appliance on the customer site. Restores are still too long from the cloud.