Data Domain's president and CEO Frank Slootman has had an eventful year. His company went public on June 28, and as part of reporting to Wall Street as a newly public company, Data Domain disclosed that it had 973 unique customers as of the second quarter, and now says it has surpassed 1000 customers. Slootman sat down with SearchStorage.com to discuss where data deduplication has been and where it's going.
SearchStorage.com: What segments of the market do your customers come from? Do they test your system against others in the market before buying?
Slootman: We have a long track record, so it's not hard for people to find out whether our claims are true or false. We always say to customers, you can do one of two things, or both: we can give you six local references in your vertical, or we can let you bring the systems in-house and run your workload in your own environment. About a third of our business is based on references, another third comes after evaluation, and another third is repeat business.
We tend to segment the market by size of data. At 500 terabytes (TB) and above, in consolidated data centers, we view that as a raw-scale throughput market dominated by virtual tape and physical tape; we refer to it as the "VTL market" -- EMC is the principal player there, and IBM and Sun to some extent. Below a terabyte is what we refer to as the remote-office market; those customers back up straight over the network, which you can do because the data is so small, from both a backup and a restore standpoint. That's where you see products like Avamar and [Symantec Corp.'s] PureDisk, and in the larger data sets you see companies like Riverbed that cache the data as it's sent over the network. North of 1 TB and up to hundreds of terabytes is Data Domain country -- tape is particularly troublesome in that segment.
SearchStorage.com: In the VTL space you refer to, none of the companies you mentioned have offered dedupe yet in their hardware. They say it's because of concerns about performance and data integrity. What's your response to that?
Slootman: We have 185 petabytes protected and 3000 systems out there, and we also just went public. You can't go public without extreme scrutiny -- if we had data integrity issues, we couldn't have done any of that.
As for performance, we have a single-stream transfer rate of 150 megabytes per second -- that's the fastest single-stream performance in the industry.
SearchStorage.com: It's still not as fast as a lot of the tape drives currently on the market.
Slootman: We're not really competing with tape in the market we focus on. In the VTL market, you do end up competing with tape, but the operational pain of tape in the midmarket is higher than at the high end -- midmarket customers have already decided that tape is coming out. At the high end, they have the critical mass in systems and procedures, Iron Mountain pickups, all that stuff.
SearchStorage.com: So what's the future for Data Domain in particular, and the dedupe market in general?
Slootman: We don't view deduplication as a backup technology. We view it as a storage technology -- backup is a subset of storage. We see the technology applying to storage broadly.
We've figured out how to do versioning and replication at a level of efficiency that wasn't in the market prior to Data Domain. Before Data Domain, it was the NetApp model based on block-level incremental [snapshots], which was very efficient when it was new. Today, Data Domain is backing up NetApp filers, we're backing up snapshots, and we can store five to seven times as many snapshots as can be stored on NetApp filers. Versioning and replication are our bread and butter -- in NAS terms, that's snapshots and replication. We view the world moving from protection storage to what we call self-protecting storage: general-purpose storage that doesn't require a separate backup infrastructure and has versioning and replication built into its architecture.
We've always had this endgame in mind to generalize our storage -- that's why we're associated with an element of the infrastructure, rather than with a particular application like backup software.
SearchStorage.com: So you're talking about getting into primary storage?
Slootman: We call it nearline storage, everything that's not primary. We view primary as being super high-end, very high velocity data, very transactional -- that's not where this gets positioned. The data that does not change and turn over as often, we view as nearline storage, and we think our storage is going to be suitable for that -- for reference, for archive, for litigation support. We have a customer that's using our system to archive [software] builds -- they do daily builds, and they're big, but they're all just 0.1% different from the previous build. They can't go to tape, because they actually have to read those things back when they have a customer they need to debug with. They get fantastic compression on an application like that.
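The build-archiving example above can be illustrated with a toy sketch: if each stored stream is split into chunks and each unique chunk is kept only once, a build that is a tiny fraction different from the previous one costs almost nothing to store. This uses naive fixed-size chunking for simplicity; it is not Data Domain's actual algorithm, and all names here are illustrative.

```python
import hashlib
import random

def dedupe_ratio(streams, chunk_size=4096):
    """Store each unique chunk once (keyed by hash); return logical/physical ratio."""
    store = {}    # chunk hash -> chunk length (stands in for the chunk store)
    logical = 0   # total bytes the application thinks it wrote
    for data in streams:
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            logical += len(chunk)
            store[hashlib.sha256(chunk).hexdigest()] = len(chunk)
    return logical / sum(store.values())

# Two "builds" that differ only by a tiny in-place patch:
random.seed(0)
build1 = bytes(random.getrandbits(8) for _ in range(1 << 20))  # 1 MiB stand-in
build2 = bytearray(build1)
build2[500:510] = b"new-bytes!"  # a tiny fraction of the build changed
print(dedupe_ratio([build1, bytes(build2)]))  # close to 2.0: second copy is nearly free
```

Note that fixed-size chunking falls apart when bytes are inserted rather than overwritten, because every chunk boundary after the insertion shifts; production dedupe systems typically use variable-length, content-defined chunking for that reason.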
In our own engineering environment, we run our own home directories on our boxes in our building. We have 200 million files on our storage, and we're getting 5 to 6x data reduction on home directories without trying very hard.
SearchStorage.com: Doesn't moving up the chain to nearline storage from backup raise performance issues again?
Slootman: Backup is the most punishing, torturous throughput case you can imagine -- massive amounts of data have to be moved in very short periods of time. Moving data from one source to another on nearline systems typically isn't under that kind of time constraint, and latency on restore isn't an issue because you're restoring one or two files, not terabytes of data like we do in backup and restore. So, performance-wise, the requirements are actually much lower than what we experience in backup.
SearchStorage.com: Some people think dedupe will be part of everybody's systems eventually, like encryption is becoming part of tape drives. What do you think about that outlook?
Slootman: A lot of people are saying it's a very simple technology that everybody will have. But there are significant engineering challenges in making sure the technology performs at the rate you need, gets the data reduction results you hope it will deliver, and provides the data integrity and reliability you need. It's very difficult to harden a storage system so that it has the resiliency to remain operational through a drive failure or a power failure. When you delete data, these products go through a garbage collection process that brings most systems to their knees so badly they become unusable. The same goes for rebuilding a drive, or they'll corrupt data during a power failure -- there are so many corner cases to deal with, and people really don't understand the complexity of this technology at all.
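The garbage-collection point follows directly from chunk sharing: deleting a backup can't immediately free its chunks, because other backups may still reference them. A minimal mark-and-sweep sketch of that general idea (not Data Domain's implementation; all names are hypothetical):

```python
# Toy model: chunks are shared across backups, so a chunk is reclaimable
# only when no surviving backup references it.

def collect_garbage(chunk_store, backups):
    """Mark-and-sweep over the chunk store: keep only chunks some backup references."""
    live = set()
    for chunk_hashes in backups.values():  # mark phase
        live.update(chunk_hashes)
    for h in list(chunk_store):            # sweep phase
        if h not in live:
            del chunk_store[h]

# Two backups share chunk "a"; "b" and "c" are each unique to one backup.
chunk_store = {"a": b"...", "b": b"...", "c": b"..."}
backups = {"mon": ["a", "b"], "tue": ["a", "c"]}
del backups["mon"]                # user deletes Monday's backup
collect_garbage(chunk_store, backups)
print(sorted(chunk_store))        # ['a', 'c'] -- "b" reclaimed, shared "a" kept
```

Even in this toy form, the sweep has to scan the entire chunk store while backups keep running, which hints at why garbage collection is where many dedupe systems bog down.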
Twelve months ago people were calling me up saying this'll be a feature in everybody's product. These days people are calling and saying, 'we're finding out this is harder than it looks.'