While searching for a virtual tape library (VTL) to reduce its reliance on tape, the U.S. Army discovered that data deduplication was a useful weapon for accomplishing that, and then some.
Bob Dixon, chief architect at Army headquarters at the Pentagon, said he hasn't been able to scrap tape completely but found he could do more than just backups with Data Domain Inc. deduplication appliances.
"Our goal was to replace tape with disk-based storage," he said. "We did a head-to-head evaluation of VTLs. Deduplication and replication was a bonus."
"Since 9/11, we've consolidated almost all of our services into consolidated server rooms," he said. "We've collapsed most of our SANs and servers into a few SAN islands."
The Army Information Management Support Center's storage includes about 150 TB to 200 TB of Tier 1 storage on EMC Corp. Symmetrix and high-end Hitachi Data Systems (HDS) disk arrays purchased through Hewlett-Packard Co. (HP). Dixon also has a second tier of disk arrays and eight HP SDLT tape libraries.
"We look like any other large IT organization – we handle the user support, desktop support and information that people manage," Dixon said. "Information looks the same as any other organization with 10,000 to 12,000 users."
Dixon said his group looked for a VTL as part of a move to replace tape and improve backups. He wasn't looking specifically for data deduplication, and he also considered VTLs from EMC, HP and Advanced Digital Information Corp. (ADIC), now part of Quantum.
The idea was to consolidate backups on fewer devices to reduce backup times along with the footprint of the libraries, but the Army found another use for the technology when Dixon evaluated Data Domain early this year.
"We looked at Data Domain, they said they could give us a 20-to-1 compression ratio. We said, 'Yeah sure, show us.' Sure enough, we got a 23-to-1 compression ratio. That modified the way we were planning on doing business. Initially, we wanted to replace tape, but deduplication has other capabilities. We also use it to do our replication to remote sites."
The Army purchased Data Domain DD560 arrays, though Dixon declined to say how many appliances or replication sites he uses. He did say he stores about six months of backups on the Data Domain boxes but still uses tape for archives. Like other large IT organizations, the Army likes the security of keeping older, less frequently accessed data on tape.
"We'll never get rid of all of our tape, unfortunately," he said. "We've replaced 80% of it."
Still, Dixon accomplished his goal of reducing his reliance on tape. "The big thing for us was the amount of data you could put on the box," he said. "We took an eval unit and got 73 TB in 3U worth of rack space. That was the biggest benefit for us, the capacity of the system."
Dixon figures the return on investment (ROI) will take between 18 months and two years from the time of deployment. "We can demonstrate it's saving us money, because we don't have the cost of managing tapes, storing them and transporting them," he said. "That's primarily a labor cost. Then it adds capability that we didn't have before in the form of replication. Finally, it adds speed to our backup. We were using SDLT tapes, they were fairly fast but not the same as backing up to disk. And restore speed is what really counts. We also retired a lot of equipment -- tape libraries and maintenance were enormously expensive."
Dixon's biggest complaint is that he can't use his Data Domain systems for more than backup. "It would be nice if you could put it in front of our EMC or Hitachi storage system and multiply the capacity of those," he said. "I'd like to see them expand deduplication to existing storage systems, make it look like a server on the network with lots of disk space so users can write to it as opposed to being a backup device."
He may get his wish eventually, Data Domain vice president of product marketing Brian Biles said. But while Data Domain has NAS heads that support files, deduplicating block storage will take a while. "That's something we'll be working incrementally too," Biles said. "With our nearline systems, we support a lot of applications that are nonprimary but not backup applications. We can do files. But primary storage with high-transactional IOPS can be difficult."