Two of Quantum Corp.'s DXi data deduplication virtual tape library (VTL) customers said that their systems have taken hours and sometimes days to recover after an accidental power loss.
The administrators said that after their DXi appliances accidentally lost power, a process Quantum calls "blockpool verify" kicked off. This process verifies the integrity of all blocks and is invoked only if there is an "unclean system shutdown."
Ben Barnes, an IT infrastructure manager who asked that his company name be withheld, deployed Quantum's DXi5500 Fibre Channel VTL with data deduplication approximately nine months ago. The mutual fund investment firm has 2.6 TB of data it backs up weekly, but went with a Fibre Channel VTL because of its integration with the company's tape infrastructure.
Barnes' company has one DXi5500 box at either end of the wire between its main data center and a secondary disaster recovery (DR) site, but brought both boxes to headquarters first to set up and test them. During the move of the secondary system to the disaster recovery site, there was a power failure, and when the machine resumed, Barnes said, "it took days to do an integrity check on all the data before we could use it again."
Following a firmware upgrade, the company's DXi5500 took between four and five hours to recover from a more recent power loss, Barnes said. That's an improvement, but not exactly sufficient.
"The second incident showed huge improvement, but we would've hoped for some better improvement overall," Barnes said. "In a disaster when you need the system up and running, the last thing you want to do is wait half a day to be able to start restoring."
Steve Stoutner, manager of IT for a bank processing service provider in the southern U.S. who requested his company not be identified, told a similar tale. When Hurricane Ike hit Texas last September, a location in Houston failed over to a secondary DR site. Then the generator at that secondary data center went bad, and all systems went down.
Already, there had been some data loss, because the DXi3500 array at the main location was replicating once a day after post-process deduplication finished. Then the company had to wait through the integrity check before it could verify what data was still intact.
"Quantum's answer was, 'you shouldn't power off the system like that,'" Stoutner said. "We weren't able to bring up a lot of systems we wanted to bring up."
Luckily, the service-level agreement (SLA) with clients was long enough that the missing data wasn't a huge problem. But the incident still sent Stoutner's company looking for another dedupe device. When the company brought in Data Domain's DD565 data deduplication array, one of the first things his team did was pull the power cords out. "It came right back up," Stoutner said.
Data Domain claims that its integrity-check processes are engineered to withstand the kind of failure Stoutner and Barnes experienced. Ed Reidenbach, Data Domain's director of product management, said Data Domain's operating system verifies the consistency of new file system metadata and integrity of the data after the backup completes. It then continuously rechecks all data online in the background.
"During the initial end-to-end verification process strong checksums are calculated on receipt of data by the DD OS, after which data is stored to a battery-backed NVRAM and then to disk," Reidenbach said. "In case of a power failure, recovery of writes that have not gone to disk requires simply playing back the requests and checking against and overcoming the power-fail weakness of the ATA cache. No metadata updates are committed to disk until all data is already committed, allowing no consistency confusion in case of NVRAM failure. Therefore, the replay process can be very fast."
Last year, IT staff at the Anchorage, Alaska, school district told SearchDataBackup.com of problems that also caused data loss with a Quantum DXi5500. The DXi3500 and DXi5500 are Quantum's midrange dedupe systems. The vendor also has a newer enterprise platform, the DXi7500.
A Quantum spokesperson said the company is aware of the issues with the "blockpool verify" process. Quantum's device also performs continuous high-level data integrity checks it calls "blockpool health checks" that can generally be done quickly. "Blockpool verify … is … only invoked if there is an unclean system shutdown. This is the situation Ben Barnes...was referencing, and it is a much less frequent occurrence," a Quantum spokesperson wrote in an email to SearchDataBackup.com. "We are continuing to make improvements in this area, but it currently can take a few hours the first time based on the size of the blockpool."
Barnes said he has worked with Quantum engineers on the issue and is willing to stick with the product, partly because of the improvement after the firmware upgrade. Barnes had also evaluated Data Domain before purchasing the Quantum device last year, but wanted a disk system that supported his tape backups.
"In the end, when we looked at what we were quoted by Quantum, it was all-inclusive -- systems, tape, licensing and support, all in one price," he said. "Data Domain was not a one-stop shop."