Data archiving reduces data backup workload prior to data deduplication

Data archiving shrinks backup data sets before they are ever sent to a data deduplication device. Some users say archiving alone is the better fit for their storage environments, while others say data deduplication remains an important tool to have.

In enterprise data storage, the theme of the year for storage managers is "do more with less," and some users are staying on top of data growth without breaking the bank by archiving their inactive data before it ever enters the data backup cycle.

Robert Stevenson, TheInfoPro (TIP) managing director of storage, said that so far in TIP's Wave 13 storage study survey being conducted this fall, there has been a "dramatic" shift in the way organizations address application and email archiving of data. "It's gone from 60% of respondents a year ago saying the storage team handles archiving policy, to about 20%," Stevenson said. Application teams, server teams and business units are taking over those processes.

Stevenson said the rate of growth on the archive tier of data is expected to exceed that of tier 1 and tier 2 by the end of the year, according to 144 survey respondents to the study so far. Respondents anticipate a 38% rate of growth in archive repositories compared with 31% for tier 2 and 26% for tier 1. This relates to the more aggressive tiered storage plans users have also been putting in place to cope with data growth on static budgets.

While archiving requires management from the producers of data and those responsible for regulatory compliance within an organization, storage managers at organizations with data archiving in place have found additional benefits, particularly in data backup, another place where data growth has challenged budgets and infrastructures this year.

Backup data reduction through data archiving

"I'd much rather archive than deduplicate," said Derek Kruger, IT and communications supervisor for the City of Safford, Ariz. Kruger currently uses GridBank software from Tarmin Technologies Inc. The city's data is stored on Dell EqualLogic PS5500E iSCSI storage with approximately 10 TB of capacity. For backups, Kruger uses Asempra Technologies' Business Continuity Server (now owned by BakBone Software Inc.). Of the 4 TB of total data in the city's environment, only about 1 TB is active, so Kruger prefers to archive the inactive data.

Because the city uses continuous data protection (CDP) for quick operational restores, retaining multiple copies of inactive files on disk isn't sustainable, Kruger said, and simply removing the inactive data from the backup set is much cheaper than buying a specialized data deduplication appliance or software. "I can get a 2 TB hard drive for $180 and add it to my CDP or archiving servers," Kruger said. "Dedupe software is great, but pricey."
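
The idea of pulling inactive data out of the backup set can be sketched roughly in code. The minimal Python example below is not GridBank's or any other vendor's actual logic. It simply treats a file as inactive when its last-modified time is older than a cutoff. The function name `split_by_activity` and the 365-day default are assumptions made for illustration.

```python
import time
from pathlib import Path


def split_by_activity(root, inactive_days=365, now=None):
    """Split files under root into (active, inactive) lists by last-modified time.

    A file counts as inactive if it was last modified more than
    inactive_days ago. The threshold is an illustrative assumption.
    """
    now = time.time() if now is None else now
    cutoff = now - inactive_days * 86400  # 86400 seconds per day
    active, inactive = [], []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            if path.stat().st_mtime < cutoff:
                inactive.append(path)  # candidate for the archive tier
            else:
                active.append(path)    # stays in the backup set
    return active, inactive
```

In a setup like this, only the `active` list would be handed to the backup job, while the `inactive` list would be moved to the archive. A real archiving product would typically apply richer policies than modification time alone.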

Some users turn to archiving appliances instead of data deduplication

Joe Funaro, director of technology for IT at a New York City medical consortium consisting of Lenox Hill Radiology, Diagnostic Radiology Associates and Park West Radiology, said data deduplication isn't an option in his environment because the most critical data consists of medical images that don't lend themselves well to algorithmic deduplication. The facility is also in the midst of planning a new cross-campus replication and disk-based data backup deployment. In the meantime, tape backups are cumbersome for operational restores. "I can't keep backing up terabytes and terabytes and terabytes to tape," Funaro said. "It was becoming almost impossible -- the number of processes we were running during the evening window."

Instead, some 150 million files consisting of images and other inactive data are archived to Nexsan's Assureon data archiving product, bringing backup times within the eight-hour evening window again.

Steve Davidek, operations and systems administrator for the City of Sparks, Nev., said the economic downturn has kept him from getting the budget for a new data deduplication system this year, but he has been able to remove about 3 TB from his backup data set by archiving with EMC Corp.'s ApplicationXtender. He said he'll probably also look to add data deduplication once his budget allows. "We're also looking to deploy virtual desktops -- if I ever get money again, I'll be looking at a whole new way of doing backup."

A combination of data deduplication and archiving has worked out well for Nasser Mirzai, vice president of technology at San Mateo, Calif.-based TradeBeam Inc. In his 50 TB environment, data is stored on a Fibre Channel storage area network (SAN) first, then replicated to an iSCSI SAN using heterogeneous replication software for archival purposes, where it is retained for a year. Active data is backed up and retained on a Data Domain disk for one month, and also copied to tape, which is retained offsite for seven years.

Mirzai estimates between 5 TB and 6 TB of primary data are cut out of his daily backup cycle, translating into two physical terabytes on his Data Domain box saved by the archiving process. Archiving in this way also saves about 36 physical tapes per year, he said.

While the infrastructure has yielded savings, Mirzai said he'd like to deploy a simpler architecture, possibly from a single vendor, in the future. "There are some vendors that have package deals that do all those steps for you," he said. "Piecing it together requires a good understanding of what exactly you want to do, policies and licensing."

Data deletion still an important issue; mileage may vary

While archiving can improve backup efficiency, data growth in the archiving tier must also be addressed. A "keep everything" policy in the archive can wind up pushing the data growth problem from backup to another part of the environment.
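
One common alternative to a "keep everything" policy is a retention period per archived item, after which the item becomes eligible for deletion. The Python sketch below illustrates that idea only. The `ArchivedItem` fields and the `expired_items` helper are hypothetical names, not part of any product mentioned in this article.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ArchivedItem:
    name: str
    archived_on: date
    retention_days: int  # how long policy says the item must be kept
    size_gb: float


def expired_items(items, today):
    """Return the items whose retention period has ended as of today."""
    return [
        item for item in items
        if item.archived_on + timedelta(days=item.retention_days) <= today
    ]


def reclaimable_gb(items, today):
    """Total size of the items that could be deleted under the policy."""
    return sum(item.size_gb for item in expired_items(items, today))
```

A report built on `reclaimable_gb` would show how much archive capacity a deletion pass could free before anyone has to ask for more storage budget.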

Curtis Rawlings, assistant chief information officer for DeKalb County, Ga., also archives data using Symantec Corp.'s Enterprise Vault software with EMC's Centera for hardware. The Enterprise Vault repository currently holds about 6 TB of data that is no longer part of the operational backup cycle, and it has helped save space on primary storage as well. "Backup was killing me under the old system," Rawlings said. Exchange backups often ran into each other. "I couldn't get it all done within a 24-hour period. Archiving solves that problem."

However, Rawlings said data deletion from the archive is becoming an important issue in his county as well. "It's a cultural thing to try and get people to drop old data from the archive," he said. "But sometimes you have to let people feel the pain somewhat -- you have to come back and ask for more money for storage and that gets their attention. You can control the storage costs by deleting data." Also, Enterprise Vault version 8 supports block-level data deduplication of archiving data, something Rawlings said he's considering when he upgrades the Enterprise Vault software next year.
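
For readers unfamiliar with the term, block-level deduplication generally means splitting data into blocks, fingerprinting each block and storing identical blocks only once. The Python sketch below shows that general technique with fixed-size blocks and SHA-256 hashes. It is not a description of how Enterprise Vault or any other product implements it, and the function names and 4 KB block size are assumptions.

```python
import hashlib


def dedupe_blocks(data: bytes, block_size: int = 4096):
    """Split data into fixed-size blocks and store each unique block once.

    Returns (store, recipe): store maps a block's SHA-256 digest to its
    bytes, and recipe is the ordered list of digests needed to rebuild data.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # keep only the first copy of a block
        recipe.append(digest)
    return store, recipe


def restore(store, recipe):
    """Rebuild the original bytes from the block store and the recipe."""
    return b"".join(store[digest] for digest in recipe)
```

Data with many repeated blocks shrinks considerably under this scheme, while data with few repeats, such as the medical images Funaro describes, gains little.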

But remember, archiving isn't suitable for every environment. At least one user contacted by SearchDataBackup, Shayne Williams, systems administrator for the City of Bellevue, Wash., said he went the archiving route first, but swapped out a multi-tiered environment for NetApp FAS3000 arrays with built-in data deduplication.

The city had been using EMC's DiskXtender and a tier-2 disk array for archiving files in its 1,500-user, 30 TB environment, but found managing multiple systems to be a hassle that outweighed the data reduction benefits. "Deduplication has the biggest ROI," Williams said, estimating that NetApp's built-in deduplication will reduce data by between 40% and 60% on the city's new FAS3000 arrays. Because the city bought two of the devices to replicate between data centers for disaster recovery, the data deduplication is also cutting down on the bandwidth required for sending data over the wire.

"With deduplication, we get the savings and no longer need a separate product with SAN integration and additional licensing," Williams said. Since the city pays third-party subcontractors to manage its environment, the operational expenditure of managing multiple systems was an important consideration.

However, Enterprise Strategy Group (ESG) analyst Lauren Whitehouse cautioned that replacing an archive with deduplication doesn't serve the primary reason people use archiving -- regulatory compliance. "Archives tend to understand the history of copies and offer some historical point of reference, so you can go back in time and search on them when required," she said.
