It's important to archive various types of data in order to meet legal, regulatory and business requirements. Yet, over the course of years and even decades, data piles up.
This is especially true now that big data is a critical resource for many enterprises. Because stockpiling vast quantities of data is both time-consuming and expensive, it's becoming increasingly necessary to create a data archive strategy to identify and prioritize data while deleting files that are of little or no permanent relevance.
Unfortunately, it remains common for many enterprises to simply save everything forever -- in many cases, long past the data's useful life, observed Robert Cruz, senior director of information governance at Smarsh Inc., a cloud-based archiving and compliance technology provider. "This is often due to concern from legal teams that data could ultimately have relevance in litigation," he said.
Data that's no longer used on a regular basis, yet remains essential for business, legal or compliance reasons, should be archived on lower-cost, higher-capacity storage systems, such as hard drives or tape drives. "If the data is no longer useful or required by the organization, it should be deleted," said Cindy LaChapelle, principal consultant at Information Services Group, a technology research and advisory firm. She suggested that data deletion and retention practices should be a key part of every organization's data archive strategy and lifecycle program, since leaving archival data on high-performance storage platforms leads to needless cost and saps staff productivity.
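The keep, archive or delete decision described above can be expressed as a simple rule. The following is a minimal sketch, not a recommended policy: the `DataSet` fields, the `placement` function and the 90-day activity window are all illustrative assumptions rather than anything the sources specified.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class DataSet:
    """Hypothetical description of one data set under review."""
    name: str
    last_accessed: date
    retention_required: bool  # still needed for business, legal or compliance reasons


def placement(ds: DataSet, today: date, active_window_days: int = 90) -> str:
    """Return 'primary', 'archive' or 'delete' for a data set.

    The 90-day window is an assumed threshold for "used on a regular basis".
    """
    recently_used = (today - ds.last_accessed) <= timedelta(days=active_window_days)
    if recently_used:
        return "primary"   # still in regular use: leave on high-performance storage
    if ds.retention_required:
        return "archive"   # inactive but required: move to lower-cost storage
    return "delete"        # inactive and no longer required
```

In practice, the retention flag would come from legal and compliance review rather than from a single Boolean, and the activity window would differ by data type.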
What stays and what goes
The types of data that require long-term retention depend primarily on the enterprise and its regulatory and legal requirements. Some data in the medical sector, for example, may require retention for 30-plus years.
"Financial documents, technical specifications and documents in industries like aerospace are other examples where legal and compliance requirements drive long-term retentions," LaChapelle said. "The amount of data that needs to be retained 'forever' is minimal to none." On the other hand, the stockpiles of data being kept forever by overly cautious organizations are likely huge, she observed.
There are many types of data, such as duplicate files, that can be deleted immediately. Likewise, automated processes that spew large amounts of redundant data should incorporate a mechanism that removes data copies once the process is completed, LaChapelle suggested. Additionally, personal data that's governed by privacy laws or regulatory requirements, such as GDPR, shouldn't be archived unless there are also policies in place to ensure that deletion from the archive occurs in compliance with the regulatory restrictions.
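One common way to find duplicate files is to group them by a hash of their contents. The sketch below assumes the candidate data sits in an ordinary directory tree; the function names are hypothetical, and it only reports duplicate groups, leaving any deletion to a separate, reviewed step.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root that have identical contents.

    Only groups with more than one file are returned; nothing is deleted.
    """
    by_digest = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Hashing every file is expensive on large stores, so a production tool would likely compare file sizes first and hash only the files whose sizes match.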
Developing a retention and data archive strategy
All enterprises need a strong data retention strategy that addresses the entire data lifecycle. That strategy provides the foundation for all archiving activities. "Without this foundation, archiving data will be haphazard, confusing and expensive," said Kim Kaluba, senior product marketing manager in data management for analytics software provider SAS.
According to Kaluba, a data strategy consists of five components: identify, store, provision, integrate and govern. Each component plays an important role in data archiving.
"The identify stage determines the data and business processes that should be archived and how long the information needs to be kept," she said. The store, provision and integrate processes detail how and where archived data is housed. "These components detect the accessibility and service levels for data to ensure that when the data is needed, it can be accessed in the defined timeline," she said.
The final component -- govern -- outlines who has access to the data and business process, who owns the archiving process and what security mechanisms are in place to ensure the protection of any sensitive data residing in the archived environments.
Creating a data archive strategy is essential for meeting compliance, tax and various business requirements. In healthcare, for example, various state regulators require the retention of records for periods ranging up to 19 years.
"This is likely not data anyone wants sitting in a transaction system," said Michael Cantor, CIO of Park Place Technologies, a data center support services provider. Corporate tax records, meanwhile, are typically kept for 10 years to provide IRS audit protection. Finally, there are often numerous long-running contractual commitments, such as leases, "where data related to the contract would be desirable to keep," Cantor said.
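Retention periods like these are often captured in a schedule that maps each record category to a number of years. The sketch below is illustrative only: the category names and helper functions are assumptions, and the periods simply reuse the figures quoted above, which vary by jurisdiction and should come from legal counsel in practice.

```python
from datetime import date

# Illustrative schedule using the figures mentioned above; not legal guidance.
RETENTION_YEARS = {
    "healthcare_record": 19,  # upper bound cited above; actual periods vary by state
    "corporate_tax": 10,
}


def add_years(d: date, years: int) -> date:
    """Add whole years to a date, mapping Feb. 29 to Feb. 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # Feb. 29 and the target year is not a leap year
        return d.replace(year=d.year + years, day=28)


def retention_expiry(category: str, created: date) -> date:
    """Return the date on which a record's retention period ends."""
    return add_years(created, RETENTION_YEARS[category])


def can_delete(category: str, created: date, today: date) -> bool:
    """True once the retention period for the record has elapsed."""
    return today >= retention_expiry(category, created)
```

An unknown category raises a `KeyError` here by design, so that unclassified records are never treated as safe to delete.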
It's important to note that archives and backups are not the same thing, although the terms are often used interchangeably. Backup and archive are complementary technologies, LaChapelle said.
"Archiving moves data into a separate environment and then indexes it and makes it searchable as well as readily retrievable," she said. "Backup is designed to provide a layer of data protection in case of corruption or deletion."