A data retention policy, or records retention policy, is an organization's established protocol for retaining information for operational or regulatory compliance needs.
When writing a data retention policy, you need to determine how to:
- Organize information so it can be searched and accessed at a later date
- Dispose of information that is no longer needed
Some organizations find it helpful to use a data retention policy template that provides a framework to follow when crafting the policy.
A comprehensive data retention policy outlines the business reasons for retaining specific data as well as what to do with it when targeted for disposal.
A data retention policy is part of an organization's overall data management. A policy is important because data can pile up dramatically, so it's crucial to define how long an organization needs to hold on to specific data. An organization should only retain data for as long as it's needed, whether that's six months or six years. Retaining data longer than necessary takes up unnecessary storage space and costs more than needed.
One operational reason for implementing a data retention policy is proper data backup. An organization's backup data helps it recover in the event of data loss. A policy is important to make sure the organization has the right data and the right amount of data backed up. Too little data backed up means the recovery will not be as comprehensive as needed, while too much makes it harder to find and restore the right data.
A data retention policy should treat archived data differently from backup data. Archived data is no longer actively used by the organization, but still needed for long-term retention. An organization may need data shifted to archives for future reference or for compliance. Archives are stored on cheaper storage media, so they reduce costs and the volume of primary data storage. A user should be able to search archives easily.
For proper creation and implementation of a data retention policy, especially regarding compliance, the IT team should work with the legal team. The legal team will have a better idea of how long data must be retained by law while IT is responsible for the actual implementation of the policy.
It's important to be careful with the data retention policy. Age alone should not trigger automatic deletion. A file created decades ago could be an important contract that the organization needs to retain, or it could contain other valuable information.
A storage system can retain or remove data based on rules set up by IT. The use of metadata is one way to figure out when a data object will be scheduled for deletion or designated to a given storage location. Automated software moves old data to archives, which is especially helpful for organizations with large data volumes. Some software can automatically delete data based on age, outlined in a retention schedule. But administrators need to be certain that deleted data serves no further purpose.
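The age-based rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the behavior of any particular storage product: the `StoredObject` fields, the categories and periods in `RETENTION_SCHEDULE`, and the `decide_action` helper are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class StoredObject:
    """Hypothetical metadata a storage system might keep for each object."""
    name: str
    category: str      # illustrative categories, e.g. "log" or "contract"
    created: datetime


# Hypothetical retention schedule: category -> (archive after, delete after).
# A delete-after value of None means "never delete automatically".
RETENTION_SCHEDULE = {
    "log": (timedelta(days=30), timedelta(days=365)),
    "contract": (timedelta(days=365), None),
}


def decide_action(obj: StoredObject, now: datetime) -> str:
    """Return "keep", "archive", "delete" or "review" for one object."""
    rule = RETENTION_SCHEDULE.get(obj.category)
    if rule is None:
        # No rule covers this category, so leave it for an administrator.
        return "review"
    archive_after, delete_after = rule
    age = now - obj.created
    if delete_after is not None and age >= delete_after:
        return "delete"
    if age >= archive_after:
        return "archive"
    return "keep"
```

In this sketch, an object in a category with no rule is flagged for review rather than deleted, which reflects the point that administrators need to be certain deleted data serves no further purpose.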
Retention periods in a data retention policy range from minutes to years. As a result, it's important to use a policy engine that can apply rules based on many different fields, such as user, department, folder and file type.
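One way to picture such a policy engine is as an ordered list of rules in which the first matching rule decides the retention period. The sketch below assumes that first-match design. The `FileInfo` and `Rule` classes, the wildcard matching and the example rules are hypothetical and are not drawn from any specific product.

```python
from dataclasses import dataclass
from datetime import timedelta
from fnmatch import fnmatchcase
from typing import List, Optional


@dataclass
class FileInfo:
    """The fields the (hypothetical) engine can match on."""
    user: str
    department: str
    folder: str
    file_type: str


@dataclass
class Rule:
    """A retention period plus shell-style patterns; "*" matches anything."""
    retention: timedelta
    user: str = "*"
    department: str = "*"
    folder: str = "*"
    file_type: str = "*"

    def matches(self, f: FileInfo) -> bool:
        return (fnmatchcase(f.user, self.user)
                and fnmatchcase(f.department, self.department)
                and fnmatchcase(f.folder, self.folder)
                and fnmatchcase(f.file_type, self.file_type))


# Example rules only; real periods would come from legal and operational needs.
RULES = [
    Rule(timedelta(minutes=30), folder="/tmp/*"),
    Rule(timedelta(days=7 * 365), department="finance", file_type="pdf"),
    Rule(timedelta(days=365)),  # catch-all default
]


def retention_for(f: FileInfo, rules: Optional[List[Rule]] = None) -> Optional[timedelta]:
    """Return the retention period of the first matching rule, if any."""
    for rule in (RULES if rules is None else rules):
        if rule.matches(f):
            return rule.retention
    return None
```

Because the first match wins, rule order matters: in this example a finance PDF stored under `/tmp/` would get the 30-minute period, not the multi-year one.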
A data retention policy should include email messages. Emails pile up quickly, and some take up a lot of space, so it's important for an organization to set a reasonable timetable for retention. As with the data retention policy as a whole, the IT team should work with legal on email retention schedule details.
Regarding targets, object storage is a popular choice in a data retention policy, as it provides solid data protection at a moderate cost.
Public cloud storage is another common location for data that requires long-term retention. It is typically cheaper than on-premises storage, especially in infrequent access tiers. Cloud service providers offer off-site data protection, which is important in the event of a disruption to the organization's main data center. Speed of restore depends on the tier and size of the data set.
In addition, tape continues to play a key role in long-term data retention. Infrequently accessed historical data finds a good home on tape, although it takes longer to restore from tape than from other media. Storing data on tape for years is typically cheaper than storing it in the cloud and uses less energy than disk storage. Like the public cloud, tape provides off-site storage. Tape has advanced over the years. LTO-8, which launched in late 2017, offers up to 30 TB of compressed storage capacity, a sustained data transfer rate of up to 750 megabytes per second for compressed data, write-once, read-many (WORM) capability, the Linear Tape File System with partitioning, and encryption.
A data retention policy must consider the value of data over time and the data retention laws an organization may be subject to. In 2006, the U.S. Supreme Court recognized that it is not financially possible to retain all information indefinitely. However, organizations must demonstrate that they only delete data that is not subject to specific regulatory requirements and use a repeatable and predictable process to do so. This means various types of information are held for different lengths of time. For example, a hospital's retention period for employee email would be different than that of its patient records.
While it is common for an organization to establish its own data retention requirements, certain data retention laws must be adhered to. This is especially true for organizations operating within regulated industries. For example, publicly traded companies within the U.S. must establish a Sarbanes-Oxley Act (SOX) data retention policy. Similarly, healthcare organizations are subject to Health Insurance Portability and Accountability Act (HIPAA) data retention requirements, and organizations that accept credit cards must adhere to a Payment Card Industry Data Security Standard (PCI DSS) data retention and disposal policy.
Simply retaining data is not enough. Federal laws commonly require organizations in regulated industries to create a documented data retention policy.
An organization must also take into account the General Data Protection Regulation (GDPR), which went into effect in May 2018 and updated data privacy laws across the European Union. Mandates apply to the personal data of individuals in the EU, whether or not the company collecting the data is in the EU, as well as any people and organizations whose data is stored within the European Union. It's critical to have a data retention policy that explains which data is being held, why and where it's being held, and for how long, as it relates to GDPR directives. Especially with a sweeping compliance regulation such as GDPR, it's important to only keep the personal data that's needed.
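The four questions a policy should answer (which data, why, where and for how long) can be captured in a simple inventory record. The sketch below is a hypothetical structure for illustration only. GDPR does not prescribe this format, and the `InventoryEntry` class and `unanswered` helper are invented names.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List, Optional


@dataclass
class InventoryEntry:
    """One row of a hypothetical data retention inventory."""
    data_set: str                          # which data is being held
    purpose: Optional[str] = None          # why it is being held
    location: Optional[str] = None         # where it is being held
    retention: Optional[timedelta] = None  # for how long


def unanswered(entry: InventoryEntry) -> List[str]:
    """List the questions this entry still leaves unanswered."""
    gaps = []
    if not entry.purpose:
        gaps.append("why")
    if not entry.location:
        gaps.append("where")
    if entry.retention is None:
        gaps.append("how long")
    return gaps
```

Running `unanswered` over every entry gives a quick list of data sets whose retention rationale is still undocumented.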
Common data retention policy issues
Data continues to increase dramatically, not only in primary storage but in backup data and archives as well. Backup takes a particularly burdensome toll when the same data gets backed up repeatedly. A data retention policy is one way to reduce volume and eventually automate the process of retaining data sets.
However, creating a data retention policy is complex. Setting a data retention schedule is not cut and dried. Different data sets require different retention periods for legal and operational reasons. An organization needs to be careful, especially if it is instituting an automated form of data retention.
Storage can be a burden as well. That's why a good data retention policy is clear about the type of storage where retained data will go to optimize budget and space.
Proper data disposal
When a protected record's age exceeds the retention period set by the applicable data retention policy, the record needs to be disposed of properly. Organizations are not always required by law to dispose of old data, but it is often in their best interest to do so.
Many organizations use an automated system, typically a dedicated archive software product, to securely delete data that no longer falls within the required data retention period. Automation ensures data will be disposed of in the proper time frame without manual intervention. Some organizations may use their backup software's archiving functionality to automate data disposal.
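An automated disposal pass of this kind can be sketched as follows. This is a simplified illustration, not how any archive or backup product works: the `Record` class, the `on_hold` flag (assumed here as a way to block disposal of records that still serve a purpose) and the `disposal_run` function are hypothetical. The function only decides and logs; the secure deletion itself is out of scope.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass
class Record:
    record_id: str
    created: datetime
    retention: timedelta
    on_hold: bool = False  # assumption: a hold flag that blocks disposal


def disposal_run(records: List[Record], now: datetime) -> Tuple[List[Record], List[str]]:
    """Return (records to keep, audit log lines) for one disposal pass."""
    kept: List[Record] = []
    audit_log: List[str] = []
    for r in records:
        expired = now - r.created > r.retention
        if expired and not r.on_hold:
            # A real system would securely delete the record at this point.
            audit_log.append(f"{now.isoformat()} disposed {r.record_id}")
        else:
            kept.append(r)
            if expired:
                audit_log.append(f"{now.isoformat()} retained {r.record_id} (on hold)")
    return kept, audit_log
```

Writing an audit line for every disposal decision is one way to show that deletion follows a repeatable and predictable process.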