Data archiving is the process of moving data that is no longer actively used to a separate storage device for long-term retention. Archive data consists of older data that is still important to the organization and may be needed for future reference, as well as data that must be retained for regulatory compliance. Data archives are indexed and searchable, so that files, and parts of files, can be easily located and retrieved.
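To illustrate what "indexed and searchable" means in practice, here is a minimal sketch of a keyword (inverted) index over archived items. It assumes each item's text content is available when the item is archived. The `ArchiveIndex` class, the `tokenize` helper and the item IDs are hypothetical; commercial archiving products use their own indexing engines and metadata schemes.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ArchiveIndex:
    """Hypothetical in-memory keyword index over archived items (illustration only)."""

    def __init__(self):
        # Inverted index: each token maps to the set of archive IDs containing it.
        self._postings = defaultdict(set)

    def add(self, archive_id, text):
        """Index an item's text when it is moved into the archive."""
        for token in tokenize(text):
            self._postings[token].add(archive_id)

    def search(self, query):
        """Return the IDs of archived items that contain every query term."""
        tokens = tokenize(query)
        if not tokens:
            return set()
        # Start from the first term's matches and intersect with the rest.
        matches = set(self._postings.get(tokens[0], set()))
        for token in tokens[1:]:
            matches &= self._postings.get(token, set())
        return matches


index = ArchiveIndex()
index.add("email-2019-0042", "Q3 invoice dispute with supplier")
index.add("doc-2018-0007", "Supplier contract renewal terms")
print(sorted(index.search("supplier")))          # ['doc-2018-0007', 'email-2019-0042']
print(sorted(index.search("invoice supplier")))  # ['email-2019-0042']
```

Because lookups go through the index rather than scanning every archived file, retrieval stays fast even as the archive grows.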
Data archives are often confused with data backups, which are copies of data. Backups serve as a data recovery mechanism: they are used to restore data in the event it is corrupted or destroyed. In contrast, data archives protect older information that is not needed for everyday operations but may have to be accessed occasionally. Archives reduce primary storage consumption and related costs rather than acting as a data recovery mechanism. Some archiving products treat archive data as read-only to protect it from modification, while others treat it as read/write. Data archiving is most suitable for data that must be retained due to operational or regulatory requirements, such as document files, email messages and possibly old database records.
Data archiving benefits
The greatest benefit of archiving data is that it reduces the cost of primary storage. Primary storage is typically expensive because the storage array must deliver enough IOPS to meet operational requirements for user read/write activity. In contrast, archive storage costs less because it is typically based on a low-performance, high-capacity storage medium.
Archiving also reduces the volume of data that must be backed up. Removing infrequently accessed data from the backup data set improves backup and restore performance and lowers secondary storage costs.
Online vs. offline data storage
Data archives take several forms. Some systems use online data storage, which places archive data on disk systems where it is readily accessible. Online archives are frequently file-based, but object storage is growing in popularity.
Other archival systems use offline data storage, in which data archiving software writes archive data to tape or other removable media rather than keeping it online. Because a tape cartridge that has been removed from the drive draws no power while it sits on a shelf or in a library slot, tape-based archives consume far less power than always-on disk systems. This translates to lower archive storage costs.
Cloud storage is another possible archive target. Amazon Glacier, for example, is designed for data archiving. Cloud storage has a low per-gigabyte price, but it is billed as an ongoing expense rather than a one-time purchase. In addition, costs can grow over time as more data is added to the cloud archive.
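Because cloud archives are billed on stored capacity, a growing archive produces a growing bill even when the per-unit price never changes. The sketch below shows that arithmetic under stated assumptions: a hypothetical flat price per terabyte-month, a fixed amount of new data added each month and no other charges. The `monthly_archive_bills` function and all figures are placeholders, not any provider's actual rates.

```python
def monthly_archive_bills(initial_tb, growth_tb_per_month, price_per_tb_month, months):
    """Return the bill for each month of a cloud archive that grows linearly.

    Hypothetical assumptions for illustration only: capacity is billed per
    TB-month at a fixed price, new data arrives at the end of each month,
    and there are no other charges.
    """
    bills = []
    stored_tb = initial_tb
    for _ in range(months):
        bills.append(stored_tb * price_per_tb_month)  # this month's capacity charge
        stored_tb += growth_tb_per_month              # archive keeps growing
    return bills


# Placeholder figures: 10 TB today, 1 TB added per month, 4 currency units per TB-month.
bills = monthly_archive_bills(10, 1, 4, 12)
print(sum(bills), bills[0], bills[-1])  # 744 40 84
```

With these placeholder figures the monthly bill more than doubles over one year, from 40 to 84, even though the price per terabyte is unchanged.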
Data archiving and data lifecycle management
The archival process is almost always automated using archiving software. Capabilities vary from vendor to vendor, but generally speaking, the software automatically moves aging data to the archive according to a data archival policy set by the storage administrator. This policy may also include specific retention requirements for each type of data, and some archiving software automatically purges data from the archive once it has exceeded the lifespan mandated by the organization's data retention policy. Many backup software platforms are adding archiving functionality to their products. Depending on your needs, this can be a cost-effective and efficient way to archive data; however, these products may not include all of the functionality found in a dedicated archiving product.
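The policy-driven move-and-purge behavior of archiving software can be sketched as a function that evaluates a per-data-type rule for each item. This is a minimal illustration, not any vendor's implementation: the `Item`, `Rule` and `plan_archive_actions` names are hypothetical, data age is assumed to be measured from the last-modified time, and the function only decides what should happen rather than actually moving or deleting anything.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Item:
    """Hypothetical record for one piece of data tracked by the archiver."""
    name: str
    data_type: str            # e.g. "email" or "document"
    last_modified: datetime
    archived: bool = False    # True once the item lives in the archive tier


@dataclass
class Rule:
    """Hypothetical per-data-type policy rule."""
    archive_after: timedelta  # move to the archive once the data is older than this
    retain_for: timedelta     # purge from the archive once the data is older than this


def plan_archive_actions(items, policy, now):
    """Return (to_archive, to_purge) name lists; nothing is moved or deleted."""
    to_archive, to_purge = [], []
    for item in items:
        rule = policy.get(item.data_type)
        if rule is None:
            continue  # no rule for this data type, so leave it where it is
        age = now - item.last_modified
        if item.archived and age > rule.retain_for:
            to_purge.append(item.name)    # past its retention period
        elif not item.archived and age > rule.archive_after:
            to_archive.append(item.name)  # aging data still on primary storage
    return to_archive, to_purge


now = datetime(2024, 1, 1)
# Hypothetical periods for illustration, not recommendations or legal requirements.
policy = {
    "email": Rule(archive_after=timedelta(days=90), retain_for=timedelta(days=7 * 365)),
    "document": Rule(archive_after=timedelta(days=365), retain_for=timedelta(days=10 * 365)),
}
items = [
    Item("inbox-2023-06.pst", "email", datetime(2023, 6, 1)),
    Item("budget-2023.xlsx", "document", datetime(2023, 9, 1)),
    Item("inbox-2015.pst", "email", datetime(2015, 3, 1), archived=True),
]
print(plan_archive_actions(items, policy, now))
# (['inbox-2023-06.pst'], ['inbox-2015.pst'])
```

Separating the decision from the file operations keeps the policy logic easy to test; a real archiver would then act on each list, for example copying items to archive storage before removing them from primary storage.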