Data backup and archiving continue their convergence path

Data backup and archiving have long been separate processes. But backup software and hardware advancements are enabling convergence between the two disciplines.

From a purist's point of view, data backup and archiving should be two separate processes. There are good reasons for this: the purpose of each process is different, as are the demands each places on the storage system. However, most data centers, especially those in small to medium-sized organizations, don't have a formal archiving strategy, and given the capabilities of modern backup software and hardware, they may not need one. Is it truly possible for backup to replace or obviate the need for an archive?

The primary responsibility of backup is to restore the latest copy of data back to production storage as quickly as possible. Depending on the recovery point and recovery time objectives, data centers will go to great lengths to ensure data copies are as current as possible. The backup process may also be used to restore a prior version of the data set rather than the latest copy. This may be necessary when, for example, several protection copies were made after data became corrupted but before the corruption was discovered.
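The restore-point choice described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular backup product works: `ProtectionCopy`, `pick_restore_point` and the `corrupted_since` parameter are hypothetical names, and the sketch assumes the time at which corruption began is already known.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass(frozen=True)
class ProtectionCopy:
    """Hypothetical record of one backup copy."""
    copy_id: str
    taken_at: datetime


def pick_restore_point(
    copies: Sequence[ProtectionCopy],
    corrupted_since: Optional[datetime] = None,
) -> Optional[ProtectionCopy]:
    """Return the newest copy taken before corruption began.

    With no known corruption, this is simply the latest copy.
    Returns None when every copy postdates the corruption.
    """
    candidates = [
        c for c in copies
        if corrupted_since is None or c.taken_at < corrupted_since
    ]
    return max(candidates, key=lambda c: c.taken_at, default=None)


copies = [
    ProtectionCopy("mon", datetime(2024, 1, 1)),
    ProtectionCopy("tue", datetime(2024, 1, 2)),
    ProtectionCopy("wed", datetime(2024, 1, 3)),
]

latest = pick_restore_point(copies)
clean = pick_restore_point(copies, corrupted_since=datetime(2024, 1, 2, 12))
```

Here `latest` is the Wednesday copy, while `clean` falls back to Tuesday's copy because corruption began at midday on Tuesday.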

The primary responsibility of an archive is to store data for a long period of time. The archive needs to be searchable and cost-effective, and it must ensure data integrity. It typically does not need to offer fast recovery times. Most of the data in the archive will likely never be accessed again. If data is accessed, it will be for a specific event such as legal discovery or analytics.

In the past, data backup and archiving essentially complemented each other. Archiving moved older data off primary storage to a tape library, so that it did not need to be backed up to expensive disk backup storage devices. This also kept the size of backup indexes down, which reduced the risk of corruption.

Data backup improvements push the archive envelope

The focus of backup software has always been to capture more data more frequently and to move that data rapidly to the backup device. Advances toward that goal have also made the process more suitable for long-term retention. To keep up with increased ingestion rates, backup software vendors have created better backup software databases. A side benefit of this effort is that these databases are much more scalable and can store data for a longer period of time without risk of corruption.

Features such as source-side deduplication and Changed Block Tracking reduce the amount of data that needs to be transferred, because only the segments of data that have changed since the last protection event are copied. As a result, one of the main justifications for implementing an archive -- reducing backup data -- is not as significant as it once was.
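The idea of copying only what has changed can be sketched as follows. This is an approximation under stated assumptions: real Changed Block Tracking is typically maintained by the hypervisor or storage layer as writes occur, whereas this sketch finds changed blocks by comparing SHA-256 hashes of fixed-size blocks. The function names and the tiny `BLOCK_SIZE` are hypothetical and chosen only for illustration.

```python
import hashlib
from typing import List

BLOCK_SIZE = 4  # unrealistically small, so the example is easy to follow


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """Hash each fixed-size block of the data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(previous: bytes, current: bytes,
                   block_size: int = BLOCK_SIZE) -> List[int]:
    """Return indexes of blocks in `current` that are new or differ
    from the same block in `previous`. Only these need to be sent."""
    old = block_hashes(previous, block_size)
    new = block_hashes(current, block_size)
    return [
        i for i, digest in enumerate(new)
        if i >= len(old) or old[i] != digest
    ]


last_backup = b"AAAABBBBCCCC"
production = b"AAAAXXXXCCCCDD"

to_transfer = changed_blocks(last_backup, production)
```

In this example only the second block (rewritten) and the fourth block (appended) would be transferred; the two unchanged blocks stay put.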

Disk backup targets have also improved. These devices can now store dozens of petabytes of information, use deduplication and compression to maximize data efficiency, and rival tape from a cost perspective. They can also verify data integrity in near real time.
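One common way to verify integrity is to store a checksum alongside each object when it is written and recompute it later. The sketch below assumes that approach; the article does not say which method any given device uses, and `fingerprint`, `scrub` and the in-memory `store` layout are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple


def fingerprint(data: bytes) -> str:
    """Checksum recorded when an object is first written."""
    return hashlib.sha256(data).hexdigest()


def scrub(store: Dict[str, Tuple[bytes, str]]) -> List[str]:
    """Recompute each object's checksum and return the IDs of
    objects whose contents no longer match the recorded value."""
    return sorted(
        object_id
        for object_id, (data, recorded) in store.items()
        if fingerprint(data) != recorded
    )


good = b"quarterly report"
store = {
    "obj-1": (good, fingerprint(good)),
    # Simulated bit rot: the stored bytes differ from what was fingerprinted.
    "obj-2": (b"payroll dat4", fingerprint(b"payroll data")),
}

damaged = scrub(store)
```

Running a pass like this continuously in the background is what lets a disk target flag a damaged object long before anyone tries to restore it.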

These improvements have led to tape becoming less popular for archive storage. While most archives still don't need the rapid recovery of a backup process, some organizations do want faster archive recoveries than tape can provide.

Finally, many modern backup applications, thanks again to advanced databases, provide search capabilities that rival dedicated archiving software. Some backup products even provide deep, context-level search capabilities. Other backup applications include primary storage snapshots within their search parameters, presenting a unified view of protected data.

Should you replace archive with backup?

The decision to merge data backup and archiving will depend largely on the size of the archive. How much data needs to be retained and for how long? If the requirement is multi-petabyte in scale with long-term retention needs, then standalone archive software will scale better and the cost difference will be substantial. In addition, tape's cost advantage is significant in these environments. For petabyte-sized data centers and smaller, the capabilities of modern backup software and hardware are more than enough to allow backup to replace archiving.
