Although there are undeniable similarities between data backups and archiving, the two strategies exist for completely different purposes. Data backups are designed to act as recoverable, point-in-time copies of data. Archives, on the other hand, exist as a way of providing long-term storage for aging and seldom-accessed data. Although backups and archives have long been considered to be separate processes and have been managed independently of one another, many organizations are now converging them.
On the surface, a converged backup and archive strategy makes a lot of sense; after all, both are copies of data. However, backups and archives differ in significant ways beyond their retention policies, and true backup and archive convergence requires understanding those differences and how to accommodate them.
Issues with backup to archive data migration
It would stand to reason that a converged backup and archive strategy could work simply by migrating data from the backups to the archives based on the data's age. Although some of the available products do work in this way, there are problems associated with this approach.
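To make the simple approach concrete, here is a minimal sketch of an age-only rule that picks backup items for archiving. `BackupItem`, `select_for_archive` and `max_age_days` are hypothetical names used for illustration, not any backup product's API. The rule looks only at age and knows nothing about what kind of data each item holds, which is where the problems lie.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class BackupItem:
    """Hypothetical record describing one item held in backup storage."""
    path: str
    last_modified: datetime


def select_for_archive(items, now, max_age_days):
    """Naive age-only rule: return every backup item older than max_age_days.

    Note that the data type is never considered, so an entire VM image would
    be selected just as readily as an old spreadsheet.
    """
    cutoff = now - timedelta(days=max_age_days)
    return [item for item in items if item.last_modified < cutoff]
```

With a 365-day threshold, anything untouched for more than a year is moved, whatever it happens to be.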
It isn't always desirable to archive backup contents. An organization might instead want to archive production data.
Most organizations do not have just one type of data, and not all data types are suitable for archiving. Consider all of the types of data that an organization might back up: structured database data, application data and unstructured data. In addition, an organization might back up entire virtual machines (VMs), virtualization host servers and possibly some physical servers.
Assume, for example, that an organization's primary data protection strategy involves backing up its VMs. Although business needs vary, most organizations probably would not want an automated process to move entire VMs from their backups to their archives, especially if those VMs are still in use. It might be acceptable, however, to move some of the data stored inside a VM to the archives.
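A type-aware policy along the lines described above might be sketched as follows. The `DataType` categories mirror the kinds of data listed earlier. The rule that only unstructured file data is archived automatically is a hypothetical policy, not a recommendation. Files extracted from inside a VM backup are treated as ordinary files, while whole VM and host images are never archived by this rule.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class DataType(Enum):
    FILE = "file"              # unstructured file data (including files inside a VM)
    DATABASE = "database"      # structured database data
    VM_IMAGE = "vm_image"      # an entire virtual machine
    HOST_IMAGE = "host_image"  # a virtualization host or physical server


@dataclass
class BackupItem:
    """Hypothetical record describing one item in backup storage."""
    name: str
    data_type: DataType
    last_modified: datetime


# Hypothetical policy: only unstructured file data is archived automatically.
ARCHIVABLE_TYPES = {DataType.FILE}


def is_archive_eligible(item, now, max_age_days):
    """Return True if the item's type is archivable and it is old enough."""
    if item.data_type not in ARCHIVABLE_TYPES:
        # Whole VMs, hosts and databases are left to other processes.
        return False
    return now - item.last_modified > timedelta(days=max_age_days)
```

Keeping the eligible types in a single set makes the policy easy to review and change without touching the age logic.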
Even if data is suitable for archiving, not every data type should be archived in the same way. This is especially true for structured data; it may not make sense to try to dynamically migrate aging data from an application database to an archive database. Whichever approach is used, data reduction technologies such as deduplication can help to minimize the archived data's storage footprint.
What is the best storage medium for data archives?
The storage medium best suited for data archiving has been hotly debated because each type has its advantages and disadvantages, and no single media type is ideally suited to every form of archiving. In some situations it may make sense to archive to disk; in others, tape-based archiving may be a better option, and in still others, cloud-based archiving might be the best route. An organization must choose its archive media type based on the types of archiving that it performs and on its own business requirements.
Spinning media, for example, is a good choice for organizations that need to keep archive storage online and readily accessible. Tape-based storage, conversely, typically remains offline but provides nearly limitless capacity at a much lower cost per gigabyte than disk. When properly stored, tape is also likely to outlast disk. It is relatively common for tapes to be certified for a 50-year shelf life, and some tapes are rated for even longer.
In some cases, the organization's applications determine the type of archive storage that must be used. Exchange Server, for example, supports native message archiving through archive mailboxes that reside within an archive database. Because that archive database can function only while it is mounted, tape-based archiving is not an option.
More broadly, some applications are designed for application-level archiving. Exchange Server stores messaging data in a mailbox database and natively supports the creation of archive databases, which can reside on commodity storage. Policies within Exchange Server can automatically migrate mail from a mailbox database to an archive database based on message age. Since Exchange Server natively supports message archiving, it would probably make more sense to use those built-in capabilities than to try to force message archiving at the backup application level.
Do I really need a converged backup and archive strategy?
An organization that wants to converge its backups and archives must consider several questions:
What type of data should be archived as a part of the convergence? An organization might decide to archive aging file data but avoid archiving structured data. When making such a determination, it is important to consider whether a converged backup and archive strategy is the best way to achieve the desired results, or whether another method, such as using an application's native archival capabilities, might work better.
What is the source of the archived data? In other words, should the data be moved directly from primary storage (live production data) to the archive or should it be moved from secondary storage (the backup) to the archive?
Typically, aging production data will have already been backed up, so archiving that data may mean migrating it from the backup to the archives and then removing it from primary storage. Such a technique would ensure that the data resides within archive storage only, not within primary or backup storage.
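The sequence described above can be sketched as a single "move" operation. It copies an item from backup to archive, verifies the archived copy, and only then deletes the primary and backup copies. The three dicts are hypothetical stand-ins for real storage tiers. The staleness check, which refuses to archive a backup that no longer matches production, is an added assumption rather than something the article prescribes.

```python
import hashlib


def checksum(data: bytes) -> str:
    """SHA-256 digest used to compare copies of the same item."""
    return hashlib.sha256(data).hexdigest()


def archive_from_backup(name: str, primary: dict, backup: dict, archive: dict) -> None:
    """Move one aging item from backup storage into the archive so that the
    archive holds the only remaining copy.

    primary, backup and archive are hypothetical stand-ins for storage tiers:
    dicts mapping item names to their contents as bytes.
    """
    data = backup[name]

    # Refuse to archive a backup copy that no longer matches production data.
    if name in primary and checksum(primary[name]) != checksum(data):
        raise ValueError(f"backup copy of {name!r} is out of date")

    archive[name] = data

    # Read the archived copy back and verify it before deleting anything.
    # (Trivial for a dict; essential on real archive media.)
    if checksum(archive[name]) != checksum(data):
        raise IOError(f"archive copy of {name!r} failed verification")

    # Only after verification, remove the item from primary and backup storage.
    primary.pop(name, None)
    del backup[name]
```

Deleting last means that a failure at any earlier step leaves the existing primary and backup copies intact.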
If archives are structured to store the only remaining copy of aging data, then it becomes extremely important to implement a mechanism for protecting the archive contents. Otherwise, an archive-level failure could result in permanent data loss because the backup no longer contains a copy of the data. One way to prevent archive-level data loss is to use replication to create a redundant copy of the archive storage.
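Replication of this kind can be sketched as a one-way sync that compares checksums and recopies anything missing or damaged on the replica. The dicts are hypothetical stand-ins for two archive targets. This simplified sketch deliberately does not propagate deletions; in practice, removals would come from a separate retention process.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 digest used to detect missing or corrupted replica items."""
    return hashlib.sha256(data).hexdigest()


def replicate(source: dict, replica: dict) -> list:
    """One-way replication from the primary archive to a redundant copy.

    Copies any item that is missing from the replica or whose contents differ,
    and returns the names that were copied or repaired. Items that exist only
    on the replica are left alone (deletions are not propagated).
    """
    changed = []
    for name, data in source.items():
        if name not in replica or digest(replica[name]) != digest(data):
            replica[name] = data
            changed.append(name)
    return changed
```

Running the sync periodically also serves as an integrity check. A non-empty result on a run where nothing new was archived points to replica corruption that was just repaired.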
Will the archives require any additional capabilities beyond the ability to store data? It is tempting to think of the archives as nothing more than a long-term storage repository for data that is no longer used in production. If left unchecked, however, such an archive could become a data black hole. At the very least, an organization will need indexing capabilities that can locate data within the archives if the data is ever needed.
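A basic indexing capability can be sketched as an inverted index, which maps each word to the archived items that contain it. Here a search returns the items matching every query word. `ArchiveIndex` and the item IDs are hypothetical. Real archive indexes also cover metadata such as dates, owners and file types.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ArchiveIndex:
    """Minimal inverted index: word -> set of archive item IDs containing it."""

    def __init__(self) -> None:
        self._postings = defaultdict(set)

    def add(self, item_id: str, text: str) -> None:
        """Index an archived item's searchable text."""
        for word in tokenize(text):
            self._postings[word].add(item_id)

    def search(self, query: str) -> set:
        """Return IDs of items that contain every word in the query."""
        words = tokenize(query)
        if not words:
            return set()
        # .get() avoids inserting empty entries into the defaultdict.
        result = set(self._postings.get(words[0], set()))
        for word in words[1:]:
            result &= self._postings.get(word, set())
        return result
```

Without something like this, locating a single message in years of archived data would mean scanning the archive's entire contents.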
Other helpful capabilities include e-discovery, legal hold and a data lifecycle management feature that can purge data from the archives once it has outlived its required retention period. Purging expired data not only frees up capacity within the archives, it also prevents outdated data from being used against the organization in the event that data is ever subpoenaed.
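A lifecycle purge that respects legal hold can be sketched as follows. Each item is either kept or purged, and an item under legal hold is never purged, even after its retention period has expired. `ArchivedItem` is hypothetical, and so is the assumption that retention is counted from the archive date. Real policies may count from creation or last modification instead.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ArchivedItem:
    """Hypothetical archive record; retention is counted from archived_on."""
    item_id: str
    archived_on: date
    retention_days: int
    legal_hold: bool = False


def purge_expired(items, today):
    """Split items into (kept, purged).

    An item is purged only if its retention period has ended AND it is not
    under legal hold; everything else is kept.
    """
    kept, purged = [], []
    for item in items:
        expired = today > item.archived_on + timedelta(days=item.retention_days)
        if expired and not item.legal_hold:
            purged.append(item)
        else:
            kept.append(item)
    return kept, purged
```

The legal-hold check comes last in the purge decision, so a hold always wins over an expired retention period.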
So is it better to use a converged backup and archive strategy, or to continue managing the two processes separately as in the past? Although the actual cost will vary widely from one organization to the next, a converged approach is likely to be less expensive in terms of both capital and operational expenses. If nothing else, it is almost always simpler and more cost-effective to manage a single system than to manage multiple, disparate systems. Furthermore, a converged system may use less hardware than a traditional architecture might.
Data backups and archiving have long been used for protection and retention. Although these processes have historically been kept separate from one another, the convergence of backups and archives has the potential to reduce costs, complexity and administrative overhead. It is important to keep in mind, however, that it may not be practical to archive every type of data, so organizations must plan their archival strategy accordingly.