

Can data backup and archiving be converged?

George Crump discusses the value of merging data backup and archiving, as well as the challenges of doing so.

For years, IT best practices have proclaimed that data backup and archiving are two distinctly separate functions that should never be intertwined.

However, much of this conventional wisdom is based on an outdated view of the capabilities of the backup architecture.

Now with more scalable software indexes, plus cost-effective disk backup appliances that can handle mixed workloads and open-format tape, is it time to rethink conventional wisdom and finally merge these two processes?

The value of converging data backup and archiving

Before determining if combining backup and archiving is even possible, the value of doing this should be considered. What is to be gained from converging data backup and archiving?

The first big gain is that the data has to be moved only once. Without convergence, data is backed up repeatedly, and when it is determined suitable for archive, it is copied once again.

The second big gain would be the elimination of multiple storage silos. A single silo that can store both backup and archive would significantly reduce costs.

Third, proper data management best practice dictates that the archive stores the only copy of archived data. This means it is removed from both production storage and backup storage. Because of this, the archive itself needs to be protected, either by replication or by copying the data to a second archive storage area. This is a problem, especially for disk archives: data can be replicated to a second disk archive (expensive) or "backed up" to another disk archive (complex and error-prone).

The net result of convergence? A significant reduction in capital and operational costs as well as simplification of IT processes.

Can backup and archiving be converged?

The first challenge to overcome when merging data backup and archiving is to make sure that the software can perform the task.

Backup is the key first step in this process, as data has to be securely stored on a secondary storage device. This means if backup and archiving are merged, it has to happen within the backup software.

In the past, this was impossible because backup applications had relatively simple databases that could not scale to meet the demands of tracking millions of files. Even if backup software had these capabilities, the typical hardware used for a backup server would not have been able to provide the horsepower that such an application would need.

Times have changed. Many modern backup applications have very scalable databases that can track trillions of objects. From a server standpoint, even modest server hardware today is suitable for hosting the application and its database.

The second challenge is making sure that the backup hardware (disk and tape) can scale to meet demands and be designed to work together to keep storage costs under control. The disk-based backup system makes an ideal landing area for the initial and working sets of backups. Also, thanks to scale-out technology, they can continue to store that data for the foreseeable future.

Many disk-based backup systems can now support the workload difference between backup and archiving. A key point to examine is the scalability of the system's deduplication engine. In a combined environment, there would be millions (if not billions) more files to track, so the deduplication engine could be overburdened. Planners need to look for deduplication engines that can segment out data so they don't have to manage all of it, or engines that can truly scale to meet the demands of managing trillions of files.
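To illustrate why the scalability of the deduplication engine matters, here is a minimal sketch of a hash-based dedupe index that is segmented so that no single index has to track every fingerprint. All names and the fixed chunk size are hypothetical; production engines use variable-length chunking and persistent, distributed indexes.

```python
import hashlib

CHUNK_SIZE = 4096  # fixed-size chunks for simplicity; real engines vary chunk boundaries


class SegmentedDedupIndex:
    """Toy dedup index split into segments by fingerprint prefix,
    so each segment manages only a slice of all known chunks."""

    def __init__(self, num_segments=16):
        self.num_segments = num_segments
        self.segments = [dict() for _ in range(num_segments)]

    def _segment_for(self, digest):
        # Route each fingerprint to one segment by its first byte.
        return self.segments[digest[0] % self.num_segments]

    def store(self, data):
        """Split data into chunks; keep only chunks not seen before.
        Returns the list of fingerprints needed to rebuild the data."""
        refs = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).digest()
            segment = self._segment_for(digest)
            if digest not in segment:
                segment[digest] = chunk  # new, unique chunk
            refs.append(digest)
        return refs

    def unique_bytes(self):
        """Total bytes actually stored across all segments."""
        return sum(len(c) for seg in self.segments for c in seg.values())
```

Storing the same data twice adds no new unique bytes, which is the property that makes a single repository affordable for both backup (highly redundant) and archive (long retention) workloads.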

The third challenge is readability of the backup and the archive. Most backup and archiving applications write data in a proprietary format. However, this use of proprietary formats is changing. An increasing number of virtualization backup vendors write their backups in a native format that is directly readable by the hypervisor. At the same time, more archive vendors are writing their data to tape in a standard format called LTFS (Linear Tape File System). Many tape vendors support LTFS now or have expressed plans to support it.

What's missing?

The key missing ingredient is for the backup software to act more like archive software. This means having a more file-level understanding of data so that retention policies can be set based on file type, location and age. Some backup applications already provide this capability, such as IBM's TSM and HP's Data Protector. Another key element would be for these applications to write data in native formats that can be read by an operating system.
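A file-level retention policy of the kind described above could be sketched as follows. The policy table, the directory list and the function names are all hypothetical, chosen only to show how type, location and age combine into an archive decision.

```python
import time
from pathlib import Path

# Hypothetical policy table: file suffix -> minimum age in days
# before a file becomes a candidate for archive.
ARCHIVE_POLICIES = {
    ".pst": 90,   # mail archives
    ".bak": 30,   # old database dumps
    ".tif": 180,  # scanned images
}

# Hypothetical locations whose contents are archived regardless of type.
ARCHIVE_ROOTS = ("/data/projects/closed",)


def should_archive(path: Path, now=None) -> bool:
    """Decide, from a file's type, location and age,
    whether it belongs in the archive rather than routine backup."""
    now = now or time.time()
    age_days = (now - path.stat().st_mtime) / 86400
    if str(path).startswith(ARCHIVE_ROOTS):
        return True
    min_age = ARCHIVE_POLICIES.get(path.suffix.lower())
    return min_age is not None and age_days >= min_age
```

The backup software would run a check like this during its normal scan, so classification happens as a side effect of the backup job rather than as a separate archiving pass.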

How to get there

If a data center wants to merge data backup and archiving, there are two viable options to achieve that goal. First, the IT planner could select backup software that provides a scalable database and supports both disk and tape. The planner may also want to look for software that can track data at a file level.

An alternative may be to look at an archive product that has merged disk and tape. These products present a single NFS/CIFS mount point to which data can be sent from the backup application or a series of scripts. The product would then manage the movement of data from disk to tape, as well as the retention of that data.


The convergence of backup and archiving can significantly reduce the management costs as well as the hardware costs associated with those processes. In fact, there are backup software products that can provide both the data management capabilities an archive needs, such as indexing and content-level search, and integration with a variety of backup and archive hardware technologies.

Also, archive products like those from Crossroads Systems and Quantum have evolved to be "just another mount point." Hidden behind this mount point can be inexpensive disk, object storage or even tape. Data can be dumped into these repositories with simple copy commands or with application utilities, allowing the repository to serve as both a backup and an archive. Snapshots can be taken of the archive to deliver a point-in-time rollback capability.
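The point-in-time rollback described above can be illustrated with a hard-link snapshot, the same trick rsync-style backup tools use: the snapshot's directory tree is frozen, but file data is shared with the live archive, so no bytes are copied. The function and directory names here are hypothetical.

```python
import os
from pathlib import Path


def snapshot(archive: Path, snap_root: Path, name: str) -> Path:
    """Create a point-in-time view of the archive using hard links.
    Files later deleted from the live archive remain reachable
    through the snapshot tree."""
    snap = snap_root / name
    for src in archive.rglob("*"):
        rel = src.relative_to(archive)
        dest = snap / rel
        if src.is_dir():
            dest.mkdir(parents=True, exist_ok=True)
        else:
            dest.parent.mkdir(parents=True, exist_ok=True)
            os.link(src, dest)  # new directory entry, no data copied
    return snap
```

This works for rollback because archives and backups write new files rather than modifying old ones in place; a file overwritten in place would change in every snapshot that links to it.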

About the author:
George Crump is president of Storage Switzerland, an IT analyst firm focused on storage and virtualization.

Next Steps

Building an online data backup and archiving service

Keep backup, archiving separate to save money

What you should know about tape backup and archiving


Join the conversation

This makes a lot of sense and seems to be where the industry is going. Instead of having discrete, manually created "states" for data, have software decide where data should go based on how often it's accessed. That way, the data that's needed more often stays on faster hardware, while the less frequently accessed data goes into cold storage, like Facebook is using.