Backups are primarily used for operational recoveries, to quickly recover an overwritten file or corrupted database.
The focus is on speed, both to back up and recover, and on data integrity. Archives, on the other hand, typically store a version of a file that's no longer changing, or shouldn't be changing.
Speed is less important in archives; even if the event is a legal action, you typically have a few days to respond. Searchability is more critical in archives. In addition, importance is placed on the ability to scale data integrity and data retention over a long period of time, possibly decades. An archive is no longer limited to traditional files and images; most database applications have specific archive capabilities that allow the primary database to stay lean and fast while the archive is retained for research and compliance.
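As a rough illustration of that database pattern, the sketch below moves rows older than a cutoff date out of a primary table and into an archive table in a single transaction. It uses Python's built-in sqlite3 module purely for convenience. The orders and orders_archive tables, their columns and the archive_old_rows function are hypothetical names chosen for the example, not the archive feature of any particular database product.

```python
import sqlite3


def archive_old_rows(conn: sqlite3.Connection, cutoff_date: str) -> int:
    """Move orders dated before cutoff_date into the archive table.

    Hypothetical schema: orders and orders_archive have identical columns,
    and order_date holds ISO 8601 text (YYYY-MM-DD), so plain string
    comparison matches date order.
    """
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "INSERT INTO orders_archive SELECT * FROM orders WHERE order_date < ?",
            (cutoff_date,),
        )
        deleted = conn.execute(
            "DELETE FROM orders WHERE order_date < ?", (cutoff_date,)
        ).rowcount
    return deleted


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    for table in ("orders", "orders_archive"):
        conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, order_date TEXT)")
    conn.executemany(
        "INSERT INTO orders (id, order_date) VALUES (?, ?)",
        [(1, "2001-03-15"), (2, "2004-07-01"), (3, "2008-01-20")],
    )
    moved = archive_old_rows(conn, "2005-01-01")
    print(moved, "rows moved to the archive")
```

The copy and the delete share one transaction so that a failure partway through cannot leave a row in both tables or in neither.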
Email archiving applications are often the catalyst for establishing a separate archive process. It's important to realize that you are legally responsible for more than just capturing email.
When considering combining archive and backup onto a single platform, the decision will depend on the specific platform, the organization's retention requirements, and the expected goals of the backup and archive process.
Can tapes be used for archives?
While the vast majority of organizations consider tape for their long-term archives, and companies like Index Engines Inc. provide the ability to more effectively search for data on tape, there's a risk in counting on tape for the storage of archive data.
Just as disk has become a popular addition to the backup process because of concerns about recovery from tape, data that's archived to tape should be considered just as vulnerable. It's difficult to develop an ongoing process to verify the integrity of tape, leading to greater concerns the longer the media sits on a shelf. There's also a simple technology issue. Even if your retention requirements are only seven years, think back seven years -- LTO-1 or LTO-2 was becoming the standard, and DLT in its Super DLT form was still considered competition. What's the likelihood that the LTO-1 tape that's been sitting on the shelf can be read and successfully restored from in the new LTO-4 drives in your data center? Anecdotal stories of sub-50% success rates aren't uncommon.
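The drive-compatibility half of that question can be checked against the published LTO rule: for the generations discussed here, a drive reads its own generation and the two before it. The sketch below assumes that rule holds for LTO-1 through LTO-7 and refuses to answer outside that range. The can_read function and its range limit are illustrative names, and the rule says nothing about whether an aging cartridge is still physically readable.

```python
MAX_GEN_COVERED = 7  # assumption: the two-generations-back read rule is only applied to LTO-1..LTO-7


def can_read(drive_gen: int, tape_gen: int) -> bool:
    """Return True if an LTO drive of drive_gen can read a tape of tape_gen.

    Assumed rule: a drive reads its own generation and the two before it.
    Media condition and drive health are outside the scope of this check.
    """
    for gen in (drive_gen, tape_gen):
        if not 1 <= gen <= MAX_GEN_COVERED:
            raise ValueError(f"LTO-{gen} is outside the range this sketch covers")
    return 0 <= drive_gen - tape_gen <= 2


if __name__ == "__main__":
    for tape_gen in range(1, 5):
        verdict = "readable" if can_read(4, tape_gen) else "not readable"
        print(f"LTO-{tape_gen} tape in an LTO-4 drive: {verdict}")
```

Under that rule an LTO-4 drive accepts LTO-2 through LTO-4 media for reading, so restoring from an LTO-1 cartridge means keeping an older drive in service as well.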
Even if the hardware works, how are you going to find a piece of data that's seven years old among hundreds -- and possibly thousands -- of tapes? Most backup applications don't keep their metadata (the data about the data being backed up) very long. In fact, the average length of time is approximately 90 to 120 days. After that, it's up to your records-retention skills, or those of the person who had your job before you, or even the person before them. Recovery of data this old is most likely going to require guesswork, plenty of manual scans and lots of time.
Should you use disk for archives?
The thought of keeping all archives on disk may seem impossible and costly, but companies like EMC Corp., Hewlett-Packard Co. and Permabit Technology Corp. are delivering technology today to make a disk archive that can last for 25, 50 or 100 years a reality. But the disk drive you start with today won't be the same disk drive that you use 100 years from now (if we even still use disk drives 100 years from now).
While disk also makes combining backup and archive onto a single platform more realistic than tape does, a best practice is to have a specific system for archives. Archives have different retention requirements, different recovery needs and different searchability requirements than backups.
Most disk-based archive systems present themselves as a network mount point, which makes access over time realistic. Unlike a seven-year-old tape format, a CIFS or NFS mount is accessed today in almost exactly the same way it was seven years ago.
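Because the archive looks like an ordinary file system, the ongoing integrity check that is so hard to run against shelved tape can be written with nothing more than standard file I/O. The sketch below records a SHA-256 checksum for every file under a mount point and later reports anything missing or changed. The function names are hypothetical, and where the manifest itself is stored is left out; this is one possible approach, not the method used by any of the products named above.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archive files are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(archive_root: Path) -> Dict[str, str]:
    """Map each file's path (relative to the mount point) to its checksum."""
    return {
        path.relative_to(archive_root).as_posix(): sha256_of(path)
        for path in sorted(archive_root.rglob("*"))
        if path.is_file()
    }


def verify(archive_root: Path, manifest: Dict[str, str]) -> List[str]:
    """Return a list of problems: files that are missing or whose content changed."""
    problems = []
    for rel_path, expected in manifest.items():
        target = archive_root / rel_path
        if not target.is_file():
            problems.append(f"missing: {rel_path}")
        elif sha256_of(target) != expected:
            problems.append(f"changed: {rel_path}")
    return problems
```

A scheduled job could call build_manifest(Path("/mnt/archive")) once at ingest and verify(Path("/mnt/archive"), manifest) on every later pass, where /mnt/archive stands in for whatever CIFS or NFS mount the archive system exposes.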