Data backups and data archives are frequently confused with each other. Many people don't know the difference between the two enterprise data storage technologies, and often assume that, if necessary, they can use their data backup storage as data archive storage, which is often not a good idea. In this Q&A, W. Curtis Preston, executive editor at TechTarget and independent data backup and recovery expert, outlines the differences between data backup and data archiving products, and the dos and don'ts associated with each technology.
Listen to this podcast, and learn archiving best practices, how to choose a data archiving tool, and how to put together the best data archiving strategy for your business. His answers are also available as an MP3 below.
Table of contents:
>> What's the main difference between data backups and data archives?
>> Why shouldn't you use data backups as data archives?
>> What do archiving products do that backup software doesn't?
>> Is it better to use an archiving tool from your backup software provider or a third-party provider?
Data backups are for disaster recovery (DR) and data archives are for discovery. Their purpose is really what makes them different. A data backup is for recovering or restoring lost or corrupted files. So if you accidentally deleted a file or a bunch of files, or you had a double disk failure in a RAID 5 array, and you need to restore things to the way they previously looked, that's what backups are for. Or, if something bad happened to your files yesterday, you might want to restore them to the way they were two or three days ago. And if you want to get a version of a file from a few weeks or months ago, you can use your backup application.
What you don't want to do with a backup application is use it for e-discovery; that's what data archives are for. For example, if you're sued or investigated by someone, you most likely will not be asked to restore things to the way they looked yesterday. More likely, you will be asked for all of the emails containing specific keywords that were sent between one person and another, or between your company and another company, or for the files in a particular directory. Data archives will show a history of files: where they existed, when they existed and who changed them. Backup systems are not good at any of these tasks. You can get those items from a backup app, but it's much more difficult to do so.
Basically, the purpose of a data archive is not going to be met by a backup app. If someone asks you for specific emails, you're not going to be able to go to your backup system and ask that question. For example, let's say you have a full backup of Exchange every week for the last seven years. Then someone comes to you and says, "I want all of these emails with this word in them." To extract this information with a backup application, you'll need to restore the entire Exchange server as it was seven years ago and then extract the emails you need from that restored server. Then you're going to need to restore Exchange again to seven years ago minus a week, and do that over and over again; with a weekly full backup for seven years, that is roughly 360 times. So satisfying archive requests with data backup and recovery software is something you'll only do once. You'll try it and then say to yourself, "We should have used archive software to satisfy this requirement."
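To get a feel for the scale of that restore-and-search loop, here is a minimal sketch that lists the weekly full backups you would have to restore one by one. The function name and the dates are hypothetical and chosen only for illustration; no real backup product is being modeled.

```python
from datetime import date, timedelta


def weekly_restore_points(first_backup: date, last_backup: date) -> list[date]:
    """Return the date of every weekly full backup between two dates, inclusive."""
    points = []
    current = first_backup
    while current <= last_backup:
        points.append(current)
        current += timedelta(weeks=1)
    return points


# Assumed example dates: seven years of weekly full backups.
points = weekly_restore_points(date(2001, 1, 1), date(2008, 1, 1))

# Each entry is one full Exchange restore followed by a manual search.
print(f"Full restores needed: {len(points)}")
```

Every date in that list stands for a complete server restore plus a search of the restored data, which is why the approach does not scale.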
Archiving products do two main things differently: how they store the data and what data they store. Data archiving for e-discovery watches a system and archives data as it comes in. With email, for example, whenever a message is sent or received, it is immediately sent to the archive system. It's nothing like a data backup that runs as a batch process every night. With archive systems, data is archived in real time as it is created or received. Another difference between the two is the level of detail that archives store. Data archives store all of the metadata for a file. For email, for example, a data archiving product stores the subject line, the sender and the receiver, and perhaps even looks into the body and attachments for keywords. Archives store all of this information in a database, and also store the document and email attachment in much the same way that a data backup system would. But essentially, archives take data and put it into a database that can be searched, whereas in a backup system, data is just stored on tape with no search capabilities.
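The idea of archiving each message as it arrives and keeping its metadata in a searchable database can be sketched in a few lines. This is only an illustration under stated assumptions: the table layout, the function names and the sample messages are all hypothetical, and SQLite (from the Python standard library) stands in for whatever database a real archiving product uses.

```python
import sqlite3


def open_archive() -> sqlite3.Connection:
    """Create an in-memory archive with one row of metadata per message."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE messages (
               id INTEGER PRIMARY KEY,
               sent_at TEXT, sender TEXT, recipient TEXT,
               subject TEXT, body TEXT)"""
    )
    return conn


def archive_message(conn, sent_at, sender, recipient, subject, body):
    """Store one message; meant to be called as each message is sent or received."""
    conn.execute(
        "INSERT INTO messages (sent_at, sender, recipient, subject, body) "
        "VALUES (?, ?, ?, ?, ?)",
        (sent_at, sender, recipient, subject, body),
    )
    conn.commit()


def search(conn, sender, recipient, keyword):
    """Find messages between two parties whose subject or body contains a keyword."""
    pattern = f"%{keyword}%"
    rows = conn.execute(
        "SELECT sent_at, subject FROM messages "
        "WHERE sender = ? AND recipient = ? "
        "AND (subject LIKE ? OR body LIKE ?) "
        "ORDER BY sent_at",
        (sender, recipient, pattern, pattern),
    )
    return rows.fetchall()


# Made-up sample traffic, archived message by message.
archive = open_archive()
archive_message(archive, "2008-03-01T09:15:00", "alice@example.com",
                "bob@example.com", "Merger terms", "Draft attached.")
archive_message(archive, "2008-03-02T10:00:00", "alice@example.com",
                "carol@example.com", "Lunch", "Noon?")
archive_message(archive, "2008-03-03T11:30:00", "alice@example.com",
                "bob@example.com", "Follow-up", "More on the merger.")

# One query answers the request; nothing has to be restored first.
hits = search(archive, "alice@example.com", "bob@example.com", "merger")
for sent_at, subject in hits:
    print(sent_at, subject)
```

The point of the sketch is the contrast with a backup: the request "all emails from this person to that person containing this word" becomes a single database query instead of a series of restores.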
There are some good archiving tools out there and there are some not-so-good tools out there. I would first go to my backup vendor and ask, "Why should I use your archiving software versus any other third-party utility?" Once you know what they offer, you can go out and do a full comparison with other products.
However, one company is blurring the line between data backup and data archiving software. CommVault Inc.'s backup software has the capability, when you do a backup, to access the metadata in a file and store it alongside the backup. But keep in mind that although it does provide some e-discovery capabilities, I still wouldn't consider it a data archiving product.