Published: 08 Feb 2004
The traditional, tape-based backup system is becoming rarer every day, and its old friend--regularly recurring full backups--just might be next. You can blame it on cheap ATA disks. At first, people started using disk to solve a few problems with their backups, and ended up completely changing how they're protecting their data.
Today, the most common backup design is a tape-based system that's been enhanced with disk. But those who are willing to rethink things from scratch are examining other ways to improve their backup environment such as replication, object-based storage, real-time protection of data, protection of data in its native format and using systems that perform incremental backups forever.
Cutting the tie to tape
Many of the challenges with most backups stem from reliance on tape. Granted, tape drives are faster and more reliable than ever before, but tape is still a sequential-access medium that offers access times in seconds, instead of the millisecond access times disk delivers. Tape is also an open system easily infiltrated by contaminants, unlike disk drives that are sealed at the factory. A tape drive can reliably write millions of bytes per second at a relatively low cost. However, due to its sequential operation, tape will always be slower to access and less reliable than random-access disk.
One of the greatest advantages of tape over disk is the ease with which tapes can be sent off site, which is what most people do. However, this advantage is eroding. When using tape, people either send their originals off site, or they make copies of the originals and send them off site. There's nothing wrong with doing this. In fact, if you're not doing it, you should be! This backup process has been going on for a long time, and certain things must take place for it to happen.
For example, let's assume that the process of creating and identifying the tapes to go off site has been completely automated. Even then, the process of moving a tape from a library into a container, to the off-site facility and back again is very labor-intensive. While this work is routine for the most part, someone's got to do it and that labor is expensive. I've seen companies where a dozen people's sole responsibility was to manage such a backup process. It's also important to mention that in each step of this process, there is a chance for human error.
It's a fact that in many environments, backup is neither automated nor effortless. Many people spend many hours a day ensuring that their backups are complete. This effort is required for many reasons. The first is that the process of performing nightly incremental and occasional full backups requires a lot of processing power and network bandwidth, and is usually directed at a target that's not perfectly reliable (tape). Every part of the process is capable of screwing up the backup.
If everything works and all the backups are completed, they should be copied, instead of just sending the originals off site like many companies do. But most environments have spent so much time and effort making sure that the backups are completed, there's little if any energy, time or capacity left in their system to make copies. Yes, many backup software products now allow you to create both the original and the copy simultaneously. But according to an informal survey of my clients, few companies are taking advantage of this important functionality. Therefore, most companies are sending their originals off site.
This means that they must wait for a tape to come back from off site every time they do a restore. While this is acceptable for low-priority systems, it's completely unacceptable for a high-priority critical application. But this is the status quo at many companies. That is, of course, until the first time they try to do a major restore and it goes horribly wrong. This happened at a company I was talking to last month. What should have taken hours took days, and the CIO is now looking for a new job.
Even when restores are successful, it's always a bad day when you have to restore something large. Unless you've adopted some of the technologies discussed later in this article, it often means hours of downtime, and it's rare that the system is restored up to the point of failure. There's almost always a gap of time that isn't restored. That gap translates into lost work for everyone using the system and is a huge loss of money in a large company.
And consider how backups affect applications. Many companies have grown used to slow access or no access to applications during backup and recovery time. While this may have been fine in days past, with today's 24x7 global operations this is no longer acceptable. Your work force expects uninterrupted access to all applications at all times.
|Saving duplicate data once|
Backups are performed inefficiently
Many companies do a regular full backup every week or month, which further aggravates the problems I've mentioned. This means more tapes are created (and hopefully copied), more data is sent across the network and more CPU resources are consumed on the backup client. Why, for example, make another copy of files that haven't changed, and are already securely ensconced on tape? The answer is that, without full backups, restores would take even longer than they already do. Not performing full backups means that in order to restore a large file system, you need a full backup tape from months or years ago, followed by every incremental tape created since then that contains the last version of any file that was in the file system before it was damaged. That could add up to hundreds of tapes. So, full backups are performed every once in a while to overcome this problem. However, not every backup product does it this way.
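To make the tape arithmetic concrete, here is a minimal Python sketch of how a restore set grows when full backups are skipped. The `Backup` record and the `restore_chain` function are hypothetical names invented for this illustration; they don't come from any backup product. The sketch also assumes the worst case, in which every incremental since the last full holds something the restore needs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Backup:
    day: int      # day number on which the backup ran
    kind: str     # "full" or "incr"
    tape: str     # label of the tape holding this backup

def restore_chain(backups, restore_day):
    """Return the backups needed to restore to restore_day:
    the latest full on or before that day, then every later incremental."""
    candidates = sorted((b for b in backups if b.day <= restore_day),
                        key=lambda b: b.day)
    fulls = [b for b in candidates if b.kind == "full"]
    if not fulls:
        raise ValueError("no full backup available before restore point")
    last_full = fulls[-1]
    return [last_full] + [b for b in candidates
                          if b.kind == "incr" and b.day > last_full.day]

# One full on day 0, then a daily incremental for 90 days, each on its own tape.
history = [Backup(0, "full", "T000")] + [
    Backup(d, "incr", f"T{d:03d}") for d in range(1, 91)
]
chain = restore_chain(history, restore_day=90)
print(len(chain))  # 91 backups: 1 full + 90 incrementals
```

With a full backup every seventh day instead, the same restore would need only the day-84 full and the six incrementals that follow it.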
A similar question is: Why make additional backup copies of files that have already been backed up from other systems? For example, how many times do you need to back up wordpad.exe or /bin/ksh? The answer is, "only once," but no one does it that way. The reason is that tape-based backup software packages don't perform what's called single instance store backups--storing only one copy of each unique file. If they did, can you imagine how many tapes you would need to restore a large file system? Because most backup systems have been tape-based for years, until recently no vendors have even thought of adding such features to their software, but that's beginning to change.
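The idea of storing each unique file only once can be sketched in a few lines of Python. This is a toy model, not how any vendor does it: the `FileStore` class is hypothetical, and identifying identical files by a SHA-256 digest of their contents is an assumption made for the example.

```python
import hashlib

class FileStore:
    """Toy single instance store: file content is kept once, keyed by its SHA-256 digest."""

    def __init__(self):
        self.objects = {}   # digest -> file contents (stored once)
        self.catalog = {}   # (host, path) -> digest

    def backup(self, host, path, data):
        """Record the file; return True only if its contents had to be stored."""
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self.objects
        if is_new:
            self.objects[digest] = data
        self.catalog[(host, path)] = digest
        return is_new

    def restore(self, host, path):
        return self.objects[self.catalog[(host, path)]]

store = FileStore()
ksh = b"pretend these bytes are the /bin/ksh binary"
print(store.backup("hostA", "/bin/ksh", ksh))  # True: first copy is stored
print(store.backup("hostB", "/bin/ksh", ksh))  # False: already in the store
print(len(store.objects))                      # 1
```

Both hosts can still restore their own /bin/ksh, because the catalog remembers which stored object each path points to.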
Now ask yourself another question: Why is the backup copy of the data in a different format than the original? The answer is simple. The commands cp filename.txt /dev/rmt/0cbn and copy myfile.txt tapetape.0 don't work. You can't copy files to a tape drive. Of course, in the Unix world, you could use something like this:
find . -type f -print | while read -r i; do dd if="$i" of=/dev/rmt/0cbn bs=32k; done
That simple string writes every file on disk to tape in its native format, similar to the way ANSI tapes for the mainframe were made. But this is an inefficient use of tape. Every file would write a file mark on the tape, which would translate into extremely slow throughput. So, some bright Unix developer came up with the idea of putting several files into a backup file that could be written to tape and cpio, dump and tar were born. Now it's 30 years later, and we're stuck with this handed-down way of doing things.
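As a rough illustration of the "several files in one backup file" idea, the following Python sketch uses the standard `tarfile` module to pack a few files into a single archive stream and then pull one back out. The `bundle` and `extract_one` helpers and the file names are made up for this example, and the archive is held in memory rather than written to a tape device.

```python
import io
import tarfile

def bundle(files):
    """Pack a {name: bytes} mapping into one tar archive held in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def extract_one(archive, wanted):
    """Read a single member back out of the archive."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        return tar.extractfile(wanted).read()

archive = bundle({"a.txt": b"alpha", "b.txt": b"bravo", "c.txt": b"charlie"})
print(extract_one(archive, "c.txt"))  # b'charlie'
```

The three files travel as one stream, which suits a sequential device, but getting at a single file means going through the archive format rather than simply opening the file.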
While writing regular files into a backup file makes perfect sense for tape, it doesn't need to be done when writing to disk. In fact, it adds unnecessary overhead. Not only does it slow down the creation of the backup copy, it slows down the restore of individual files from that backup. If the files were left in their native format, these problems would be eliminated. You could restore as many files at a time as is needed; the only limitation is the disk throughput. In fact, if the files were left in their native format, you could even use the backup copy as the production copy in a crisis. This is exactly what happens with replication products.
So, what's changed?
In case you've missed the recent news flashes, ATA disk arrays have changed the face of storage forever. Terabytes of disk-based storage cost less than three cents per megabyte. Inexpensive ATA disk opens storage options that were never possible with higher-priced disk.
ATA disk arrays have created a market for several new software storage products that change the backup equation. Everything from disk-to-disk-to-tape systems to real-time protection systems has recently come onto the market. Some of these new products allow you to change how you protect your company's data.
I'm dividing the products mentioned in this article into two categories: "enhanced traditional backup" and "new ideas." Enhanced traditional backup products perform traditional backup and recovery in a somewhat non-traditional way. Either they simply enhance it with disk or they perform incremental backups forever--forgoing occasional full backups. "New ideas" products approach the backup problem in very different ways.
Traditional backup is the way backups have been done for years: an occasional full backup to tape, followed by daily incremental backups to tape. However, such a system is fraught with problems.
Most backup systems can be enhanced by simply placing disk storage in front of tape storage. These systems use tape storage as a way to create backups for off-site purposes, but completely forgo the creation of tape for on-site purposes. Products to use in this scenario come in two flavors: file system-based devices and virtual tape products.
A file system device is simply a large disk array with a file system; your backup software writes to this file system. Each backup creates a backup file in the file system. These backups can also be duplicated to tape. While any disk array can be used for this, it's most common to use ATA-based arrays for this purpose. Copan Systems, in Longmont, CO, and Nexsan Technologies, in Woodland Hills, CA, are companies to watch in this space.
A virtual tape system is a bit more complicated and interesting. Several companies have built disk arrays that pretend to be tape drives. This allows you to continue to do backups the way you're used to, but with all the advantages of disk. Of course, tape-based backups need to be converted to real tape for off-site storage. Vendors to watch here include Advanced Digital Information Corp. (ADIC), Alacritus Software, FalconStor Software and Quantum Corp.
The major benefit of these products is that restores are quicker and easier. Whether you're using a file system device or a virtual tape system, you can perform full backups less frequently, because doing so doesn't lengthen your recovery time the way it would with tape. Restoring from a three-month-old full backup and 90 days of incremental backups takes no longer than restoring from a full backup done yesterday--if all those backups are on disk.
Anyone who is familiar with IBM Corp.'s Tivoli Storage Manager (TSM) would read the first section of this article and say, "We don't perform regular full backups!" That's right, TSM users don't do that. And apparently Veritas Software Corp.'s NetBackup 5.0 users might not need to do it either.
The problem with never doing full backups is that you'll need hundreds of tapes to restore an individual system. TSM and NetBackup deal with this in two ways. The first is by creating new full backups from older full backups. Why go back to the original system and move a file across the network to create a new full, when you can simply move it from tape to tape? (See "Saving duplicate data once".) TSM calls this reclamation; NetBackup calls it synthetic full backups.
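Building a new full backup from data the backup system already holds can be sketched as a simple merge. The Python below is an illustration only: `synthesize_full` is a hypothetical function, backups are modeled as dictionaries, and marking a deleted file with `None` is an assumption of this sketch, not a description of how TSM or NetBackup record deletions.

```python
def synthesize_full(old_full, incrementals):
    """Build a new full backup image from an old full plus later incrementals.

    old_full:      {path: content} as of the old full backup
    incrementals:  list of {path: content or None}, oldest first;
                   None marks a file deleted since the previous backup
    """
    new_full = dict(old_full)
    for incr in incrementals:
        for path, content in incr.items():
            if content is None:
                new_full.pop(path, None)   # file no longer exists on the client
            else:
                new_full[path] = content   # newer version replaces the older one
    return new_full

full = {"/etc/hosts": "v1", "/home/a.doc": "v1", "/tmp/x": "v1"}
monday = {"/home/a.doc": "v2"}
tuesday = {"/tmp/x": None, "/home/b.doc": "v1"}
print(synthesize_full(full, [monday, tuesday]))
# {'/etc/hosts': 'v1', '/home/a.doc': 'v2', '/home/b.doc': 'v1'}
```

Nothing in the merge touches the original client, which is the point: the work happens entirely inside the backup system.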
|Three ways to use disk for backup|
Another way to reduce the effects of forever incremental (or progressive incremental, as TSM calls it) is to keep the backups of a given system together. If you mix the backups of several different systems onto a set of tapes, you increase the number of tapes that must be loaded in order to restore each system. However, if you tell each system's backups to stay together, you minimize the number of tapes needed to restore each system.
NetBackup users could create a pool for each critical system. TSM users can turn on collocation. Just remember that either of these procedures requires at least one tape per system. However, you could specify several smaller pools of tapes, and point subsets of clients to each pool. That would reduce the effects of forever incremental without having to go to the extreme of complete collocation. You could also mix and match these methods based on the criticality of a given client.
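The effect of keeping each system's backups together can be shown with a small Python count of tapes per client. The client names, tape labels and the `tapes_per_client` helper are all hypothetical, and the sketch assumes each tape is large enough to hold whatever is directed at it.

```python
from collections import defaultdict

def tapes_per_client(placements):
    """placements: iterable of (client, tape) pairs, one per backup written.
    Returns {client: number of distinct tapes holding its backups}."""
    tapes = defaultdict(set)
    for client, tape in placements:
        tapes[client].add(tape)
    return {client: len(t) for client, t in tapes.items()}

clients = ["db1", "web1", "mail1"]
nights = 6

# Mixed: every night all clients share that night's tape.
mixed = [(c, f"night-{n}") for n in range(nights) for c in clients]

# Collocated: each client always writes to its own tape.
collocated = [(c, f"pool-{c}") for n in range(nights) for c in clients]

print(tapes_per_client(mixed))       # each client is spread over 6 tapes
print(tapes_per_client(collocated))  # each client sits on 1 tape
```

Pointing subsets of clients at several smaller pools would land somewhere between these two extremes.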
It should be mentioned that incremental forever techniques, when used with a traditional backup system, currently only work with traditional file system backups. Databases still require the occasional full backup. Only replication-based backup systems can perform incremental-forever backups of databases.
Although many of the following disruptive technologies can be integrated into a traditional backup system, many of them could completely replace an older backup system. In other words, these technologies are turning the backup world upside down.
One example is replication-based backup. If you were to look at the typical data protection hierarchy diagram, it would start with backups, followed by mirroring and RAID, high availability and finally replication. Replication used to be the thing you did once you'd done everything else. With replication-based backup, that's no longer the case. The big advantage of replication-based backup systems is that they usually use block-level incremental backups to constantly maintain a full native-format copy of the data on the backup system. There is no need to perform full backups again.
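A block-level incremental can be sketched as "send only the blocks that differ." The Python below is a simplified illustration with hypothetical helper names; detecting changes by comparing SHA-256 hashes is an assumption of the sketch (real products may track changed blocks in other ways), and it doesn't handle a source that shrinks.

```python
import hashlib

BLOCK_SIZE = 4  # tiny block size so the example is easy to follow

def split_blocks(data, size=BLOCK_SIZE):
    return [data[i:i + size] for i in range(0, len(data), size)]

def changed_blocks(source, replica_hashes):
    """Yield (index, block) for every source block whose hash differs
    from what the replica recorded at that position."""
    for index, block in enumerate(split_blocks(source)):
        digest = hashlib.sha256(block).digest()
        if index >= len(replica_hashes) or replica_hashes[index] != digest:
            yield index, block

def apply_blocks(replica, updates):
    """Patch the replica in place with the changed blocks."""
    blocks = split_blocks(replica)
    for index, block in updates:
        if index < len(blocks):
            blocks[index] = block
        else:
            blocks.append(block)
    return b"".join(blocks)

replica = b"AAAABBBBCCCC"
source = b"AAAAXXXXCCCCDDDD"
hashes = [hashlib.sha256(b).digest() for b in split_blocks(replica)]
updates = list(changed_blocks(source, hashes))
print(updates)                         # [(1, b'XXXX'), (3, b'DDDD')]
print(apply_blocks(replica, updates))  # b'AAAAXXXXCCCCDDDD'
```

Only two of the four blocks cross the wire, yet the replica ends up as a complete, native-format copy of the source.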
One disadvantage is that if data is fully replicated, logical corruption--such as the deletion of a file--is replicated as well. The system needs to be able to create and maintain states of the replicated data. This can be done via copy-on-write snapshots, by logging or by backing up the replicated data using a traditional backup system.
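Here is a minimal Python sketch of how a copy-on-write snapshot preserves an earlier state of replicated data. The `Volume` class is a toy model invented for this example, with list entries standing in for disk blocks; it isn't modeled on any particular product.

```python
class Volume:
    """Toy block volume with copy-on-write snapshots."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []   # each snapshot: {block index: original contents}

    def snapshot(self):
        """Start a new snapshot; nothing is copied until a block changes."""
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, index, data):
        # Preserve the old block in every snapshot that has not saved it yet.
        for snap in self.snapshots:
            snap.setdefault(index, self.blocks[index])
        self.blocks[index] = data

    def read_snapshot(self, snap_id):
        saved = self.snapshots[snap_id]
        return [saved.get(i, current) for i, current in enumerate(self.blocks)]

vol = Volume(["report-v1", "budget-v1"])
snap = vol.snapshot()
vol.write(0, "")                # a replicated "deletion" empties block 0
print(vol.blocks)               # ['', 'budget-v1']
print(vol.read_snapshot(snap))  # ['report-v1', 'budget-v1']
```

The live copy faithfully reflects the deletion, while the snapshot still shows the data as it was when the snapshot was taken.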
Replication-based backup comes in a variety of flavors, and is being provided by a number of companies that can be divided into three groups. There is storage-based replication (see "Storage and host-based replication"), such as EMC Corp.'s SRDF and Network Appliance Inc.'s SnapVault. The largest group contains host-based replication products, such as Veritas' Volume Replicator or NSI Software's Double-Take. A newer group contains independent products that are trying to combine the features of both (See "Replication-based, disk-based backup system").
Another difference with the implementation in "Storage and host-based replication" is that the tape-based backup of the replicated system is replaced with another replicated system in an off-site location. If this system is also maintaining state data using snapshots or logging, there's now an on- and off-site backup without tape. The only reason to use tape here is for archiving.
In order to consider a product a replication-based product, the data must remain in its native format for at least one leg of the system. However, there is another type of product that is similar, but does not maintain data in its native format. Storactive Inc. provides real-time backup (with logging) of applications such as Microsoft Exchange. While the backup system does not maintain the data in its native format, it provides constant, replication-type backup of Exchange without ever performing a full backup again.
Another very interesting product area contains those products that recognize that all data consists of blocks of ones and zeros, many of which are replicated throughout your environment. In various ways, these products treat each file (or sometimes a block) as a backup object. The big advantage of these products is that if a particular object has been backed up before, it doesn't need to be backed up again. This is also referred to as single instance store. Of course, a true single instance store system would never need to perform a full backup again, as most of the files that would be backed up in a full backup already reside on the backup system.
|Block-based single instance store|
A block-based single instance store stores only unique blocks or files in the backup system.
File-based single instance store systems compare files of similar names to ensure that they are the same, and only store one copy of a given file on the backup system. Block-based systems actually look inside a file, and store only unique blocks on the backup system (see "Block-based single instance store"). Suppose file 1 consists of blocks a, b, c and d. When it is backed up, all of these blocks are new, so they are transferred and stored by the backup system. When file 2 is backed up, however, it turns out to contain a block that is identical to block d in file 1. Blocks e and f are new, so they are transferred to the backup system, but block d is not sent again. The backup system only takes note that block d resides somewhere else as well. Since all blocks are simply patterns of ones and zeros, block d could be inside any two (or more) files, regardless of file, application or operating system type.
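The file 1 and file 2 example can be walked through in a short Python sketch. The `BlockStore` class is hypothetical, single bytes stand in for blocks, and identifying blocks by SHA-256 digest is an assumption of this illustration; the order of the blocks within file 2 is also assumed.

```python
import hashlib

class BlockStore:
    """Toy block-based single instance store."""

    def __init__(self):
        self.blocks = {}   # digest -> block, each unique block stored once
        self.files = {}    # file name -> ordered list of digests

    def backup(self, name, blocks):
        """Store a file given as a list of blocks; return how many blocks were sent."""
        sent = 0
        recipe = []
        for block in blocks:
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                self.blocks[digest] = block
                sent += 1
            recipe.append(digest)   # note where the block belongs, even if not sent
        self.files[name] = recipe
        return sent

    def restore(self, name):
        return [self.blocks[d] for d in self.files[name]]

store = BlockStore()
print(store.backup("file1", [b"a", b"b", b"c", b"d"]))  # 4: all blocks are new
print(store.backup("file2", [b"d", b"e", b"f"]))        # 2: block d is not sent again
print(len(store.blocks))                                # 6 unique blocks stored
print(store.restore("file2"))                           # [b'd', b'e', b'f']
```

Seven blocks were backed up in total, but only six were transferred and stored, and both files can still be rebuilt in full from their block lists.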
There are some systems that are designed from the ground up to provide file or block-level single instance store, such as Avamar or Connected. However, such functionality is starting to creep into other products such as TSM and NetBackup.
All of the backup systems mentioned in this article are available today. (There are many companies that provide similar systems that are not listed in this article due to reasons of space. Check the software directory at http://www.storagemountain.com/software-directory.html for a comprehensive, up-to-date listing of such products.) Whether or not any of them are right for your storage environment will depend on your particular application, and how far along you fall on the adoption curve. Some environments prefer systems that are tried-and-true; others prefer cutting-edge technology. The choice, along with the advantages and disadvantages, is up to you.