Protecting data against loss, corruption, disasters (manmade or natural) and other problems is one of the top priorities for IT organizations. In concept, the ideas are simple, although implementing an efficient and effective set of backup operations can be difficult.
The term backup has become synonymous with data protection over the past several decades and may be accomplished via several methods. Backup software applications have been developed to reduce the complexity of performing backup and recovery operations. Backing up data is only one part of a disaster protection plan, and may not provide the level of data and disaster recovery capabilities desired without careful design and testing.
The purpose of most backups is to create a copy of data so that a particular file or application may be restored after data is lost, corrupted, deleted or a disaster strikes. Thus, backup is not the goal, but rather it is one means to accomplish the goal of protecting data. Testing backups is just as important as backing up and restoring data. Again, the point of backing up data is to enable restoration of data at a later point in time. Without periodic testing, it is impossible to guarantee that the goal of protecting data is being met.
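One simple form of restore testing is to restore a file to a scratch location and compare its checksum with the original. The sketch below assumes both copies are reachable as local files; the function names `sha256_of` and `restore_matches` are illustrative and do not come from any particular backup product.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(original_path, restored_path):
    """True when the restored file is byte-for-byte identical to the original."""
    return sha256_of(original_path) == sha256_of(restored_path)
```

A check like this only confirms that one file was restored intact. A fuller test would also cover application-level recovery, which is outside the scope of this sketch.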
Backing up data is sometimes confused with archiving data, although these operations are different. A backup is a secondary copy of data used for data protection. In contrast, an archive is the primary data, which is moved to a less-expensive type of media (such as tape) for long-term, low-cost storage.
Backup applications have long offered several types of backup operations. The most common backup types are a full backup, incremental backup and differential backup. Other backup types include synthetic full backups, mirroring, reverse incremental and continuous data protection (CDP).
The most basic and complete type of backup operation is a full backup. As the name implies, this type of backup makes a copy of all data to another set of media, which can be tape, disk or a DVD or CD. The primary advantage to performing a full backup during every operation is that a complete copy of all data is available with a single set of media. This results in a minimal time to restore data, a metric known as a recovery time objective (RTO). However, the disadvantages are that it takes longer to perform a full backup than other types (sometimes by a factor of 10 or more), and it requires more storage space.
Thus, full backups are typically run only periodically. Data centers that have a small amount of data (or critical applications) may choose to run a full backup daily, or even more often in some cases. Typically, backup operations employ a full backup in combination with either incremental or differential backups.
An incremental backup operation will result in copying only the data that has changed since the last backup operation of any type. The modified time stamp on files is typically used and compared to the time stamp of the last backup. Backup applications track and record the date and time that backup operations occur in order to track files modified since these operations.
Because an incremental backup copies only the data changed since the last backup of any type, it may be run as often as desired, with only the most recent changes stored. The benefit of an incremental backup is that it copies a smaller amount of data than a full backup. Thus, these operations complete faster and require less media to store the backup.
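The time stamp comparison described above can be sketched as follows. This is a minimal illustration, assuming the last backup time is available as seconds since the epoch; `files_changed_since` is a hypothetical helper, not part of any backup application.

```python
import os


def files_changed_since(root, last_backup_time):
    """Return files under root modified after last_backup_time (epoch seconds)."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Compare the file's modified time stamp with the last backup time.
            if os.path.getmtime(path) > last_backup_time:
                changed.append(path)
    return sorted(changed)
```

Real backup applications typically track more than modification times (for example, deletions and renames), so this shows only the selection step.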
A differential backup operation is similar to an incremental the first time it is performed, in that it will copy all data changed from the previous backup. However, each time it is run afterwards, it will continue to copy all data changed since the previous full backup. Thus, it will store more data than an incremental on subsequent operations, although typically far less than a full backup. Moreover, differential backups require more space and time to complete than incremental backups, although less than full backups.
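The difference between the two partial backup types comes down to which earlier backup serves as the reference point. A rough sketch, assuming the backup history is kept as a list of `(timestamp, kind)` tuples ordered oldest first (a representation chosen here for illustration):

```python
def reference_time(backup_type, history):
    """Return the time stamp that a new backup compares files against.

    history is a list of (timestamp, kind) tuples, oldest first, and is
    assumed to contain at least one full backup.
    """
    if backup_type == "full":
        return None  # a full backup copies everything, so no reference is needed
    if backup_type == "incremental":
        # Changes since the last backup of any type.
        return history[-1][0]
    if backup_type == "differential":
        # Changes since the most recent full backup.
        fulls = [t for t, kind in history if kind == "full"]
        return fulls[-1]
    raise ValueError(f"unknown backup type: {backup_type}")
```

With only a full backup in the history, both partial types return the same reference time, which is why the first incremental and the first differential copy the same data.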
Table 1: A comparison of different backup operations
|Backup number||Full||Incremental||Differential|
|Backup 1||All data||--||--|
|Backup 2||All data||Changes from backup 1||Changes from backup 1|
|Backup 3||All data||Changes from backup 2||Changes from backup 1|
|Backup 4||All data||Changes from backup 3||Changes from backup 1|
As shown in "Table 1: A comparison of different backup operations," each type of backup works differently. A full backup must be performed at least once. Afterwards, it is possible to run another full, an incremental or a differential backup. The first partial backup performed, whether differential or incremental, will back up the same data. By the third backup operation, the data backed up with an incremental is limited to the changes since the last incremental. In comparison, the third backup with a differential will back up all changes since the first full backup, which was backup 1.
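The pattern in Table 1 can be reproduced with a small simulation. The sketch below assumes each interval between backups is described by the set of files that changed in it; the function name and the data shapes are illustrative only.

```python
def simulate_partials(changes_per_interval):
    """Show what incremental and differential backups copy after one full.

    changes_per_interval[i] is the set of files changed between backup i+1
    and backup i+2, where backup 1 is the initial full backup.
    """
    incremental, differential = [], []
    changed_since_full = set()
    for changed in changes_per_interval:
        changed_since_full |= changed
        # Incremental: only what changed since the previous backup.
        incremental.append(set(changed))
        # Differential: everything changed since the full backup.
        differential.append(set(changed_since_full))
    return incremental, differential


inc, diff = simulate_partials([{"a"}, {"b"}, {"a", "c"}])
print(inc)
print(diff)
```

In this example the two lists agree for backup 2 and diverge from backup 3 onward, with the differential set growing until the next full backup.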
From these three primary backup types, it is possible to develop an approach to protecting data. Typically, one of the following approaches is used:
- Full daily
- Full weekly + Differential daily
- Full weekly + Incremental daily
Many considerations will affect the choice of the optimal backup strategy. Typically, each alternative and strategy choice involves making tradeoffs between performance, data protection levels, total amount of data retained and cost. In "Table 2: A backup strategy's impact on space" below, the media capacity requirements and media required for recovery are shown for three typical backup strategies. These calculations presume 20 TB of total data, with 5% of the data changing daily, and no increase in total storage during the period. The calculations are based on 22 working days in a month and a one-month retention period for data.
Table 2: A backup strategy's impact on space
|Common backup scenarios||Media space required for one month (20 TB @ 5% daily rate of change)||Media required for recovery|
|Full daily (weekdays)||Space for 22 daily fulls: (22 * 20 TB) = 440.00 TB||Most recent backup only|
|Full (weekly) + Differential (weekdays)||Fulls, plus most recent differential since full: (5 * 20 TB) + (22 * 5% * 20 TB) = 124.23 TB||Most recent full + most recent differential|
|Full (weekly) + Incremental (weekdays)||Fulls, plus all incrementals since weekly full: (5 * 20 TB) + (22 * 5% * 20 TB) = 122.00 TB||Most recent full + all incrementals since full|
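The formulas printed in Table 2 can be evaluated directly. The sketch below uses the stated assumptions (20 TB of data, 5% daily change, 22 working days, five weekly fulls); the constant and function names are illustrative. Note that the formula printed for the differential strategy is the same as the one for the incremental strategy, so it also evaluates to 122.00 TB. The 124.23 TB figure in the table is not reproduced by the formula as printed, and the article does not spell out how it was derived.

```python
TOTAL_TB = 20.0        # total data protected
DAILY_CHANGE = 0.05    # fraction of data changing each day
WORKING_DAYS = 22      # backups per month in the daily schedule
WEEKLY_FULLS = 5       # full backups retained per month, as in Table 2


def full_daily_tb():
    """Space for one full backup on every working day."""
    return WORKING_DAYS * TOTAL_TB


def weekly_full_plus_daily_partials_tb():
    """Space for weekly fulls plus one 5% partial backup per working day."""
    return WEEKLY_FULLS * TOTAL_TB + WORKING_DAYS * DAILY_CHANGE * TOTAL_TB


print(f"{full_daily_tb():.2f} TB")                       # 440.00 TB
print(f"{weekly_full_plus_daily_partials_tb():.2f} TB")  # 122.00 TB
```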
As shown above, performing a full backup daily requires the most space and also takes the most time. However, more total copies of data are available, and fewer pieces of media are required to perform a restore operation. As a result, this backup policy has a higher tolerance to disasters and provides the shortest time to restore, since any piece of data required will be located on at most one backup set.
As an alternative, performing a full backup weekly, coupled with running incremental backups daily will deliver the shortest backup time during weekdays and use the least amount of storage space. However, there are fewer copies of data available and restore time is the longest, since it may be required to utilize six sets of media to recover the information needed. If data is needed from data backed up on Wednesday, the Sunday full backup, plus the Monday, Tuesday and Wednesday incremental media sets are required. This can dramatically increase recovery times, and requires that each media set work properly; a failure in one backup set can impact the entire restoration.
Running a weekly full backup plus daily differential backups delivers results in between the other alternatives. Namely, more backup media sets are required to restore than with a daily full policy, although fewer than with a daily incremental policy. Also, the restore time is less than with daily incrementals, and more than with daily fulls. In order to restore data from a particular day, at most two media sets are required, diminishing the time needed to recover and the potential for problems with an unreadable backup set.
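The number of media sets needed for a restore under each policy can be expressed as a small function. This is a sketch under the schedule described above (a weekly full followed by daily partial backups); the strategy labels and the function name are made up for illustration.

```python
def media_sets_for_restore(strategy, days_since_full):
    """Return how many backup sets a restore needs.

    days_since_full is 0 on the day of the full backup and n for the
    n-th daily partial backup taken after it.
    """
    if strategy == "full_daily":
        return 1  # the most recent full is always sufficient
    if strategy == "differential":
        # The last full, plus the latest differential if one has been taken.
        return 1 if days_since_full == 0 else 2
    if strategy == "incremental":
        # The last full, plus every incremental taken since then.
        return 1 + days_since_full
    raise ValueError(f"unknown strategy: {strategy}")


# Wednesday restore after a Sunday full: Sunday + Monday + Tuesday + Wednesday.
print(media_sets_for_restore("incremental", 3))   # 4
print(media_sets_for_restore("differential", 3))  # 2
```

With five weekday incrementals after the weekly full, the incremental policy reaches the six media sets mentioned earlier.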
For organizations with small data sets, running a daily full backup provides a high level of protection without much additional storage cost. Larger organizations, or those with more data, find that running a weekly full backup coupled with either daily incrementals or differentials is a better option. Using differentials provides a higher level of data protection with less restore time for most scenarios, at the cost of a small increase in storage capacity. For this reason, a strategy of weekly full backups with daily differential backups is a good option for many organizations.
Most of the advanced backup options, such as synthetic full, mirror, reverse incremental and CDP, require disk storage as the backup target. A synthetic full simply reconstructs the full backup image on disk using all required incrementals or the differential. This synthetic full may then be stored to tape for offsite storage, with the advantage being reduced restoration time. Mirroring copies disk storage to another set of disk storage, with reverse incrementals used to add incremental-style backup support. Finally, CDP allows a greater number of restoration points than traditional backup options.
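The idea behind a synthetic full can be illustrated by merging a full backup image with later incrementals. The sketch below models each backup as a dictionary mapping file paths to file versions, which is a simplification chosen for illustration; it ignores file deletions, and real products work on backup images rather than dictionaries.

```python
def synthesize_full(full_image, incrementals):
    """Build a synthetic full from a full image and later incrementals.

    full_image and each incremental map file path -> file version.
    incrementals must be ordered oldest first so newer versions win.
    """
    synthetic = dict(full_image)  # copy so the original full is left untouched
    for incremental in incrementals:
        synthetic.update(incremental)
    return synthetic


full = {"a.txt": "v1", "b.txt": "v1"}
incs = [{"a.txt": "v2"}, {"c.txt": "v1"}, {"a.txt": "v3"}]
print(synthesize_full(full, incs))
```

The result holds the latest version of every file in a single image, which is why a restore from a synthetic full needs only one backup set.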
When deciding on a backup strategy, the question is not which type of backup to use, but when to use each, and how these options should be combined with testing to meet the overall business cost, performance and availability goals.
About the author: Russ Fellows is a senior analyst with Evaluator Group. He is responsible for leading research and analysis of product and market trends for NAS, virtual tape libraries and storage security. He is also the primary analyst for coverage of selected open-systems arrays and virtualization products.