Protecting data against loss, corruption, disasters (human-caused or natural) and other problems is one of the top priorities for IT organizations. The underlying concepts are simple, although implementing an efficient and effective set of backup operations can be difficult.
The term backup has become synonymous with data protection over the past several decades, and backups can be performed using several methods. Backup software applications reduce the complexity of performing backup and recovery operations. However, backing up data is only one part of a disaster protection plan, and it may not provide the desired level of data and disaster recovery capability without careful design and testing.
Backup applications have long offered several types of backup operations. The most common backup types are a full backup, incremental backup and differential backup. Other backup types include synthetic full backups and mirroring.
In the debate over cloud vs. local backup, some backup types suit one location better than the other. For cloud backup, incremental backups are generally a better fit because they transfer and store less data than full backups. A common approach is to start with a full backup to the cloud and then shift to incremental backups. Mirror backup, by contrast, is typically an on-premises approach and often involves disks.
The most basic and complete type of backup operation is a full backup. As the name implies, this type of backup copies all data to another set of media, such as disk or tape. The primary advantage of performing a full backup during every operation is that a complete copy of all data is available on a single set of media. This minimizes the time needed to restore data, which helps an organization meet its recovery time objective (RTO), the target maximum time to restore data after an outage. However, a full backup takes longer to perform than other types (sometimes by a factor of 10 or more), and it requires more storage space.
Thus, full backups are typically run only periodically. Data centers that have a small amount of data (or critical applications) may choose to run a full backup daily, or even more often in some cases. Typically, backup operations employ a full backup in combination with either incremental or differential backups.
An incremental backup copies only the data that has changed since the last backup operation of any type. To identify that data, backup applications typically compare each file's modified time stamp with the time stamp of the last backup, which is why they record the date and time of every backup operation.
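The time stamp comparison described above can be sketched in a few lines of Python. This is a minimal illustration rather than how any particular backup application works: the function name `files_changed_since` is hypothetical, and it assumes the time of the last backup is available as a POSIX timestamp. It also ignores deleted files and changes that don't update the modification time, which production tools handle separately.

```python
import os
from pathlib import Path


def files_changed_since(root: str, since: float) -> list[Path]:
    """Return files under `root` modified after `since` (a POSIX timestamp).

    Hypothetical helper: `since` would be the recorded time of the last backup.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # Compare the file's modified time stamp with the last backup time.
            if path.stat().st_mtime > since:
                changed.append(path)
    return sorted(changed)
```

An incremental run would copy the returned files and then record its own start time as the next `since` value; recording the start time rather than the finish time avoids missing files modified while the backup was running.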
Because an incremental backup copies only data changed since the last backup of any type, an organization may run it as often as desired, and each incremental stores only the changes made since the previous backup. The benefit is that an incremental copies a much smaller amount of data than a full backup, so it completes faster and requires less media to store.
The first differential backup after a full backup copies the same data as an incremental would: all data changed since that full backup. However, each subsequent differential continues to copy all data changed since the last full backup, not just the changes since the previous differential. Thus, on subsequent runs a differential stores more data, and takes more time and space, than an incremental, although typically far less than a full backup.
As shown in "A comparison of different types of backup," above, each process works differently. An organization must run a full backup at least once. Afterwards, it is possible to run either another full, an incremental or a differential backup. The first partial backup performed, either a differential or incremental, will back up the same data. By the third backup operation, the data that is backed up with an incremental is limited to the changes since the last incremental. In comparison, the third backup with a differential will back up all changes since the first full backup, which was "Backup 1."
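The difference between the two partial backup types comes down to the reference point each one uses: an incremental copies changes since the last backup of any type, while a differential copies changes since the last full backup. The hypothetical `items_copied` function below models this with sets of changed item names; it is a simplified sketch that assumes the schedule starts with a full backup.

```python
def items_copied(schedule: list[str], changes: list[set[str]],
                 all_items: set[str]) -> list[list[str]]:
    """Return the sorted items copied by each run in `schedule`.

    schedule:  backup types in run order: "full", "incremental" or "differential".
    changes:   changes[i] is the set of items modified just before run i.
    all_items: every item in the data set (what a full backup copies).
    """
    since_last_backup: set[str] = set()  # reference point for incrementals
    since_last_full: set[str] = set()    # reference point for differentials
    copied = []
    for kind, changed in zip(schedule, changes):
        since_last_backup |= changed
        since_last_full |= changed
        if kind == "full":
            copied.append(sorted(all_items))
            since_last_full = set()
        elif kind == "incremental":
            copied.append(sorted(since_last_backup))
        elif kind == "differential":
            copied.append(sorted(since_last_full))
        else:
            raise ValueError(f"unknown backup type: {kind}")
        since_last_backup = set()  # every run resets the incremental reference point
    return copied


changes = [set(), {"a"}, {"b"}]  # "a" changes before Backup 2, "b" before Backup 3
items = {"a", "b", "c"}
print(items_copied(["full", "incremental", "incremental"], changes, items))
# [['a', 'b', 'c'], ['a'], ['b']]
print(items_copied(["full", "differential", "differential"], changes, items))
# [['a', 'b', 'c'], ['a'], ['a', 'b']]
```

The second runs copy the same data, and only the third run differs: the differential picks up "a" again because it is still measured from Backup 1.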
From these three primary types of backup, it is possible to develop an approach for comprehensive data protection. An organization often uses one of the following approaches:
- Full daily
- Full weekly + differential daily
- Full weekly + incremental daily
Many factors affect the choice of the optimal backup strategy, and each alternative involves tradeoffs among performance, level of data protection, total amount of data retained and cost. In "A backup strategy's impact on space" below, the media capacity requirements and the media required for recovery are shown for three typical backup strategies. These calculations presume 20 TB of total data, with 5% of the data (1 TB) changing daily and no increase in total storage during the period. They are based on 22 working days in a month and a one-month retention period for data.
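The space requirements follow from simple arithmetic on these parameters. The sketch below, using a hypothetical `storage_tb` function, recomputes them under a simplified model that is an assumption and not necessarily the model behind the chart: one backup per working day, the weekly full taking the place of the first working day's backup in each five-day week, every backup retained for the month, and each day's 5% of changes touching different data, so a differential grows by 1 TB for each day since the last full.

```python
def storage_tb(strategy: str, total_tb: float = 20.0, daily_change: float = 0.05,
               working_days: int = 22, days_per_week: int = 5) -> float:
    """Estimate the media capacity (TB) needed to retain one month of backups."""
    strategies = {"daily_full", "weekly_full_incremental", "weekly_full_differential"}
    if strategy not in strategies:
        raise ValueError(f"unknown strategy: {strategy}")
    changed_tb = total_tb * daily_change  # 1 TB of changed data per day by default
    used = 0.0
    for day in range(working_days):
        days_since_full = day % days_per_week
        if strategy == "daily_full" or days_since_full == 0:
            used += total_tb  # a full backup copies the entire data set
        elif strategy == "weekly_full_incremental":
            used += changed_tb  # only the changes since the previous backup
        else:
            # everything changed since the weekly full, capped at the data set size
            used += min(days_since_full * changed_tb, total_tb)
    return used


for name in ("daily_full", "weekly_full_incremental", "weekly_full_differential"):
    print(name, storage_tb(name))
# daily_full 440.0
# weekly_full_incremental 117.0
# weekly_full_differential 141.0
```

Under these assumptions, daily fulls need 22 × 20 TB = 440 TB, weekly fulls with incrementals need 5 × 20 TB + 17 × 1 TB = 117 TB, and the differential strategy adds 41 TB of partial backups to the same five fulls.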
As shown above, performing a full backup daily requires the most space and also takes the most time. However, more total copies of data are available, and fewer pieces of media are required to perform a restore. As a result, this backup policy has a higher tolerance for disasters and provides the shortest restore time, since any piece of data required is located on a single backup set.
As an alternative, performing a full backup weekly, coupled with daily incremental backups, delivers the shortest backup time on weekdays and uses the least storage space. However, fewer copies of data are available, and restore time is the longest, since an organization may need up to six sets of media to recover the necessary information. To restore data as of Wednesday, for example, the Sunday full backup plus the Monday, Tuesday and Wednesday incremental media sets are required. This can dramatically increase recovery time, and it requires every media set in the chain to work properly; a failure in one backup set can compromise the entire restoration.
Running a weekly full backup plus daily differential backups delivers results between the other two alternatives. More backup media sets are required to restore than with a daily full policy, although fewer than with a daily incremental policy. Likewise, restore time is shorter than with daily incremental backups but longer than with daily full backups. Restoring data from any given day requires at most two media sets, the weekly full and that day's differential, which reduces both recovery time and the chance that an unreadable backup set will derail the restore.
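The media-set counts above follow from which backups each restore depends on. The hypothetical `restore_chain` helper below is a sketch that assumes a schedule listed oldest first, beginning with a full backup, and using only one kind of partial backup between fulls; mixed schedules would need a more careful walk back through the chain.

```python
def restore_chain(schedule: list[str], target: int) -> list[int]:
    """Return indices of the backup sets needed to restore data as of run `target`."""
    kind = schedule[target]
    # Most recent full backup at or before the target run.
    last_full = max(i for i in range(target + 1) if schedule[i] == "full")
    if kind == "full":
        return [target]
    if kind == "differential":
        return [last_full, target]  # the full plus one differential
    if kind == "incremental":
        return list(range(last_full, target + 1))  # the full plus every incremental since
    raise ValueError(f"unknown backup type: {kind}")


days = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri"]
incr_week = ["full"] + ["incremental"] * 5
diff_week = ["full"] + ["differential"] * 5
print([days[i] for i in restore_chain(incr_week, 3)])  # ['Sun', 'Mon', 'Tue', 'Wed']
print([days[i] for i in restore_chain(diff_week, 3)])  # ['Sun', 'Wed']
print(len(restore_chain(incr_week, 5)))                # 6
```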
A mirror backup is comparable to a full backup. According to a blog from backup vendor Nakivo, "This backup type creates an exact copy of the source data set, but only the latest data version is stored in the backup repository with no track of different versions of the files." The backup is a mirror of the source data, hence the name, and each backed-up file is stored separately, just as it is in the source.
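A mirror backup's behavior, an exact copy that holds only the latest version of each file, can be sketched with the Python standard library. The `mirror` function below is hypothetical and is not how Nakivo or any other product implements mirroring; it ignores symbolic links, open files and changes made while it runs.

```python
import filecmp
import shutil
from pathlib import Path


def mirror(source: str, target: str) -> None:
    """Make `target` an exact copy of `source` (hypothetical helper).

    Only the latest version of each file is kept: changed files overwrite the old
    copy, and files deleted from the source are deleted from the mirror.
    """
    src, dst = Path(source), Path(target)
    dst.mkdir(parents=True, exist_ok=True)

    # Delete mirror entries that no longer exist in the source. Reverse-sorted
    # order visits children before their parent directories.
    for path in sorted(dst.rglob("*"), reverse=True):
        if not (src / path.relative_to(dst)).exists():
            if path.is_dir():
                shutil.rmtree(path)
            else:
                path.unlink()

    # Copy new or changed files, preserving the source's directory layout.
    for path in src.rglob("*"):
        out = dst / path.relative_to(src)
        if path.is_dir():
            out.mkdir(parents=True, exist_ok=True)
        elif not out.exists() or not filecmp.cmp(path, out, shallow=False):
            out.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, out)
```

Because files are stored individually in the source's layout, restoring one file is a plain copy. The flip side is visible in the deletion pass: a file removed from the source by mistake disappears from the mirror on the next run.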
Mirror backups offer fast recovery, and individual backed-up files are easy to access.
The main drawback is the amount of storage space required, which brings added cost and maintenance. In addition, any problem in the source data set, such as corruption or deletion, is replicated to the mirror. As a result, don't rely on mirror backups for all your data protection needs; keep other types of backup as well. Follow the 3-2-1 rule of backup: keep three copies of data on two different types of media, with one copy off site.
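The 3-2-1 rule is easy to express as a check over an inventory of copies. In this sketch the `DataCopy` type, its labels and the `satisfies_3_2_1` function are hypothetical, and it assumes the production data counts as one of the three copies, which is a common reading of the rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCopy:
    location: str  # hypothetical label, e.g. "primary array"
    media: str     # media type, e.g. "disk", "tape" or "cloud"
    offsite: bool


def satisfies_3_2_1(copies: list[DataCopy]) -> bool:
    """True if there are 3+ copies on 2+ media types with 1+ copy off site."""
    return (
        len(copies) >= 3
        and len({c.media for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )


mirror_only = [DataCopy("primary array", "disk", False),
               DataCopy("mirror", "disk", False)]
with_tape = mirror_only + [DataCopy("tape vault", "tape", True)]
print(satisfies_3_2_1(mirror_only))  # False
print(satisfies_3_2_1(with_tape))    # True
```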
One specific kind of mirror, disk mirroring, is also known as RAID 1. This process replicates data to two or more disks, so it requires at least two physical drives. Disk mirroring is a strong option for data that needs high availability because of its quick recovery time and immediate failover: if one drive fails, an organization can keep working from the mirror copy. However, disk mirroring protects against drive failure rather than every kind of data loss, because deletions and corruption are written to every mirror, and it requires a lot of storage capacity.
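The write-to-all, read-from-any-survivor logic of disk mirroring can be shown with a toy model. The `Raid1` class below is hypothetical and only illustrates the idea; it treats each disk as a dictionary of blocks, assumes equal-size drives, and does not model rebuilding a replacement drive.

```python
class Raid1:
    """Toy RAID 1 array: each disk is modeled as a dict of block number -> data."""

    def __init__(self, num_disks: int = 2) -> None:
        if num_disks < 2:
            raise ValueError("disk mirroring needs at least two drives")
        self.disks: list[dict[int, bytes]] = [{} for _ in range(num_disks)]
        self.failed: set[int] = set()

    def write(self, block: int, data: bytes) -> None:
        # Every write is replicated to all healthy disks.
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                disk[block] = data

    def read(self, block: int) -> bytes:
        # Serve the read from the first healthy disk (failover).
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                return disk[block]
        raise OSError("all mirrored drives have failed")

    def fail_disk(self, index: int) -> None:
        self.failed.add(index)

    def usable_fraction(self) -> float:
        # Usable capacity is one drive's worth, however many drives are mirrored.
        return 1 / len(self.disks)


array = Raid1()
array.write(0, b"payroll")
array.fail_disk(0)
print(array.read(0))            # b'payroll'
print(array.usable_fraction())  # 0.5
```

The `usable_fraction` result is the storage cost mentioned above: with two drives, half the raw capacity holds redundant copies.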
Do the right thing for your organization
For organizations with small data sets, running a daily full backup provides a high level of protection without much additional storage cost. Larger organizations, or those with more data or servers, often find that a weekly full backup coupled with either daily incremental or daily differential backups is a better option. Compared with incrementals, differentials provide a higher level of data protection and shorter restore times in most scenarios, at the cost of a small increase in storage capacity. For this reason, weekly full backups with daily differential backups are a good option for many organizations.
Most advanced types of backup, such as synthetic full, mirror and continuous data protection, require disk storage as the backup target. A synthetic full backup reconstructs a full backup image on disk by combining the last full backup with all subsequent incremental backups, or with the latest differential backup, rather than copying all the data from the source again. The synthetic full can then be copied to tape for offsite storage, with the advantage of reduced restore time. Finally, continuous data protection captures changes as they occur, providing a far greater number of restore points than traditional scheduled backups.
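A synthetic full is essentially a merge of backup images that already exist on disk. The sketch below models each image as a dictionary from file path to contents; the `synthesize_full` name and the convention that `None` marks a file deleted since the previous backup are assumptions for illustration, since real backup applications track deletions in their own catalogs.

```python
from typing import Optional

# A partial backup image: path -> new contents, or None if the file was deleted.
PartialImage = dict[str, Optional[bytes]]


def synthesize_full(full: dict[str, bytes], partials: list[PartialImage]) -> dict[str, bytes]:
    """Apply partial images to a full image, oldest first, without reading the source.

    Pass every incremental since the full, in order, or only the latest differential.
    """
    result = dict(full)  # leave the original full image unchanged
    for image in partials:
        for path, data in image.items():
            if data is None:
                result.pop(path, None)  # file deleted since the earlier backup
            else:
                result[path] = data     # new or changed file replaces the older copy
    return result


full = {"a.txt": b"v1", "b.txt": b"v1"}
incrementals = [{"a.txt": b"v2"}, {"b.txt": None, "c.txt": b"v1"}]
print(synthesize_full(full, incrementals))
# {'a.txt': b'v2', 'c.txt': b'v1'}
```

A differential taken at the same point would contain all three changes in one image and produce the same synthetic full.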
When choosing a backup strategy, the key questions are when to use each type of backup and how to combine them, along with testing, to meet the organization's overall cost, performance and availability goals.
The purpose of most backups is to create a copy of data so that a particular file or application can be restored after data loss, corruption or deletion, or after a disaster strikes. Thus, backup is not the goal; rather, it is one means of protecting data. Testing backups is just as important as performing them: because the point of backing up data is to enable restoration at a later time, only periodic testing can confirm that the goal of protecting data is being met.
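One way to make backup testing routine is to restore to a scratch location and compare the result with the source. The `verify_restore` function below is a hypothetical sketch that assumes the source has not changed since the backup was taken; for live data, compare against checksums recorded at backup time instead.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source: str, restored: str) -> list[str]:
    """Return relative paths that are missing from, or differ in, the test restore.

    An empty list means every source file was restored with identical contents.
    """
    src, dst = Path(source), Path(restored)
    problems = []
    for path in sorted(p for p in src.rglob("*") if p.is_file()):
        rel = path.relative_to(src)
        copy = dst / rel
        if not copy.is_file() or sha256_of(copy) != sha256_of(path):
            problems.append(rel.as_posix())
    return problems
```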