Modern data backup and recovery system considerations

In this book excerpt, learn about tape vs. disk storage, the difference between full and incremental backup, and disaster recovery vs. backup.

The following is an excerpt from Foundations of Green IT: Consolidation, Virtualization, Efficiency, and ROI in the Data Center by Marty Poniatowski (Prentice Hall, 2009).

This excerpt discusses important technology considerations in modern data backup and recovery systems. Learn about tape and disk storage and how to choose the right backup media for your shop. You'll also learn about the difference between full and incremental backups, and between disaster recovery and backup.

This excerpt is from "Chapter 7: The existing backup and recovery environment."

Backup media

For the last 50 years, tape media has been used to hold data for the IT community. Media has gone through numerous technical improvements with massive gains in capacity and speed, but the underlying concept of a magnetically modified roll of chemically treated plastic tape remains in place. Over time, tape will degrade and at some point will begin to show I/O errors, rendering the information on it inaccessible. This is a concern when compliance is a consideration because the integrity of the media must be in place for at least as long as compliance regulations require.

Tape has the additional limitation of being sequential in operation, which means data is written on the tape in a serial manner. Tape can never be considered online, even if a tape is mounted in the tape drive; the process of streaming to the proper location on the tape can require a substantial amount of time because this is a mechanical process.

A preferred method of backup is disk storage. Disk storage is used in the proposed backup solution in this book. Disk has many advantages, such as being online and orders of magnitude faster than tape.

The current disk backup solutions are intelligent storage arrays, which are fast, reliable, and dense. The flexibility of these new arrays makes them more reliable than tape, and they do not suffer the major drawback of sequential reads and writes. Not only do intelligent storage arrays provide faster random access for both reads and writes, but they also proactively scrub, analyze, and monitor the disks and the data they contain.

You will see in the advanced backup solutions that disk technology is used as the recommended primary backup solution.

Full versus incremental backup

A full backup makes a copy of all the data on the storage devices pointed to by the backup process. If you run a full backup of 10 TB over three consecutive days, you back up a total of 30 TB of data on backup media. With an incremental backup, you back up only changed data after your first full backup. This results in substantially less than 30 TB being backed up.
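The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name and the 5 percent daily change rate are assumptions chosen for the example, not figures from the book.

```python
def total_backed_up_tb(full_size_tb, days, daily_change_rate=None):
    """Return the total TB written to backup media over `days` nightly backups.

    daily_change_rate=None models a full backup every night. Otherwise the
    first night is a full backup and each later night is an incremental that
    copies only the changed fraction of the data.
    """
    if days <= 0:
        return 0.0
    if daily_change_rate is None:
        return full_size_tb * days
    incremental_tb = full_size_tb * daily_change_rate  # assumed constant per night
    return full_size_tb + incremental_tb * (days - 1)


# Three nights of full backups of 10 TB, as in the text.
full_only = total_backed_up_tb(10, 3)

# One full backup followed by two incrementals, assuming 5% of data changes daily.
with_incrementals = total_backed_up_tb(10, 3, daily_change_rate=0.05)

print(f"Full every night:      {full_only:.1f} TB")
print(f"Full plus incremental: {with_incrementals:.1f} TB")
```

With a real workload the change rate varies from night to night, so the incremental figure is best treated as a rough planning estimate.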

Incremental backups come in two forms: the differential type and the cumulative type. A differential incremental backs up any files changed since the most recent backup, whether that was the full or the previous differential. A cumulative incremental backs up all files changed since the last full backup, so it also includes everything captured by earlier cumulative backups.

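Normally, the cumulative backup will use up more resources during backup; however, restoring required files will be much easier and quicker, because a restore needs only the last full backup and the most recent cumulative backup rather than every differential taken since the full.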
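The trade-off between the two incremental types can be illustrated with a small simulation. The sketch below follows the definitions used in this excerpt (some products use "differential" differently); the function names and file names are hypothetical.

```python
def plan_backups(changes_by_day, mode):
    """Return the set of files each nightly incremental would copy.

    changes_by_day: list of sets, the files changed on each day after the full.
    mode: "differential" (changes since the most recent backup) or
          "cumulative" (all changes since the last full backup).
    """
    if mode not in ("differential", "cumulative"):
        raise ValueError(f"unknown mode: {mode}")
    backups = []
    changed_since_full = set()
    for changed_today in changes_by_day:
        changed_since_full |= changed_today
        if mode == "differential":
            backups.append(set(changed_today))
        else:
            backups.append(set(changed_since_full))
    return backups


def restore_chain(backups, mode):
    """Return the indexes of the incrementals needed, on top of the full
    backup, to restore to the state of the latest backup."""
    if mode not in ("differential", "cumulative"):
        raise ValueError(f"unknown mode: {mode}")
    if not backups:
        return []
    if mode == "differential":
        return list(range(len(backups)))  # every incremental since the full
    return [len(backups) - 1]             # only the newest cumulative


changes = [{"a.txt"}, {"b.txt"}, {"a.txt", "c.txt"}]
for mode in ("differential", "cumulative"):
    nightly = plan_backups(changes, mode)
    copied = sum(len(files) for files in nightly)
    print(mode, "files copied:", copied, "restore needs:", restore_chain(nightly, mode))
```

In this toy run the cumulative scheme copies more files overall but restores from a single incremental, which matches the resource versus restore-time trade-off described above.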

Clone versus replication of data

As it relates to backup and recovery, a clone is a duplicate copy of a backup image. That backup image can be either on a tape or a disk. The original backup image can then be copied to the same type of media -- for instance, tape to tape, or disk to disk -- as well as different types of media, such as tape to disk or disk to tape. The creation and management of a clone is handled by the backup software.

In the case of replication, you usually deal with the copying of a predefined LUN or area of disk storage from one LUN to another LUN. This process is usually carried out on an intelligent storage array and managed by a web-based console that has access to the storage array. Replication in most cases is best used for disaster recovery; however, many organizations feel that they can use replication to replace backup. Although arguments can be made for this position, true recovery capability requires the ability to recover files from yesterday, last week, last month, and last year. In order for replication to provide this capability, the organization may have to make a large investment in disk storage.

Backup and recovery versus disaster recovery

As already discussed, there is a critical difference between what the organization uses for backup and recovery and what the same organization uses for disaster recovery.

Backup and recovery allows for restoring individual files and folders. Disaster recovery requires a defined and acceptable Recovery Time Objective (RTO) and Recovery Point Objective (RPO). It will be limited by the total amount of disk made available and the number of snapshots required by the SLA.
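As a rough illustration of how RPO and RTO targets might be checked, the sketch below compares the age of the newest recovery point against an RPO and an estimated restore duration against an RTO. The function names, dates, and targets are hypothetical values chosen for the example.

```python
from datetime import datetime, timedelta


def meets_rpo(recovery_points, failure_time, rpo):
    """True if the newest recovery point taken at or before the failure
    is no older than the RPO (the maximum tolerable data loss)."""
    usable = [point for point in recovery_points if point <= failure_time]
    if not usable:
        return False  # nothing to recover from
    return failure_time - max(usable) <= rpo


def meets_rto(estimated_restore_time, rto):
    """True if the estimated time to restore service fits within the RTO."""
    return estimated_restore_time <= rto


# Nightly recovery points at midnight; failure at 18:00 on the second day.
points = [datetime(2009, 1, 1), datetime(2009, 1, 2)]
failure = datetime(2009, 1, 2, 18, 0)

print(meets_rpo(points, failure, rpo=timedelta(hours=24)))  # 18 hours of loss
print(meets_rpo(points, failure, rpo=timedelta(hours=4)))
print(meets_rto(timedelta(hours=6), rto=timedelta(hours=8)))
```

A tighter RPO implies more frequent recovery points, which is one reason the amount of disk and the number of snapshots become limiting factors.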

The new backup and recovery environment proposed in an upcoming chapter has disk libraries, used as the backup and recovery foundation, in two different locations. The units are replicated, which means that in the event of a disaster in one location, the disk library in the other location can be used as the backup or restore device. This is by no means a complete disaster recovery plan, so be sure you analyze all aspects of disaster recovery in your environment.

Distributed backup versus centralized backup

Early distributed forms of networking and storage required tape drives to be attached directly to the servers being backed up. Early network speeds also limited the amount of data that could be transmitted in a given time window to perform a backup. When slow network speeds were the norm, it was a necessity in some cases to have local tape drives to back up clients because the network would not support a centralized backup environment. This is true in the case study, in which virtually all the clients have a local tape drive for backup.
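The relationship between network speed and the backup window can be estimated with simple arithmetic. The sketch below is a back-of-the-envelope calculation; the 70 percent link efficiency, the 500 GB client, and the 8-hour window are assumptions for illustration only.

```python
def transfer_hours(data_gb, link_mbps, efficiency=0.7):
    """Estimate the hours needed to move data_gb gigabytes over a link rated
    at link_mbps megabits per second.

    efficiency is an assumed fraction of the rated speed that a backup
    stream actually achieves.
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be positive and efficiency in (0, 1]")
    megabits = data_gb * 1000 * 8            # decimal gigabytes to megabits
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


def fits_window(data_gb, link_mbps, window_hours, efficiency=0.7):
    """True if the transfer is estimated to finish within the backup window."""
    return transfer_hours(data_gb, link_mbps, efficiency) <= window_hours


# A hypothetical 500 GB client and an 8-hour nightly window.
for mbps in (10, 100, 1000):
    hours = transfer_hours(500, mbps)
    print(f"{mbps:>5} Mb/s: {hours:7.1f} h, fits 8 h window: {fits_window(500, mbps, 8)}")
```

Under these assumptions, only the fastest link finishes inside the window, which is the kind of constraint that pushed early environments toward local tape drives.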

In the distributed environment, the tape drives could only be accessed by the server directly attached to them. The tape resource could not be shared across multiple servers. With the advent of SAN fibre networks, storage resources could now be shared and would no longer be limited to a server with a direct attachment.

Although distributing the backup process may be an easy solution when first setting up backups, eventually the monitoring and management of the network as a whole will become difficult. There is no central reference point from which to monitor all the separate backup servers. Each backup server maintains its own catalog of data and tapes, and this information is not shared with any of the other backup servers in the organization's environment.

With the advent of 100 Mb/s and faster networks, the growth in bandwidth has allowed clients to be backed up from a few centrally controlled backup servers, media servers, and storage nodes.

About this author: Marty Poniatowski, Chief Technology Officer at Computer Design and Integration, manages all aspects of strategic design for customers' enterprise infrastructure.
