Tape backup failures
What are the major causes of tape backup failures?

Data backup and recovery operations fail for a wide variety of reasons, which can be traced to people, hardware, software or media. There are essentially six major causes of backup and restore failures:

  1. Human errors
  2. Software errors (device busy, in use or other error conditions that may be recoverable)
  3. Resource contention (resources needed for backup are already in use, such as open application or file data, I/O channels, or a tape library, drive or media)
  4. SAN or other connectivity problems
  5. Tape drive or media error
  6. Hardware errors (disk drives, processors, memory, I/O controllers, etc.)

The most common cause of backup and recovery problems is human error. Examples include running multiple jobs simultaneously, failing to change tapes, forgetting to remove a cleaning tape or any of a number of other mistakes.
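One way automation can prevent the "multiple jobs at once" mistake is to have every backup job take an exclusive lock before it touches the drive, so a second job fails fast instead of contending for the same resources. The sketch below is a minimal illustration under stated assumptions: the class `BackupJobLock`, the function `run_backup` and the lock-file path are hypothetical names, not part of any particular backup product. It relies only on the standard `os.open` flags `O_CREAT | O_EXCL`, which make file creation fail if the file already exists.

```python
import os
import tempfile


class BackupLockError(RuntimeError):
    """Raised when another backup job already holds the lock."""


class BackupJobLock:
    """Hypothetical single-job guard built on an exclusive lock file.

    os.open with O_CREAT | O_EXCL fails if the file already exists,
    so only one job at a time can create (and therefore hold) it.
    """

    def __init__(self, path):
        self.path = path
        self.fd = None

    def __enter__(self):
        try:
            self.fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            raise BackupLockError(f"backup already running (lock: {self.path})") from None
        # Record our PID so an operator can see which job holds the lock.
        os.write(self.fd, str(os.getpid()).encode())
        return self

    def __exit__(self, exc_type, exc, tb):
        # Release the lock even if the backup itself raised an exception.
        os.close(self.fd)
        os.remove(self.path)
        return False


def run_backup(lock_path):
    """Hypothetical job wrapper: refuses to start if another job is running."""
    with BackupJobLock(lock_path):
        # Placeholder for the real work (mount media, run backup, verify).
        return "backup completed"


if __name__ == "__main__":
    lock_file = os.path.join(tempfile.gettempdir(), "nightly_backup.lock")
    print(run_backup(lock_file))
```

A lock file left behind by a crashed job will block later runs until someone removes it, so a real deployment would need a stale-lock policy. Many backup products also provide their own job scheduling and serialization, which is usually preferable to a homegrown guard.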

Next on the list are software errors, typically in the scripts used to run backup operations, mount media or perform other operations. Resource contention can occur because of unforeseen or unpredictable events, or because of a lack of planning on the part of backup operators. Connectivity problems may occur because of SAN zone changes or other issues affecting connectivity. Finally, hardware-related items, including tape media, tape drives and other hardware involved, are the least likely to cause problems.

Some vendors portray tape media as one of the most common causes of backup problems. In reality, tape drive and media issues are typically not the major cause of failures. Tape media failures do occur, however, and can have a significant impact; in some cases, tape media errors are not recoverable.

Applications and tape drives may be unable to recover from a media error, even when the damaged area is only a small percentage of the tape. As a result, even a small dropout may cut off access to a large amount of data on that tape. Moving from tape-based to disk-based backups can improve reliability and eliminate this type of data loss. However, the other, more common problems remain, regardless of whether backups go to disk or tape.
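The gap between "a small percentage of the tape is damaged" and "a large amount of data is inaccessible" can be made concrete with a deliberately simplified model. The sketch below assumes a strictly sequential restore in which the first unrecoverable block aborts the read, so nothing after it is reached. This is a worst-case assumption, not a description of how every drive or backup application behaves, and the tape size and error location in the example are hypothetical.

```python
def _check_total(total_blocks):
    if total_blocks <= 0:
        raise ValueError("total_blocks must be positive")


def damaged_fraction(total_blocks, bad_blocks):
    """Fraction of the tape's blocks that are physically unreadable."""
    _check_total(total_blocks)
    bad = {b for b in bad_blocks if 0 <= b < total_blocks}
    return len(bad) / total_blocks


def readable_fraction(total_blocks, bad_blocks):
    """Fraction of blocks a strictly sequential restore can reach.

    Simplifying assumption: blocks are read in order and the first
    unrecoverable block aborts the restore, so nothing after it is read.
    """
    _check_total(total_blocks)
    first_bad = min((b for b in bad_blocks if 0 <= b < total_blocks),
                    default=total_blocks)
    return first_bad / total_blocks


if __name__ == "__main__":
    # Hypothetical tape: one million blocks, with ten consecutive bad
    # blocks located 10% of the way into the tape.
    total = 1_000_000
    bad = range(100_000, 100_010)
    print(f"physically damaged:   {damaged_fraction(total, bad):.4%}")
    print(f"reachable by restore: {readable_fraction(total, bad):.0%}")
```

In this hypothetical case, only 0.001% of the blocks are damaged, yet only the first 10% of the tape can be restored. Software that can skip past a bad region to the next readable file or block would do far better than this worst case.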

Addressing the causes of backup failures is prudent, but the issues should be addressed in order of frequency. Because human error is the most likely cause of problems, automating backup processes can eliminate a large number of errors. The next issue to address is ensuring that well-proven software and scripts are used to run backup operations. Configuration management can help ensure that resource contention and connectivity issues don't lead to data backup problems. Only after the primary causes of backup failures have been solved does it make sense to address tape media problems.
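Working through the causes "in order of frequency" presupposes knowing the frequencies in your own environment. A minimal sketch, assuming each failed job has already been tagged with one of the six causes listed earlier (the `Cause` enum and `remediation_order` function are hypothetical names, and the tally in the example is made up for illustration, not data from this article), is to count the tags and sort them:

```python
from collections import Counter
from enum import Enum


class Cause(Enum):
    """The six major causes of backup and restore failures."""
    HUMAN = "Human errors"
    SOFTWARE = "Software errors"
    CONTENTION = "Resource contention"
    CONNECTIVITY = "SAN or other connectivity problems"
    TAPE = "Tape drive or media error"
    HARDWARE = "Hardware errors"


def remediation_order(failures):
    """Return (cause, count) pairs sorted from most to least frequent.

    `failures` is an iterable of Cause values, e.g. read from a
    hypothetical failure log in which each failed job was tagged
    with its root cause.
    """
    return Counter(failures).most_common()


if __name__ == "__main__":
    # Made-up tally for illustration only.
    log = [Cause.HUMAN] * 5 + [Cause.SOFTWARE] * 3 + [Cause.TAPE]
    for cause, count in remediation_order(log):
        print(f"{count:3d}  {cause.value}")
```

The top of the resulting list is where remediation effort pays off first. Causes that never appear in the log are simply absent from the output.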

This was first published in June 2008