Like many storage/backup administrators, one of the first things that I do in the morning is change the backup tapes and verify that the previous night's backup ran correctly. Although backups tend to be a lot more reliable than they used to be, there is always a chance that you will come into the office one morning and discover that there was a problem with the backup. If that happens, the obvious question is what to do about it. Although it might be tempting to jump right into the backup logs, you don't have to do so blindly. Knowing what to look for in the backup logs can reduce the time that it takes you to fix the problem.
Unfortunately, I can't tell you the exact error messages that you should be looking for in your backup logs. Every data backup application uses its own set of backup error messages or error codes. That being the case, I am going to be using generic error messages in this article. Even so, my generic error messages should be somewhat similar to the ones that you would find in a real-life backup log.
Backup error: Files are missing
For a storage/backup administrator, few things are as frustrating or as embarrassing as having a user ask you to restore a file, and then finding out that the file was not backed up. If possible, you need to find out about any important files that are being excluded from your backup before someone asks you to restore them.
It is fairly common for backup applications to report that one or more files have been skipped. Most of the time this message isn't really an error at all. It is simply generated because some system files (such as pagefile.sys) can't be backed up. Even so, it is important to get into the habit of actually looking to see which files, if any, have been skipped.
So what do you do if you find out that some important files have been omitted from the backup? There are several things that you can check. Initially, I recommend checking your backup software just to make absolutely sure that no exclusions have accidentally been set for the missing files. Once you have done that, it's time to dig into the backup logs.
There are several different error messages that you can look for in the logs that may point to the source of the problem. Start out by looking for "access denied" messages. It could be that the backup application does not have the necessary permissions to access the file or folder in question.
You should also look for any messages about the file being left open. If you are using an older backup application that does not rely on a Volume Shadow Copy Service (VSS) writer, then open files will most likely be skipped. You may find that a user stayed logged in all night, thus preventing the file from being backed up. While you are at it, you should also look for indications that the file might have been deleted before it could be backed up.
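As a rough illustration, the three checks described above (permissions, open files and deleted files) can be automated with a short script. The sketch below is only an assumption about what such log lines might look like. The patterns, the category names and the sample messages are all made up for this example, because every backup product words its messages differently. You would need to substitute the strings that your own software actually writes.

```python
import re

# Hypothetical, generic patterns. Replace them with the wording
# that your own backup application uses in its logs.
PATTERNS = {
    "permissions": re.compile(r"access (is )?denied", re.IGNORECASE),
    "open_file": re.compile(
        r"file (is|was) (in use|open)|sharing violation", re.IGNORECASE
    ),
    "missing": re.compile(
        r"(file|path) not found|no longer exists", re.IGNORECASE
    ),
}


def classify_log_lines(lines):
    """Group log lines by the first generic cause whose pattern matches."""
    findings = {cause: [] for cause in PATTERNS}
    for line in lines:
        for cause, pattern in PATTERNS.items():
            if pattern.search(line):
                findings[cause].append(line.strip())
                break  # one category per line is enough for triage
    return findings


if __name__ == "__main__":
    # Invented sample lines, not taken from any real product.
    sample = [
        "Skipped C:\\pagefile.sys: file is in use",
        "Error: Access denied to D:\\HR\\salaries.xlsx",
        "Warning: D:\\tmp\\draft.doc no longer exists",
        "Backup completed",
    ]
    for cause, hits in classify_log_lines(sample).items():
        print(cause, hits)
```

A script like this does not replace reading the log, but it can point you to the handful of lines that deserve a closer look.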
On a different note, I have been in situations in which a user asked me to restore a file, and it wasn't on the backup tape, but the backup logs didn't make any mention of the file either. In these particular situations, the users in question were either misspelling the filename, or they directed me to the wrong path for the file. These types of problems are becoming more common as Microsoft Distributed File System (DFS) and other file system virtualization products see heavier use, because those products obscure the actual path where the file resides. In those types of situations, you can try doing a volume search, or you can ask the user to show you how they normally get to the file.
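If the user can only give you an approximate filename, a volume search that tolerates small spelling mistakes can help. The following is a minimal sketch that uses Python's standard os.walk and difflib modules. The function name and the 0.8 similarity cutoff are my own choices for illustration and are not part of any backup product.

```python
import difflib
import os


def find_similar_files(root, wanted_name, cutoff=0.8):
    """Walk root and return paths whose filename resembles wanted_name.

    The comparison ignores case. cutoff is a similarity ratio between
    0 and 1; the default of 0.8 is an arbitrary starting point.
    """
    matches = []
    wanted = wanted_name.lower()
    for dirpath, _dirnames, filenames in os.walk(root):
        # Map lower-cased names back to the real names on disk.
        lowered = {name.lower(): name for name in filenames}
        close = difflib.get_close_matches(
            wanted, list(lowered), n=5, cutoff=cutoff
        )
        for name in close:
            matches.append(os.path.join(dirpath, lowered[name]))
    return matches


if __name__ == "__main__":
    # Hypothetical example: search the current directory tree for a
    # file whose name the user may have misspelled.
    for path in find_similar_files(".", "Bugdet2023.xlsx"):
        print(path)
```

Walking a large volume this way can be slow, so it makes sense to start from the share or folder that the user normally works in.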
An entire volume fails to be backed up
Another type of error that I have encountered is a situation in which an entire volume was not backed up. These types of problems are usually fairly easy to troubleshoot. Every backup application is different, but usually you will see a message on the backup console screen along the lines of "Failed to back up server q."
The first thing that I recommend looking for in the backup logs is network errors. The backup software can't back up a network volume if it can't communicate with it. Network errors can be caused by connectivity problems (such as those caused by a bad NIC), or they may be caused by a firewall that blocks the port that's being used by the backup agent.
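One quick way to test the firewall theory is to try opening a TCP connection from the backup server to the port that the backup agent listens on. The sketch below assumes that the agent uses a TCP port. The host name and port number in the example are placeholders, so check your backup software's documentation for the real values.

```python
import socket


def agent_port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name lookup failures.
        return False


if __name__ == "__main__":
    # Placeholder values: substitute your own server and agent port.
    host, port = "fileserver01.example.com", 10000
    if agent_port_reachable(host, port):
        print("Agent port is reachable")
    else:
        print("Agent port is NOT reachable")
```

A failed connection does not tell you whether a firewall, a stopped agent service or a cabling problem is to blame, but it does confirm that the problem lies between the backup server and the agent rather than in the backup job itself.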
You might also check for I/O errors. I've had situations in which a network volume was skipped because my backup server had trouble communicating with the tape drive. This situation initially gave the illusion of being a network problem, even though the problem was actually caused by a bad SCSI controller.
File verification errors in backup logs
In older backup applications that do not use VSS writers, it is common to receive file verification errors. Sometimes these verification errors are no big deal. For instance, someone may modify a file between the time that it is backed up and the time that it is verified.
If you are concerned about some verification errors that you have received, then check your backup logs for Cyclic Redundancy Check (CRC) errors. CRC errors are usually a good indication that the data was not written to the tape correctly. Over the years, I have seen Cyclic Redundancy Check errors caused by bad tapes, dirty tape drive heads, and bad SCSI cables.
Every time that I have ever run into a Cyclic Redundancy Check error on a tape drive, the log clearly listed what was affected by the error. However, I don't think that I have ever seen a real-world situation where only one file was corrupted by a CRC error. These types of errors usually corrupt large numbers of files (on the tape). All of the tape drive CRC errors that I have seen were caused by hardware issues.
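If you have never looked at how a Cyclic Redundancy Check works, the idea is simple: a checksum is calculated when the data is written, and the same calculation is repeated when the data is read back. If the two values differ, the data changed somewhere along the way. The sketch below illustrates the principle with the CRC-32 routine in Python's standard zlib module. It is not how any particular tape drive or backup product implements its checks, and the function names are my own.

```python
import zlib


def crc32_stream(chunks):
    """Compute a CRC-32 over an iterable of byte chunks."""
    crc = 0
    for chunk in chunks:
        # Passing the running value lets the CRC span all chunks.
        crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def matches_recorded_crc(chunks, recorded_crc):
    """Return True if the data still matches the recorded checksum."""
    return crc32_stream(chunks) == recorded_crc


if __name__ == "__main__":
    # Invented data standing in for blocks written to tape.
    original = [b"payroll ", b"records ", b"2023"]
    recorded = crc32_stream(original)

    # Simulate one byte damaged by a bad cable or a dirty head.
    damaged = [b"payroll ", b"recorbs ", b"2023"]

    print(matches_recorded_crc(original, recorded))
    print(matches_recorded_crc(damaged, recorded))
```

The check only tells you that the data is damaged, not why, which is why the hardware (tapes, drive heads and cables) is the next place to look.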
An application failed to be backed up
Application-level backups are tricky because every application is different. If you are having trouble backing up one particular application, and you have verified that connectivity to the server that's running the application is not an issue, then try looking for any security-related errors. For example, SQL Server can be set to use either Windows authentication alone or mixed mode, which also allows SQL Server authentication. If the backup software is configured to use an authentication method that the server does not allow, then the connection to the database may fail, resulting in an authentication mode error.
As you can see, whenever a backup fails, there is usually some clue to the cause of the failure in your backup logs. The trick to diagnosing the problem is knowing what types of log entries to look for in relation to the symptoms of the failure.
About the author: Brien M. Posey, MCSE, has previously received Microsoft's MVP award for Exchange Server, Windows Server and Internet Information Server (IIS). Brien has served as CIO for a nationwide chain of hospitals and was once responsible for the Department of Information Management at Fort Knox. You can visit Brien's personal website at www.brienposey.com.