What you will learn in this tip: How to troubleshoot the most common NDMP backup errors, and what you can do to prevent them.
As its name states, the Network Data Management Protocol (NDMP) is a networking protocol, and as with any other protocol, errors can occur. Unfortunately, troubleshooting NDMP errors can be tricky. When NDMP was originally drafted in 1996, the draft described a number of error messages that could be returned under various conditions.
The list below briefly describes all of the errors that the original version of NDMP could return, and what they mean:
NDMP backup errors
NDMP_DEVICE_BUSY_ERR -- The specified tape drive is already in use.
NDMP_DEVICE_OPENED_ERR -- NDMP is attempting to open more connections than are allowed.
NDMP_NOT_AUTHORIZED_ERR -- This error is returned if an NDMP request is issued before the connection has been authenticated.
NDMP_PERMISSION_ERR -- The connection has been authenticated, but the credentials used lack the necessary permissions.
NDMP_DEV_NOT_OPEN_ERR -- An attempt was made to access a device without first opening a connection to it.
NDMP_IO_ERR -- The tape drive has returned an I/O error.
NDMP_TIMEOUT_ERR -- The current operation has timed out.
NDMP_ILLEGAL_ARGS_ERR -- The request contains an illegal argument.
NDMP_NO_TAPE_LOADED_ERR -- There is no tape in the drive.
NDMP_WRITE_PROTECT_ERR -- The tape in the drive is write protected.
NDMP_EOF_ERR -- An unexpected end of file was encountered.
NDMP_EOM_ERR -- The tape has run out of space (the End of Media Mark was encountered).
NDMP_FILE_NOT_FOUND_ERR -- The requested file was not found.
NDMP_BAD_FILE_ERR -- An error was caused by a bad file descriptor.
NDMP_NO_DEVICE_ERR -- A request was made to a tape drive that does not exist.
NDMP_NO_BUS_ERR -- The specified SCSI bus cannot be found.
NDMP_NOT_SUPPORTED_ERR -- Either the NDMP protocol is not supported, or only a subset of the protocol is supported.
NDMP_XDR_DECODE_ERR -- A message cannot be decoded.
NDMP_ILLEGAL_STATE_ERR -- A request cannot be processed in its current state.
NDMP_UNDEFINED_ERR -- A nonspecific error has occurred.
NDMP_XDR_ENCODE_ERR -- There was an error encoding a reply message.
NDMP_NO_MEM_ERR -- A memory allocation error.
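If you want to turn an error name from a log or console message back into a plain-language description, a small lookup table is often enough. The Python sketch below is only an illustration: the table holds a handful of the entries from the list above, the function name describe_ndmp_error is hypothetical, and the optional numeric suffix it accepts assumes the NAME(code) form that some backup software prints, such as NDMP_NO_DEVICE_ERR(16).

```python
import re

# A few descriptions copied from the list above; extend the table as needed.
NDMP_ERRORS = {
    "NDMP_DEVICE_BUSY_ERR": "The specified tape drive is already in use.",
    "NDMP_IO_ERR": "The tape drive has returned an I/O error.",
    "NDMP_TIMEOUT_ERR": "The current operation has timed out.",
    "NDMP_NO_TAPE_LOADED_ERR": "There is no tape in the drive.",
    "NDMP_WRITE_PROTECT_ERR": "The tape in the drive is write protected.",
    "NDMP_EOM_ERR": "The tape has run out of space.",
    "NDMP_NO_DEVICE_ERR": "A request was made to a tape drive that does not exist.",
    "NDMP_UNDEFINED_ERR": "A nonspecific error has occurred.",
}

# Matches a bare error name, or a name followed by a numeric code in parentheses.
_ERROR_RE = re.compile(r"^(NDMP_[A-Z_]+_ERR)(?:\((\d+)\))?$")


def describe_ndmp_error(text):
    """Return (name, code, description); code is None when no number is given."""
    match = _ERROR_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not an NDMP error name: {text!r}")
    name, code = match.group(1), match.group(2)
    description = NDMP_ERRORS.get(
        name, "Not in this table; check your backup vendor's documentation."
    )
    return name, (int(code) if code is not None else None), description
```

Calling describe_ndmp_error("NDMP_NO_DEVICE_ERR(16)") returns the name, the integer 16 and the matching description. Names that are missing from the table still parse, but come back with a pointer to the vendor documentation instead of a description.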
As you can see, most of the error messages listed above are fairly straightforward. However, a lot has changed since 1996. Over the last 14 years, the NDMP protocol has evolved and some backup software vendors have even made their own proprietary changes to the protocol. As if that doesn't make troubleshooting tricky enough, some backup vendors mask the actual NDMP error codes and replace them with their own error messages.
One of the best examples of this is Symantec Corp.'s NetBackup (and Backup Exec), which conveys NDMP errors through numeric codes. The most common of these error codes are 16 and 99.
NDMP error 16
Error 16 is equivalent to the NDMP_NO_DEVICE_ERR error message, and is sometimes expressed as NDMP_NO_DEVICE_ERR(16). Essentially, this error means that the backup software is having trouble communicating with the tape drive (or disk).
While Error 16 can be caused by a hardware problem (a faulty tape drive, a power failure, a bad SCSI controller, etc.), the hardware isn't always to blame. When this type of error occurs, your best bet is to reboot the machine on which the backup software is running. Of course, this isn't always practical.
If rebooting isn't an option, then the next step is to reset the software components that are involved in the problem. Remember that NDMP is a cross-platform protocol, so the steps that you use will depend on the platform and on the backup software that you are using. For example, if you were running Backup Exec on a NetWare server, then you would stop and then unload the Backup Exec software. After that, you would verify that the SCSI drivers are not controlling the server's disk storage. From there, you would unload and reload the SCSI drivers, and then reload the Backup Exec software. On its website, Novell documents the full procedure for fixing NDMP error 16.
Keep in mind that the procedure I just outlined is more of a workaround than a fix. Reloading the SCSI drivers and the backup software will usually allow you to access the tape drive (assuming that there are no hardware issues), but it doesn't actually address the true cause of the problem.
If you experience this issue on a regular basis, then you should check your documentation to investigate adjusting the timeout thresholds. SCSI timeouts and excessive network latency have both been known to cause Error 16 under certain circumstances.
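Before you change any timeout thresholds, it can help to get a rough number for the network latency between the backup server and the NDMP host. The Python sketch below times how long it takes to open a TCP connection. It assumes the NDMP service listens on TCP port 10000, which is the registered default but may differ in your environment, and the function names are hypothetical. It measures only connection setup time, so treat the result as a rough indicator rather than a full diagnosis.

```python
import socket
import time

NDMP_PORT = 10000  # assumption: the NDMP service uses the registered default port


def tcp_connect_latency(host, port=NDMP_PORT, timeout=5.0):
    """Return the seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return None


def summarize_latency(host, port=NDMP_PORT, attempts=5, timeout=5.0):
    """Try several connections and report the failures and the slowest success."""
    samples = [tcp_connect_latency(host, port, timeout) for _ in range(attempts)]
    good = [s for s in samples if s is not None]
    return {
        "attempts": attempts,
        "failures": attempts - len(good),
        "worst_seconds": max(good) if good else None,
    }
```

If the slowest connection time is close to the timeout configured in your backup software, or if some attempts fail outright, latency is a plausible suspect and is worth raising with your network team before you adjust the thresholds.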
NDMP error 99
Error 99 can be a bit more difficult to troubleshoot. It is a generic error that simply reflects the fact that the backup has failed. In most cases, Error 99 will occur if the backup software has trouble accessing the files that need to be backed up (because of a path error, not a permissions error). This type of error can also occur if an incremental backup is being performed, but no files have changed since the previous backup. If you happen to be running the backup software on a Unix system, then you might sometimes receive this error as a result of an invalid localhost entry in the /etc/hosts file.
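The localhost check is easy to automate. The Python sketch below reads the text of a hosts file and reports anything suspicious about the localhost entry. What counts as "invalid" is an assumption here: the sketch flags a missing entry, a malformed address, or an address that is not a loopback address such as 127.0.0.1 or ::1. The function names are hypothetical.

```python
import ipaddress


def localhost_entries(hosts_text):
    """Return the addresses that the hosts file maps to the name 'localhost'."""
    addresses = []
    for raw_line in hosts_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if "localhost" in fields[1:]:
            addresses.append(fields[0])
    return addresses


def localhost_problems(hosts_text):
    """Return a list of human-readable problems; an empty list means none found."""
    addresses = localhost_entries(hosts_text)
    if not addresses:
        return ["no entry maps an address to 'localhost'"]
    problems = []
    for address in addresses:
        try:
            if not ipaddress.ip_address(address).is_loopback:
                problems.append(
                    f"'localhost' maps to non-loopback address {address}"
                )
        except ValueError:
            problems.append(f"'localhost' maps to malformed address {address!r}")
    return problems
```

On a Unix system, you could run the check with localhost_problems(open("/etc/hosts").read()) and review whatever it reports.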
If you experience Error 99, you should check your backup software's logs for clues as to the source of the problem. Several potential causes are described here, but because Error 99 is a generic error message, it could mean just about anything, and the logs are the best place to start narrowing it down.
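When the logs are large, counting how often each NDMP error name appears is a quick way to see where to look first. The Python sketch below does this for any iterable of log lines. Log formats vary by vendor, so the sketch assumes only that the standard error names (for example, NDMP_IO_ERR) appear somewhere in the text. The function name and the sample lines in the usage note are hypothetical.

```python
import re
from collections import Counter

# Finds names such as NDMP_IO_ERR, including inside forms like NDMP_NO_DEVICE_ERR(16).
NDMP_ERROR_RE = re.compile(r"\bNDMP_[A-Z_]+_ERR\b")


def count_ndmp_errors(log_lines):
    """Count how many times each NDMP error name appears in the given lines."""
    counts = Counter()
    for line in log_lines:
        counts.update(NDMP_ERROR_RE.findall(line))
    return counts
```

For example, passing the made-up lines "backup failed: NDMP_NO_DEVICE_ERR(16)" and "tape: NDMP_IO_ERR" yields a count of one for each name. With a real log file, you could pass the open file object directly, since iterating over it produces one line at a time.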
In this article, I have explained that, like any other protocol, NDMP occasionally reports errors. The problem with NDMP errors is that they are often oversimplified, or the backup software may mask the actual error message. As such, you may have to check your server's logs or consult your backup vendor for additional information when errors occur.
About the author: Brien M. Posey, MCSE, has previously received Microsoft's MVP award for Exchange Server, Windows Server and Internet Information Server (IIS). Brien has served as CIO for a nationwide chain of hospitals and was once responsible for the Department of Information Management at Fort Knox. You can visit Brien's personal website at www.brienposey.com.
This was first published in July 2010