Troubleshooting automated tape libraries

Between shared libraries, LAN-free backups, HBAs, drivers and firmware, many things can cause an automated tape library to behave unpredictably. Learn how to troubleshoot automated tape library problems.

Let's assume that the most basic trouble areas have been verified and eliminated as probable causes -- there is power in the server room, the automated tape library (ATL) is connected to a power outlet, it is switched on and is hooked up to the backup server. Once this has been established, we can start looking at other possibilities.

What has changed? This should be the first question you ask, because the answer helps determine whether you are facing a hardware failure or some type of configuration issue. If a problem surfaced and no changes were made, a hardware failure is the more likely cause, and the system's error log should be reviewed for error messages. Enterprise-class tape libraries also have their own error logging capabilities, and good backup software will usually log device errors. This type of failure is usually resolved through hardware vendor support.

Bad drive or bad tape?

Before assuming a drive is down or a tape is bad, the tape in question should be mounted in another drive and/or a tape known to be good should be read from the suspected faulty drive.
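
The two swap tests above can be read as a small decision table. The following Python sketch is only an illustration of that reasoning; the function name and the result strings are made up for this example and do not come from any backup product.

```python
def diagnose(suspect_tape_ok_in_other_drive: bool,
             good_tape_ok_in_suspect_drive: bool) -> str:
    """Interpret the results of the two swap tests."""
    if suspect_tape_ok_in_other_drive and good_tape_ok_in_suspect_drive:
        # Both components work when paired with a known-good partner.
        return "inconclusive: retest the original tape and drive together"
    if suspect_tape_ok_in_other_drive:
        # The tape reads elsewhere, but a good tape fails in this drive.
        return "drive suspected"
    if good_tape_ok_in_suspect_drive:
        # The drive reads a good tape, but the tape fails elsewhere too.
        return "tape suspected"
    return "both suspected, or a shared cause such as cabling or configuration"


print(diagnose(True, False))   # prints "drive suspected"
print(diagnose(False, True))   # prints "tape suspected"
```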

Drivers and firmware

A newer version of a tape device driver can sometimes conflict with the firmware on a tape device. The vendor's "read-me" should actually be read for the latest information before upgrading.

SCSI libraries

For those installations still using SCSI devices, a number of items should be verified, such as:

  • Conflicting SCSI ID at the device level. Each device and the initiator (SCSI card) on a chain should have a unique ID.

  • Faulty SCSI terminator or cable. Disconnect all devices and try only one with a new terminator and cable.

  • Cabling exceeds maximum length.
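
The unique-ID rule is easy to check once the IDs on a chain are written down. Here is a minimal Python sketch, assuming you have recorded the IDs by hand; the device names and ID values are hypothetical.

```python
from collections import defaultdict


def find_id_conflicts(chain: dict[str, int]) -> dict[int, list[str]]:
    """Return every SCSI ID claimed by more than one device on a chain."""
    by_id = defaultdict(list)
    for name, scsi_id in chain.items():
        by_id[scsi_id].append(name)
    return {i: sorted(names) for i, names in by_id.items() if len(names) > 1}


# Hypothetical inventory of one chain, including the initiator (SCSI card).
chain = {"initiator": 7, "drive1": 3, "drive2": 4, "robot": 3}
print(find_id_conflicts(chain))  # prints {3: ['drive1', 'robot']}
```

An empty result means no two devices on that chain share an ID.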

Fibre Channel libraries

Fibre Channel attached libraries add new complexity and potential for configuration errors, such as:

  • Zoning errors at the Fibre Channel switch level -- backup server not zoned to see the tape drives.

  • World wide name (WWN) in the configuration not matching the tape device's actual WWN.

  • Firmware and driver compatibility issues between operating system, software, host bus adapter (HBA), switch and tape device -- consult the read-me file.

  • Hardware compatibility issues -- consult the vendor's support matrix.
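
A WWN mismatch is often hard to spot by eye because the same value can be written with or without separators and in either letter case. The sketch below, in Python, normalizes both forms before comparing them. The function names and the sample WWN are assumptions made for illustration.

```python
import re


def normalize_wwn(wwn: str) -> str:
    """Remove colons, dashes and spaces, then lower-case a 16-digit WWN."""
    cleaned = re.sub(r"[:\-\s]", "", wwn).lower()
    if not re.fullmatch(r"[0-9a-f]{16}", cleaned):
        raise ValueError(f"not a 16-hex-digit WWN: {wwn!r}")
    return cleaned


def wwn_matches(configured: str, actual: str) -> bool:
    """Compare the configured WWN with the one the device reports."""
    return normalize_wwn(configured) == normalize_wwn(actual)


# Made-up value, written two ways.
print(wwn_matches("50:01:04:F0:00:A1:B2:C3", "500104f000a1b2c3"))  # prints True
```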

Control path

Some libraries use one of the tape drive connections to issue commands to the robotic arm (gripper) to mount or eject tape media. This is usually configured from the library menu, and the selected tape drive number must be matched when configuring the devices to the backup server. For example, if the first drive is selected as the control path (e.g., /dev/rmt0), the control device number should match (e.g., /dev/smc0).

Element numbers

Element numbers are used to identify specific physical library components. Each tape slot and drive has an assigned element number. When configuring tape drives to a backup server, the element number must match the logical device defined to the operating system to prevent the robotics from loading a tape in one physical drive (element #) and the operating system trying to write to another drive (logical device). ACSLS libraries use drive ID numbers, which is a similar concept.
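
One way to check this mapping is to compare something both sides can report about the same physical drive. The Python sketch below assumes that the library and the operating system each expose a drive serial number. That is an assumption about your environment, not something every library guarantees, and the element numbers, device names and serial numbers shown are invented.

```python
def find_mismatches(config, library_serials, os_serials):
    """List configured pairs whose two serial numbers disagree.

    config:          element number -> logical device name
    library_serials: element number -> serial reported by the library
    os_serials:      logical device -> serial reported by the OS
    """
    problems = []
    for element, device in sorted(config.items()):
        lib_sn = library_serials.get(element)
        os_sn = os_serials.get(device)
        # A missing serial on either side is also reported as a problem.
        if lib_sn is None or os_sn is None or lib_sn != os_sn:
            problems.append((element, device, lib_sn, os_sn))
    return problems


# Hypothetical case where two drives were configured the wrong way round.
config = {256: "/dev/rmt0", 257: "/dev/rmt1"}
library_serials = {256: "SN-A", 257: "SN-B"}
os_serials = {"/dev/rmt0": "SN-B", "/dev/rmt1": "SN-A"}
for problem in find_mismatches(config, library_serials, os_serials):
    print(problem)
```

In this example both pairs are reported, which is the situation described above: the robotics would load a tape in one physical drive while the operating system writes to another.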

Utilities

Some vendors supply utilities to test ATLs (e.g., IBM tapeutil and Veritas robtest). These utilities come in handy to isolate certain configuration errors that might be difficult to diagnose at the backup software level. Such utilities allow you to view the device configuration at the operating system level and issue basic commands, such as mount, rewind, unload and eject. Unix operating systems also allow some basic communication with devices through commands and calls such as dd and ioctl, which can help determine whether a device is operational or simply configured incorrectly to the backup software.
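
As a rough illustration of the dd approach, the Python sketch below wraps a one-block read from a tape device. It assumes a tape is already loaded in the drive and that the block size given is at least as large as the blocks on the tape. The function names, the default block size and the device name used in the comment are assumptions.

```python
import subprocess


def dd_read_command(device: str, block_size: str = "64k") -> list[str]:
    """Build a dd command that reads one block and discards it."""
    return ["dd", f"if={device}", "of=/dev/null", f"bs={block_size}", "count=1"]


def device_responds(device: str) -> bool:
    """Return True if dd read one block from the device without error."""
    result = subprocess.run(dd_read_command(device),
                            capture_output=True, text=True)
    return result.returncode == 0


print(" ".join(dd_read_command("/dev/rmt0")))
# prints: dd if=/dev/rmt0 of=/dev/null bs=64k count=1
# On the backup server itself you would call device_responds("/dev/rmt0").
```

A failed read does not by itself prove a hardware fault, since an empty drive or a block size that is too small will also fail. A successful read, however, suggests the device is operational and that the problem lies in how it is configured to the backup software.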

As with any other hardware issue, a methodical approach must be taken when troubleshooting ATLs:

  • Changes or tests should be made one at a time and validated before moving on to the next one.

  • Consult the product documentation -- yes you can!

  • Contact vendor support.

About the author: Pierre Dorion is a certified business continuity professional for Mainland Information Systems Inc.

This was first published in October 2007.