Many IT departments have seen the benefits of implementing an enterprise-class tape library. Administrators and tape librarians no longer have to shuffle around the data center looking for backup tapes and documenting their receipt. In addition, we now have a better sense of the validity of the data on the tapes collected for archival.
|Connecting libraries to SANs: dos and don'ts|
Don't waste switch resources by using a port for each tape drive in the library.
Do conserve ports by using arbitrated loop switches to aggregate traffic from the fabric to the drives.
Of course, this is the end result of much planning, many bumps and bruises, and more than a few failed backups. Enterprise-class tape libraries are important to lowering the overall cost of storage management, software and hardware. However, to have any chance of achieving the ROI your vendors promised, you must use a properly sized library with the correct tape format and library connectivity.
When I interview application owners about how long their application's data must be kept, they rarely mention a corporate-wide data policy, and they rarely solicit input from the application's users. Instead, the application owner usually looks up at the ceiling and spouts some safe number that doesn't necessarily coincide with the corporation's goals for profitability, or for liability, for that matter.
If you can, put a corporate-wide data retention policy in place and have it signed by the executive in charge. What's needed here is a thorough risk analysis that combines input from the application users, the people responsible for providing recovery services and the people whose fate depends on the viability of the corporation after data is lost. You may also benefit from bringing in an outside vendor to provide industry experience and expertise.
A data retention policy lets you achieve some predictability in your tape usage and, as a result, in your tape and tape drive needs. With a documented corporate-wide retention policy as your shield, you can start notifying departments of the enforced policy as soon as the decision has been made for them to share a tape library.
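To make that predictability concrete, here is a minimal Python sketch that turns a retention policy into a steady-state cartridge count. The policy figures, the `BackupLevel` class and the `cartridges_in_rotation` helper are all hypothetical, and the sketch assumes every backup run starts on a fresh cartridge. The 100GB cartridge size matches the AIT-3 native capacity discussed later in this article.

```python
import math
from dataclasses import dataclass


@dataclass
class BackupLevel:
    """One backup level in a hypothetical retention policy."""
    name: str
    size_gb: float        # data written by each run of this level
    runs_per_week: int    # how often this level runs
    retention_weeks: int  # how long each run's cartridges are kept


def cartridges_in_rotation(levels, cartridge_gb):
    """Cartridges held at steady state under the policy.

    Assumes each run starts on a fresh cartridge (no appending), which
    slightly overestimates but keeps the arithmetic easy to audit.
    """
    total = 0
    for level in levels:
        per_run = math.ceil(level.size_gb / cartridge_gb)
        runs_retained = level.runs_per_week * level.retention_weeks
        total += per_run * runs_retained
    return total


# Hypothetical policy: weekly full kept 4 weeks, incrementals kept 2 weeks.
policy = [
    BackupLevel("weekly full", size_gb=250, runs_per_week=1, retention_weeks=4),
    BackupLevel("incremental", size_gb=30, runs_per_week=6, retention_weeks=2),
]
# The same data with a full backup every day instead, kept 4 weeks.
daily_full = [BackupLevel("daily full", size_gb=250, runs_per_week=7, retention_weeks=4)]

print(cartridges_in_rotation(policy, cartridge_gb=100))      # 24
print(cartridges_in_rotation(daily_full, cartridge_gb=100))  # 84
```

Because every input comes straight from the policy, the tape count becomes a number you can plan against. The second call shows how quickly that number grows when one department insists on a full backup every day.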
Of course, you'll often find that one person in particular wants more influence over the changes taking place. For example, your retention policy might specify that all development databases get a full backup once a week and an incremental backup on each of the other days. However, the Oracle DBA may feel safer with a full backup every day. Resist, as long as you have the retention policy on your side. Take a default stance of deny, but do it nicely.
Even a small change like that can disrupt backup schedules. Pretty soon, you're seeing other failures because resources (tape drives) that used to be free at 3 a.m. on Tuesday no longer are: someone bypassed the policy without taking into account the available resources and the needs of other clients backing up at the same time.
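That kind of 3 a.m. collision can be caught before it happens. The Python sketch below tallies how many drives each scheduled job holds in each hour and flags the hours in which demand exceeds the library's drive count. The job list, durations and four-drive library are made-up figures, and the sketch assumes each job holds its drives for its entire run.

```python
from collections import Counter


def drive_demand(jobs):
    """Tape drives in use during each hour of the day (0-23).

    Each job is (client, start_hour, duration_hours, drives); jobs that
    run past midnight wrap around to the early hours.
    """
    demand = Counter()
    for _client, start, hours, drives in jobs:
        for hour in range(start, start + hours):
            demand[hour % 24] += drives
    return demand


def overcommitted_hours(jobs, drives_available):
    """Hours in which the scheduled jobs need more drives than the library has."""
    demand = drive_demand(jobs)
    return sorted(hour for hour, used in demand.items() if used > drives_available)


# Hypothetical Tuesday schedule on a four-drive library.
schedule = [
    ("oracle-dev incremental", 1, 1, 1),
    ("mail", 2, 3, 2),
    ("file servers", 3, 2, 1),
]
# The DBA switches to a daily full backup that runs longer and uses two drives.
with_daily_full = [("oracle-dev full", 1, 4, 2)] + schedule[1:]

print(overcommitted_hours(schedule, drives_available=4))         # []
print(overcommitted_hours(with_daily_full, drives_available=4))  # [3, 4]
```

Nothing about the mail or file-server jobs changed, yet they are the ones that fail at 3 a.m. and 4 a.m. once the longer full backup claims its extra drive.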
Once the retention policy has been in place for a few quarters, and depending on your tape vault vendor's pickup schedule, you can determine how large a tape library your solution needs. Three quarters' worth of trending data under the retention policy (including fiscal and calendar year-end, if possible) should give you a reasonable idea of the growth your tape library must accommodate.
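A least-squares trend line is one simple way to turn those quarters of data into a slot count. The Python sketch below fits a straight line to nine months of cartridge counts, projects it forward and adds a safety margin. The history, the 12-month horizon and the 20% headroom are illustrative assumptions. A straight line will smooth over year-end spikes, which is one reason to make sure the trending period includes them.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def slots_needed(monthly_cartridges, months_ahead, headroom=0.20):
    """Project the cartridges held in the library and add a safety margin."""
    months = list(range(1, len(monthly_cartridges) + 1))
    slope, intercept = linear_fit(months, monthly_cartridges)
    target_month = len(monthly_cartridges) + months_ahead
    projected = slope * target_month + intercept
    return slope, math.ceil(projected * (1 + headroom))


# Nine months (three quarters) of made-up cartridge counts.
history = [400, 410, 425, 430, 445, 455, 465, 480, 490]
growth_per_month, slots = slots_needed(history, months_ahead=12)
print(round(growth_per_month, 2), slots)  # 11.25 750
```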
Tape drive selection
Determine whether your current media choice will provide your applications with the level of service they need. For example, a backup and recovery solution will not necessarily use the same tape format as a hierarchical storage management (HSM) solution. You may have a couple of servers sending backup traffic to one tape library, and a couple hundred users generating read requests against another library that supports the HSM solution.
In that scenario, the library backing up a few servers may be best served by DLT or LTO technology. However, when those servers' data is migrated from disk to tape by HSM, the much larger user community could overrun the HSM tape library because of the slow load times of the selected tape drive technology. Although supporting two tape formats isn't desirable, it may be necessary in some cases, depending on the traffic patterns of the applications involved.
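A rough offered-load calculation shows why load time matters so much for HSM. This Python sketch multiplies the recall rate by the time each mount ties up a drive. The recall rate, the locate, read and unload times, and the 180-second slow-format figure are hypothetical. Only the 27-second load time comes from the AIT-3 figures in the next paragraph. This is a back-of-the-envelope utilization check, not a full queueing model, and the 70% target is an assumed rule of thumb for keeping queues short.

```python
import math


def drive_utilization(requests_per_hour, service_seconds, drives):
    """Fraction of drive time consumed; at or above 1.0 the queue never drains."""
    return requests_per_hour * service_seconds / (3600 * drives)


def drives_needed(requests_per_hour, service_seconds, target_utilization=0.7):
    """Smallest drive count that keeps utilization at or below the target."""
    busy_drives = requests_per_hour * service_seconds / 3600
    return math.ceil(busy_drives / target_utilization)


# Hypothetical HSM load: 200 users each recalling one file per hour.
recalls_per_hour = 200
# Per-mount service time = load + locate + read + unload, in seconds.
fast_load = 27 + 30 + 10 + 20  # 27 s load from the text; the rest are guesses
slow_load = 180                # a hypothetical slow-loading format

print(drive_utilization(recalls_per_hour, fast_load, drives=4))  # above 1.0: overrun
print(drives_needed(recalls_per_hour, fast_load))  # 7
print(drives_needed(recalls_per_hour, slow_load))  # 15
```

Under these assumptions, the slow-loading format needs roughly twice as many drives to serve the same users. That is the kind of gap that can justify running a second tape format for HSM.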
One tape format that serves both ends of the spectrum well is Sony's AIT-3. The AIT-3 drive offers the highest capacity per square inch. Its combination of 100GB native capacity, a 12MB/s sustained transfer rate and a 27-second load time covers most, if not all, of your application concerns, including cost when compared with some high-end tape solutions. Its memory-in-cartridge feature also speeds load times and file location on the tape, making it well suited to large numbers of tape mount requests, such as during a complete disaster recovery or in an HSM application. AIT's 3.5-inch form factor also provides the scalability you will need by letting you fit more tape drives into your chosen library.
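Those two figures also tell you how long a cartridge takes to fill and how many drives a backup window needs. The Python sketch below uses the 100GB and 12MB/s native figures from the text. The 1.5TB nightly volume and six-hour window are hypothetical. The sketch also optimistically assumes each drive streams at its native, uncompressed rate for the whole window, so treat its answer as a floor.

```python
import math

NATIVE_CAPACITY_GB = 100  # AIT-3 native capacity, per the text
NATIVE_RATE_MB_S = 12     # AIT-3 sustained native transfer rate, per the text


def hours_to_fill(capacity_gb=NATIVE_CAPACITY_GB, rate_mb_s=NATIVE_RATE_MB_S):
    """Hours to stream one cartridge at the native rate (1 GB = 1,000 MB)."""
    return capacity_gb * 1000 / rate_mb_s / 3600


def drives_for_window(nightly_gb, window_hours, rate_mb_s=NATIVE_RATE_MB_S):
    """Drives needed to write nightly_gb within the backup window."""
    mb_per_drive = rate_mb_s * 3600 * window_hours
    return math.ceil(nightly_gb * 1000 / mb_per_drive)


print(round(hours_to_fill(), 2))   # 2.31
# Hypothetical: 1.5TB of nightly backups in a six-hour window.
print(drives_for_window(1500, 6))  # 6
print(math.ceil(1500 / NATIVE_CAPACITY_GB))  # 15 cartridges per night
```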
|Creative disaster recovery|
By extending your storage area network (SAN), you may be able to locate your tape library in a way that improves your business continuance performance. A university I worked with implemented a campus-wide SAN that included a secondary site four city blocks from the primary storage area. It installed an enterprise-class tape library at the secondary site using native Fibre Channel (FC) connections end to end. To the backup servers at the primary site, the tape library appeared as if it were directly connected and in the same room.
This allowed the university to back up its data immediately off-site and leave the tapes in the library for a few days, according to its storage policy. After studying the university's restore history, we determined that a large percentage of restores were for data modified within three days of the restore request. So after residing in the library for three days, the designated tapes are withdrawn and placed in the safe for weekly pickup by the off-site vendor.
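The three-day figure came from the restore history, and the analysis behind a number like that is simple to reproduce. The Python sketch below takes the age of the data behind each restore request. It reports what share of restores a given in-library window would cover, and it finds the shortest window that meets a coverage target. The restore log is invented for illustration, and the helper names are hypothetical.

```python
def fraction_within(restore_ages_days, window_days):
    """Share of restores whose data was at most window_days old."""
    covered = sum(1 for age in restore_ages_days if age <= window_days)
    return covered / len(restore_ages_days)


def smallest_window(restore_ages_days, target_fraction):
    """Fewest whole days in the library that cover target_fraction of restores."""
    for window in range(max(restore_ages_days) + 1):
        if fraction_within(restore_ages_days, window) >= target_fraction:
            return window
    raise ValueError("target_fraction must not exceed 1")


# Made-up restore log: age in days of the data each restore request asked for.
ages = [0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 5, 6, 9, 14, 21, 30, 45]
print(fraction_within(ages, 3))    # 0.65
print(smallest_window(ages, 0.8))  # 9
```

With this made-up log, a three-day window covers about two-thirds of restores, while covering 80% would tie up slots for three times as long. That trade-off between coverage and slot count is what the in-library retention period balances.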
The savings are many. For one, the work involved in readying tapes for pickup every day has been significantly reduced. Instead of performing these processes daily, the university now performs a slightly modified version of them twice a week, while keeping online the data that any department is most likely to request for recovery. This also reduces the likelihood that the university will have to pay the off-site tape vendor to retrieve a tape.
In addition to those savings, the university was able to rework its contract with the off-site vendor to reduce the number of pickups at its site. Last but certainly not least is the ability to start recovering applications immediately should the primary data center be destroyed. There is no waiting for the off-site vendor to bring the tapes to the recovery location, and no waiting for the recovery vendor to provision hardware. The university can have its backup servers up and ready to restore data within two and a half hours of losing the production backup server (in three test runs, the median recovery time was 2 1/2 hours).
By deploying an enterprise-class tape library on an extended SAN, the university has kept a longer span of historical data online for quick recoveries, saved labor hours and contract costs in its tape management processes, and dramatically shortened its recovery time objectives. Together, these benefits have raised the university's availability numbers to an acceptable level without buying matching disk arrays for long-distance mirroring or signing a costly contract with a service provider.
Judging by market share, Sony's AIT-3 has not caught the attention of many storage managers looking to ease the burden of tape management. Or perhaps, no matter how good an alternative technology becomes, storage managers or their management aren't willing to abandon their current format, because doing so implies that someone made a bad decision in the past.
If you're an administrator tasked with implementing a tape library using a format that may not match your applications' needs, ask your management to consider the benefits of a new tape format. It really could mean the difference between success and failure. But be prepared with a plan that shows how you intend to preserve the recoverability of the data on tapes already archived in the old format.
Connecting the library
There are various ways to connect a tape library to your storage area network (SAN). For the most part, the interface on your tape drives determines the hardware required in the data path between your backup servers and the library. For example, if you plan to move forward with SCSI-attached tape drives, you'll need a SCSI-FC bridge to carry the SCSI traffic onto the Fibre Channel (FC) network.
However, if your tape drives have native FC connections, you get the benefit of accessing them directly, without any protocol translation. Native connections also make your SAN easier to manage by reducing the number of devices in it. And depending on your library vendor, you may also be able to place your library in a lights-out location without exposing any IP security vulnerabilities. That's because some vendors now let you reach the library's control port over a native FC connection. Otherwise, you'll need to expose the library's control port through a SCSI-FC bridge, and such bridges often have IP management ports.
In my experience, FC-SCSI bridging has been problematic. Not only does it add the management overhead of keeping firmware revisions current, it also adds processing overhead to the data path between the backup server and the destination storage device, potentially lowering performance and usability.
For example, during one implementation I participated in, the tape library was farther from the backup server than a SCSI cable can reach. As a result, the engineer driving the project decided to bridge the gap by connecting the library's robotic arm to an FC-SCSI bridge and then connecting the bridge to the SAN (which used fiber optic cabling), so the backup server could address the robotic arm over the SAN. During the busiest periods, the robotic arm was often slow to respond to the backup server's SCSI commands, resulting in failed backups and the inability to restore data. To remedy this, we relocated the backup server within SCSI's distance limits and connected the library's robotic arm directly to it. The problem went away.
Yet the simplicity of native FC tape drives can also lead to bad practices in the field. If a tape library solution calls for 10 native FC tape drives, common practice is to connect the drives to 10 separate ports on one 16-port fabric switch. Considering the speed of even the fastest tape drives available today, that leaves a great deal of per-port bandwidth that will never be used. And if 2Gb/s switches are used, the waste and the cost are compounded even further.
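The arithmetic behind "never be utilized" is short. The Python sketch below uses the commonly quoted usable payload rates of roughly 100MB/s for 1Gb/s FC and 200MB/s for 2Gb/s FC, and the 12MB/s AIT-3 native rate cited earlier. It assumes each drive streams at its native rate, so real utilization will be lower still. The function names are hypothetical.

```python
# Approximate usable payload bandwidth of one FC port, per direction, in MB/s.
FC_PORT_MB_S = {1: 100, 2: 200}


def port_utilization(drive_mb_s, link_gbps):
    """Fraction of a dedicated switch port that one streaming drive uses."""
    return drive_mb_s / FC_PORT_MB_S[link_gbps]


def idle_bandwidth(drives, drive_mb_s, link_gbps):
    """Port bandwidth left unused when each drive gets its own port."""
    return drives * (FC_PORT_MB_S[link_gbps] - drive_mb_s)


# Ten drives at AIT-3's 12MB/s native rate, one drive per port.
print(port_utilization(12, 1))    # 0.12
print(idle_bandwidth(10, 12, 1))  # 880
print(idle_bandwidth(10, 12, 2))  # 1880
```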
Remember that when you connect a native FC tape drive to a fabric, the drive usually shows up as a public loop device. The drive's interface speaks FC-AL, and the switch port, now initialized as an FL_Port, also speaks FC-AL while having a view into the switch's name server database. This allows fabric-attached hosts on the SAN to address, and establish a path to, the public loop tape drive.
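The "public" part comes from addressing. In simplified form, a device on a public loop gets a 24-bit FC address made of three bytes. The upper two bytes (domain and area) come from the switch and its FL_Port, and the low byte is the drive's arbitrated loop physical address (AL_PA). A private loop device has zeros in the upper two bytes and cannot be reached by fabric hosts without translation. The Python sketch below packs and inspects such addresses. The domain, area and AL_PA values are hypothetical, since real switches assign them during loop initialization and fabric login.

```python
def fc_address(domain, area, port):
    """Pack Domain, Area and Port bytes into a 24-bit Fibre Channel address."""
    for name, value in (("domain", domain), ("area", area), ("port", port)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name} must fit in one byte, got {value}")
    return (domain << 16) | (area << 8) | port


def split_address(address):
    """Return the (domain, area, port) bytes of a 24-bit address."""
    return (address >> 16) & 0xFF, (address >> 8) & 0xFF, address & 0xFF


def is_private_loop(address):
    """Private loop devices leave the upper 16 bits at zero; only the AL_PA is set."""
    return address >> 8 == 0


# Hypothetical switch: domain 0x01, FL_Port in area 0x04, drive with AL_PA 0xE8.
public = fc_address(0x01, 0x04, 0xE8)
private = fc_address(0x00, 0x00, 0xE8)
print(f"{public:06X}", is_private_loop(public))    # 0104E8 False
print(f"{private:06X}", is_private_loop(private))  # 0000E8 True
```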
In this configuration, a more economical and scalable approach is to connect two 8-port arbitrated loop switches to two ports on the fabric switch and attach five of the tape drives to each loop switch (see "Connecting libraries to SANs: dos and don'ts"). The five drives on each loop share the bandwidth of a single FL_Port. This matches available resources to actual requirements much more closely and lowers the cost of the overall solution.
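A quick check of that layout: because an arbitrated loop is a shared medium, the drives on one loop switch together cannot push more than their FL_Port carries. The Python sketch below counts the fabric ports consumed and the worst-case FL_Port load. It assumes every drive on the busiest loop streams at its native rate at the same time. The 12MB/s rate is the AIT-3 figure from the text, the approximately 100MB/s port rate is the usual figure for 1Gb/s FC, and the 30MB/s drive is hypothetical.

```python
import math


def plan_loops(total_drives, drives_per_loop, drive_mb_s, port_mb_s=100):
    """Fabric ports used and FL_Port load when drives share loop switches.

    Assumes every drive on the busiest loop streams at its native rate at
    once, which is the worst case for the shared FL_Port.
    """
    loops = math.ceil(total_drives / drives_per_loop)
    busiest = min(drives_per_loop, total_drives)
    utilization = busiest * drive_mb_s / port_mb_s
    return {"fabric_ports": loops, "fl_port_utilization": utilization, "fits": utilization <= 1.0}


# Ten 12MB/s drives, five per loop switch, on 1Gb/s (about 100MB/s) FL_Ports.
print(plan_loops(10, 5, 12))  # 2 fabric ports, each FL_Port about 60% busy
# A hypothetical 30MB/s drive overruns a 1Gb/s FL_Port but fits at 2Gb/s.
print(plan_loops(10, 5, 30)["fits"], plan_loops(10, 5, 30, port_mb_s=200)["fits"])  # False True
```

The same function makes the trade-off explicit for faster drives. You can shrink the number of drives per loop or move to 2Gb/s FL_Ports, but in either case you still use far fewer fabric ports than one port per drive.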
When designing a library solution, be creative but pragmatic. Once you know precisely what you need, there are many ways to exploit the various technologies involved to achieve your ends.