Phil Goodwin, Contributor
Published: 04 May 2012
Cloud archiving services can offer accessibility and data preservation at a fraction of the cost of building an on-site archive infrastructure.
It wouldn't seem necessary to start a discussion about archiving by defining the term, but it is. In the early days of computing, archiving was understood to be the process of moving data on tape to a remote facility for long-term storage. Now, however, archiving has taken on numerous meanings based on context. It can mean the “auto-archive” simplicity of Microsoft Outlook, the movement of older data to cheaper storage, or more traditional long-term off-line storage. In the context of cloud computing, we’ll define it to mean relegating data to a third-party location for the purposes of lowering costs, improving data protection or both, while still maintaining a reasonable degree of data access.
How long is long?
Regardless of context, implicit in the notion of archive is time -- typically a long time. But “long” is a relative concept. For most financial data it means seven years; for pharmaceutical research, 20 years; and for some medical and nuclear records, more than 50 years. In general, retaining data on spinning (or even spin-down) disk for 10 years or more is cost-prohibitive, even in the cloud. So, for the purposes of this discussion, we’ll define “long” as between one year and seven years. For data retention exceeding seven years, disk systems will be the media of choice only in specialized applications. Examples include geospatial data (e.g., oil and gas exploration images), medical images and aircraft maintenance logs, where the frequency of access is low but the probability of retrieval at some point is high; in those cases, the time and difficulty of recovering 15-year-old tapes are likely to be unacceptable.
Price vs. performance
Cloud-based archive opens the possibility of a “just right” balance between cost and accessibility. Tape has been, and remains, far and away the lowest-cost method of storing data for years. A typical LTO tape holding approximately 1 TB of data costs roughly $35, with off-site storage in the range of 25 cents per tape per month. There’s no way for even the cheapest cloud disk to compete with this price. On the downside, the normal retrieval time for a tape from archive is next-day delivery plus the time needed to mount and restore it. This means users will wait about a business day before being able to access the information requested.
Cloud storage, on the other hand, starts at approximately 10 cents per GB per month (depending on volume). This adds up when contemplating hundreds of TBs, but it’s still often less than the cost to procure, deploy and manage arrays in a central data center. Whereas tape retrieval is measured in business days, data hosted on cloud storage can be accessed in seconds. For some apps, this may be the ideal tradeoff between price and performance.
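To make the price gap concrete, here is a minimal Python sketch that plugs in the figures quoted above ($35 per 1 TB tape, 25 cents per tape per month for vaulting, 10 cents per GB per month for cloud). The function names, the one-tape-per-TB mapping and the use of decimal terabytes are assumptions for illustration. It deliberately ignores tape drives, handling labor, retrieval fees and volume discounts, so treat it as a rough comparison rather than a full cost model.

```python
# Figures quoted in the article; everything else is an illustrative assumption.
TAPE_MEDIA_COST = 35.00       # dollars per ~1 TB LTO cartridge
TAPE_VAULT_PER_MONTH = 0.25   # dollars per cartridge per month, off-site storage
CLOUD_PER_GB_MONTH = 0.10     # dollars per GB per month, entry-level rate
GB_PER_TB = 1000              # assumption: decimal terabytes


def tape_cost(terabytes: int, months: int) -> float:
    """Media plus vaulting cost, assuming one cartridge per TB."""
    return terabytes * (TAPE_MEDIA_COST + TAPE_VAULT_PER_MONTH * months)


def cloud_cost(terabytes: int, months: int) -> float:
    """Cumulative storage fees at a flat per-GB monthly rate."""
    return terabytes * GB_PER_TB * CLOUD_PER_GB_MONTH * months


if __name__ == "__main__":
    tb, months = 100, 7 * 12  # hypothetical archive: 100 TB kept for seven years
    print(f"Tape:  ${tape_cost(tb, months):,.0f}")
    print(f"Cloud: ${cloud_cost(tb, months):,.0f}")
```

For a hypothetical 100 TB archive held for seven years, this works out to about $5,600 for tape media and vaulting versus about $840,000 in cloud storage fees. What the cloud figure buys is access in seconds rather than a business day.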
Cloud advantages, disadvantages
Before going all-in on cloud archiving, however, IT needs to weigh the virtues of cloud against those of in-house archiving. Technologically, cloud providers can’t offer anything that can’t be implemented in-house. So a company may, for example, choose to implement a tiered storage infrastructure with tier 3 high-capacity SATA disk to achieve a lower average cost per GB stored. Generally, organizations will lean toward an in-house solution if they can’t risk the loss of connectivity to a remote location, have regulatory mandates that demand strict data security oversight or have data retrieval requirements where remote latency would be unacceptable. This is a fairly restrictive list, so there are still many applications that are candidates for cloud archiving.
IT organizations can quantify the logistical effort to migrate to cloud, but shouldn’t overlook a predictable yet often unanticipated challenge: a mind shift from a technology-centric perspective to a service-level management perspective. IT staff used to making technology choices and deployments often want to delve into the cloud vendor’s architecture and “suggest” product or technology-specific implementations. Rarely are such requests warranted, as the vendor maintains full responsibility for managing the cloud infrastructure. IT departments really shouldn’t be concerned with the underlying technology, provided contractual service levels are met. With experience, staff attention will gradually shift from low-level details to higher-level governance.
Service is the critical factor
Service-level management, then, is critical to the initial decision for cloud archiving as well as ongoing operations. When shopping for a cloud archival vendor, consider the following service-level issues:
Uptime. For most applications, three nines or four nines of availability are sufficient to meet business requirements. If you need five nines, you probably have data access requirements that aren’t conducive to an archive tier. Data hosted in an archive tier is, by definition, non-critical. The uptime requirement largely determines how much infrastructure the vendor must provision, so it has a big impact on the hosting cost. Don’t guess; determine the actual hours when data will be accessed, access patterns and cost of downtime. These calculations can be compared to the cost of various uptime guarantees, which can then be justified or rejected based on the comparison. Vendors often offer hosting-fee rebates or accept other penalties for missing service-level agreements (SLAs). However, the caveats are contained in the fine print, so read it.
Accessibility. Accessibility and uptime aren’t necessarily the same. The storage may be humming while a failed subcomponent, such as a network link, renders an application unavailable. If you need redundant or multiply redundant data links, for example, you’ll have to pay for them, but the alternative may be unacceptable application outages. Make sure service levels encompass end-to-end data availability.
Performance. Quantify how many IOPS your applications require and ensure this number is part of the SLA. IOPS can be measured either as an average or during peak activity. If you demand IOPS guarantees at peak, then you’ll have to pay for the vendor to provision them. Some vendors may offer metered billing, but many organizations don’t like the potential uncertainty of such billing should demand suddenly spike. Most organizations will absorb a certain amount of constrained operation (especially for an archive tier) in return for cost certainty. In this case, the SLA is for guaranteed IOPS, not absolute performance experienced by the end user. If application demands exceed contracted IOPS capacity, it’s rightly the IT organization’s problem; additional IOPS can always be purchased.
Data recoverability. As they do for in-house applications, IT organizations need to specify recovery point objective (RPO) and recovery time objective (RTO) requirements for cloud-based archives. This is related to uptime, but also covers contingencies such as data corruption or a component failure that doesn’t affect overall uptime but impacts individual applications. The vendor should have default values for RPO and RTO, which may be sufficient for an archive tier. Again, don’t guess. Know what kind of data loss and application unavailability the business units can financially tolerate. In many cases, it’s much more than is intuitive.
Disaster recovery (DR). If the cloud archive is used as off-site replicated storage to satisfy data redundancy requirements, it may not be necessary to consider a DR strategy for this tier. But buyer beware: Most hosted storage doesn’t include any DR contingency. If the hosted data is “live” data provisioned as hybrid cloud storage, then a DR plan may be necessary. Hosting providers may regularly back up the data, but they generally don’t rotate the data off-site, and if they do, they do so infrequently (e.g., monthly). Although a disaster at a SAS-70 compliant data center is unlikely, it’s not impossible. DR capability from a hosting company is often a significant additional expense and can change the economics of hosting in a hurry. Make sure data isn’t left in a vulnerable state.
Backup and recovery. Even if the hosting vendor backs up the data regularly and rotates it off-site frequently, IT organizations may not be out of the woods. Hosting companies usually have a limited number of backup software options and tape technologies. This means their backup format (hardware, software or both) may be incompatible with your IT systems. If an IT organization is forced to do a recovery from the vendor’s tapes, there could be a substantial delay in acquiring the necessary infrastructure. Ensure there’s a way out in a worst-case scenario.
Compliance. Archived data that requires special compliance treatment may still be a candidate for cloud hosting. You’ll need to ensure the data is retained on immutable media, if required. You’ll probably also need assurance that strict access guidelines are followed and auditable; SAS-70 providers should have such processes in place.
Cost certainty and granularity. One of the key benefits of cloud storage hosting for archiving, compared with in-house infrastructure, is that you pay only for the storage consumed. The metering should go up or down with use, though it may have a floor minimum.
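To put the uptime item above in concrete terms, the following Python sketch converts a “number of nines” into allowed downtime per year and multiplies it by an hourly cost of downtime. The function names and the $1,000-per-hour figure are hypothetical; substitute your own measured access windows and downtime costs before comparing against a vendor’s pricing for each uptime tier.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def allowed_downtime_hours(nines: int) -> float:
    """Yearly downtime permitted at an availability of N nines (3 -> 99.9%)."""
    return HOURS_PER_YEAR * 10 ** -nines


def annual_downtime_cost(nines: int, cost_per_hour: float) -> float:
    """Worst-case yearly downtime cost if the vendor just meets the SLA."""
    return allowed_downtime_hours(nines) * cost_per_hour


if __name__ == "__main__":
    hourly_cost = 1_000.0  # hypothetical cost of one hour of archive downtime
    for nines in (3, 4, 5):
        hours = allowed_downtime_hours(nines)
        cost = annual_downtime_cost(nines, hourly_cost)
        print(f"{nines} nines: {hours:.3f} h/year, up to ${cost:,.0f}")
```

Three nines allows roughly 8.8 hours of downtime a year, four nines about 53 minutes and five nines about 5 minutes. If the extra hosting fee for a higher tier exceeds the downtime cost it avoids, the lower tier is probably the better fit for an archive.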
Turn tapes into cloud archives
It’s clear that cloud archiving may be attractive to companies with aging data stored on relatively expensive in-house arrays. More questionable is whether or not converting from tape-based archives to cloud archives makes sense. Larger organizations may have tens of thousands of tapes in off-site archives. The process of retrieving all those tapes and reading them onto a cloud archive infrastructure is daunting. It also assumes the provider has the necessary hardware to read all the tapes, some of which may be in obsolete formats. Moreover, there’s no way a cloud provider could host such a data volume at anything close to the cost of tapes sitting in a glorified warehouse. Disk compression and data deduplication can help significantly, but the difference in cost is still likely to amount to a substantial premium.
Even though the hurdles for converting tape to cloud archiving are high, it may still be worth considering. Tapes more than seven years old are likely to be very expensive -- and possibly problematic -- to restore. Best practices dictate that organizations retrieve and rewrite tapes every five years to ensure the data is readable and the format is current. It’s a task to be reckoned with. For example, with a 10,000-tape archive and a five-year refresh cycle, a company would have to refresh 2,000 tapes each year. Assuming roughly 250 workdays a year, that comes to approximately eight tapes per workday, which is doable, but requires a year-round effort for what’s fundamentally a nonproduction exercise. Here again, the crux of the matter lies in the probability of retrieval. Some organizations choose to allow tapes to become obsolete in the vault with the knowledge that a recovery would be painful, but the probability of needing to restore the data is low enough to be worth the risk. On the other hand, if you know a recovery is all but inevitable, you may opt to incur the time and expense of moving from tape to cloud now, thus saving significant time and effort later, perhaps under urgent conditions.
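The refresh arithmetic above is easy to rerun for your own archive. This short Python sketch reproduces the 10,000-tape example; the function names and the default of 250 workdays per year are assumptions for illustration.

```python
def tapes_per_year(archive_size: int, refresh_cycle_years: int) -> float:
    """Tapes that must be rewritten each year to stay on the refresh cycle."""
    return archive_size / refresh_cycle_years


def tapes_per_workday(archive_size: int, refresh_cycle_years: int,
                      workdays_per_year: int = 250) -> float:
    """Daily refresh workload; 250 workdays per year is an assumption."""
    return tapes_per_year(archive_size, refresh_cycle_years) / workdays_per_year


if __name__ == "__main__":
    print(tapes_per_year(10_000, 5))     # yearly refresh load
    print(tapes_per_workday(10_000, 5))  # daily refresh load
```

With the article’s inputs, this gives 2,000 tapes per year and eight per workday. Shortening the cycle or growing the archive scales the workload proportionally.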
That’s not to suggest that tape is losing its role in archiving. It’s still the lowest cost choice for most situations. In addition, LTO’s Linear Tape File System (LTFS) is enabling tape to take on a new role as “tier 4” storage, so it can act as another tier in the cloud (or data center) that’s provisioned along with tiers 0, 1, 2 and 3. In a cloud archive environment, this would effectively enable a hybrid cloud that offers relatively fast access (e.g., minutes) but at the price point of tape for rarely accessed data. The tapes will also have built-in compression, and the options of encryption and WORM. Using automated tiering software, data can be moved automatically to the archive tier.
The inevitable “what if”
So far, we’ve painted a fairly positive picture of cloud archiving services. Usually the effort yields the desired result, but not always. Organizations should consider what would happen if they transferred tens of TBs of data to a provider and then failed to realize the desired or contracted results. Sure, penalties might kick in, but small monetary penalties wouldn’t fully compensate for the true cost, aggravation or damage to the IT organization’s reputation for delivery. Contingencies begin with a contract that may be terminated without penalty for failure to meet specific performance levels. It should also include a plan for alternative hosting capabilities, either back in-house or with another provider. Cloud archiving is fairly low on the list of risky endeavors, but smart organizations will be prepared for anything.
BIO: Phil Goodwin is a storage consultant and freelance writer.