By W. Curtis Preston
It seems like nearly every data backup vendor has some type of cloud-based backup offering today. Everybody is jumping on the cloud bandwagon, and it seems like everybody has a view about what the cloud is or isn't. It's difficult to separate the hype from the truth.
In this primer on cloud data backup by backup expert W. Curtis Preston, learn about the differences in managed cloud services; cloud storage vs. cloud backup; what data backup software can leverage the cloud; and the pros and cons of cloud backup services.
Although the terms are often used interchangeably, there's a big difference between cloud storage and cloud backup. Cloud storage is storage as a service. To tap into cloud storage, you get an account with a service provider; they provide you with their API and you use some type of software that enables you to store data via that API. Voila! You have storage with unlimited capacity. You don't manage the storage where your data resides, and you don't even have to ask for additional capacity. All you have to do is pay the bill.
All cloud storage services charge a "storage fee," a monthly rate based on how many gigabytes of data are stored in your account. In addition, some cloud storage providers may charge a fee for each gigabyte that's downloaded or uploaded -- essentially a "bandwidth fee." With cloud storage, you still have to manage the application that's sending the data into the cloud.
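To make the fee structure concrete, here is a minimal sketch of a monthly bill estimate. The function name and the per-gigabyte rates are hypothetical placeholders chosen for illustration; they are not any provider's actual pricing.

```python
def monthly_cloud_storage_bill(stored_gb, transferred_gb,
                               storage_rate_per_gb, transfer_rate_per_gb=0.0):
    """Estimate one month's bill: storage fee plus optional bandwidth fee.

    All rates are hypothetical inputs; substitute your provider's price list.
    """
    if min(stored_gb, transferred_gb,
           storage_rate_per_gb, transfer_rate_per_gb) < 0:
        raise ValueError("quantities and rates must be non-negative")
    storage_fee = stored_gb * storage_rate_per_gb        # the "storage fee"
    bandwidth_fee = transferred_gb * transfer_rate_per_gb  # the "bandwidth fee"
    return storage_fee + bandwidth_fee


# Example with made-up rates: 500 GB stored, 50 GB uploaded this month.
bill = monthly_cloud_storage_bill(500, 50,
                                  storage_rate_per_gb=0.10,
                                  transfer_rate_per_gb=0.05)
print(f"Estimated monthly bill: ${bill:.2f}")
```

Leaving `transfer_rate_per_gb` at its default of zero models a provider that charges only the storage fee.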
To be considered a cloud backup service, a cloud service must provide all of the above plus the software to make the backups happen. A cloud backup service typically provides some type of client software that must be installed on all the systems to be backed up. Backups are then automatically scheduled to occur on a regular basis. The backup software generally uses techniques such as delta-level backups or full deduplication to minimize network traffic.
The provider's service-level agreement (and the price they charge) will determine what happens when things don't go as planned. At a minimum, the service may provide an on-screen pop-up notification or an email message to tell you that things are going well (or not). The service may also have the ability to automatically escalate the problem when failed backups aren't addressed.
Some companies may use a cloud backup service for all of their backups, while others may opt for a combination of traditional backup methods and cloud services. There are two very different ways to go about integrating traditional data backup software and the cloud. You can use a traditional backup system in parallel with a cloud backup system, or you can use backup software that has the ability to use a cloud storage system as its target.
If the main reason you're considering using cloud-based backup is the "hands-off" aspect, then running a cloud backup service in parallel with your traditional backup system is the route to take. You can continue using traditional backup software to perform the bulk of your backups, then use the cloud backup service to handle those parts where it would be most beneficial. The most common practice is to start by performing remote site and laptop backups using the cloud backup service. Many companies aren't yet performing backups of their laptops, and backing them up with traditional backup software is problematic, to say the least. Most companies back up their remote sites, but they often use less than desirable methods because their remote offices don't have dedicated IT staff. A cloud backup service can solve both the laptop and remote-office problems; all you have to do is write a check.
Using cloud storage as a target for a traditional backup software package is a bit more problematic, but it's not without its advantages. The same things that are true of cloud storage for traditional data are true of cloud storage for backups: no management, endless capacity, etc. As a "bonus" you automatically get off-site backups, which is still a hassle for many companies. There may be more challenges than advantages, however, when it comes to using cloud storage as the destination for traditional backups.
The first challenge is that traditional backup sends and stores a lot of data. Traditional backup systems typically perform full backups once a week, and even backup apps that don't perform repeated fulls on file systems (e.g., IBM's Tivoli Storage Manager) perform full backups on applications. (Many companies even perform daily full backups of some key applications.) In addition, all traditional backup applications perform full-file incremental backups. That means if just a single byte has changed in a file, the modification time is changed or the archive bit is set so the entire file is included in that night's backups.
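The full-file behavior can be illustrated with a short sketch. It selects files by modification time only (the archive bit is not modeled), and the function name and file names are hypothetical. It is not how any particular backup product is implemented.

```python
import os
import tempfile
import time


def files_changed_since(root, last_backup_time):
    """Return (path, size) for every file under root modified after last_backup_time.

    Mimics a full-file incremental: a changed file is selected in its
    entirety, no matter how few bytes actually differ.
    """
    selected = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            if info.st_mtime > last_backup_time:
                selected.append((path, info.st_size))
    return sorted(selected)


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    big = os.path.join(root, "database.dump")
    small = os.path.join(root, "notes.txt")
    with open(big, "wb") as f:
        f.write(b"\0" * 1_000_000)
    with open(small, "wb") as f:
        f.write(b"unchanged")

    # Pretend both files were written an hour ago and backed up 30 minutes ago.
    now = time.time()
    os.utime(big, (now - 3600, now - 3600))
    os.utime(small, (now - 3600, now - 3600))
    last_backup = now - 1800

    # Change a single byte in the large file; its modification time updates.
    with open(big, "r+b") as f:
        f.write(b"\1")

    for path, size in files_changed_since(root, last_backup):
        print(f"{os.path.basename(path)}: {size} bytes queued for backup")
```

Only one byte changed, yet the whole 1,000,000-byte file is queued, while the untouched file is skipped.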
Both of these typical practices create a lot of data that's sent across the network and stored on the target device. If the target device were a cloud storage service, you would need significantly more bandwidth and would pay higher charges to store the data in the cloud. Remember that traditional backup systems are why data deduplication was developed. These backup applications create roughly 20 GB on "tape" for every 1 GB on primary disk, so a 10 TB data center would need to pay for approximately 200 TB of cloud storage every month.
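Written out, the estimate takes the 20:1 ratio at face value and assumes no deduplication or compression. The symbols are introduced here for illustration only: P is the primary data size, S is the capacity billed in the cloud, C is the monthly storage charge and r is a hypothetical monthly rate per terabyte.

```latex
\[
S \approx 20 \times P = 20 \times 10\ \text{TB} = 200\ \text{TB},
\qquad
C \approx S \times r = 200\ \text{TB} \times r
\]
```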
In addition to the cloud storage vendor's fees for disk capacity used and the amount of data transferred, there are the costs associated with having sufficient bandwidth to get the data to the cloud storage vendor. If you consistently and regularly create a 10 TB full backup and want to send it to the service over the wire, using a cloud storage vendor isn't likely to be practical. But even if your backup needs aren't that extreme, the behavior of traditional backup will make the cloud part of your backup system cost quite a bit.
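A rough transfer-time calculation shows why a 10 TB full backup is hard to push over the wire. This is a back-of-the-envelope sketch: the function name, the link speeds and the 80% efficiency factor are assumptions, and real throughput depends on latency, protocol overhead and competing traffic.

```python
def transfer_hours(data_tb, link_mbps, efficiency=0.8):
    """Hours needed to move data_tb terabytes over a link_mbps link.

    Uses decimal units (1 TB = 10**12 bytes, 1 Mbps = 10**6 bits/s).
    efficiency is an assumed fraction of the link usable for payload.
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be > 0 and efficiency in (0, 1]")
    bits = data_tb * 10**12 * 8
    usable_bits_per_second = link_mbps * 10**6 * efficiency
    return bits / usable_bits_per_second / 3600


# Hypothetical link speeds for a 10 TB full backup.
for mbps in (10, 100, 1000):
    hours = transfer_hours(10, mbps)
    print(f"10 TB over {mbps} Mbps: about {hours:,.0f} hours "
          f"({hours / 24:,.1f} days)")
```

Under these assumptions, a 100 Mbps link needs more than 11 days to move a single 10 TB full backup, which is longer than the weekly cycle on which fulls are typically taken.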
The second challenge ironically involves one of the key advantages of using a cloud backup service: having backup data stored off-site. Assuming you solve the problem of getting the data off-site in the first place, you then have the problem of all of your data being in a different location than your servers. Obviously, this can significantly hamper your ability to meet your recovery time objectives (RTOs). This means that any copy of your data that's stored in the cloud should be just that, a copy. More specifically, it shouldn't be the copy you rely on for routine data recoveries. Using cloud storage as the only copy of large amounts of data that need to be transferred across the Internet is simply a disaster waiting to happen.
This sounds like a problem for data deduplication to solve, right? Sort of. A lot of backup software packages can deduplicate the data before sending it over the Internet. That can certainly address the challenge of getting the backups onto cloud storage, but it doesn't address the challenge of getting the data back. So the rule about not relying on a single copy of your backups stored in the cloud still applies whether you're able to use deduplication or not.
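The asymmetry between backup and restore can be seen in a toy model of deduplicating before sending. This sketch uses fixed-size chunks and SHA-256 digests as a simplification; the `ChunkStore` class, `backup` and `restore` are hypothetical names, not any vendor's implementation.

```python
import hashlib


class ChunkStore:
    """Stand-in for the remote side: remembers chunks by their SHA-256 digest."""

    def __init__(self):
        self.chunks = {}

    def has(self, digest):
        return digest in self.chunks

    def put(self, digest, data):
        self.chunks[digest] = data


def backup(data, store, chunk_size=4096):
    """Send only chunks the store lacks; return (recipe, bytes_sent).

    The recipe is the ordered list of digests needed to rebuild the data.
    """
    recipe, bytes_sent = [], 0
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if not store.has(digest):
            store.put(digest, chunk)
            bytes_sent += len(chunk)
        recipe.append(digest)
    return recipe, bytes_sent


def restore(recipe, store):
    """Rebuild the original bytes; every chunk in the recipe must be fetched."""
    return b"".join(store.chunks[digest] for digest in recipe)


store = ChunkStore()
# Four distinct 4 KiB chunks, 16,384 bytes in total.
monday = b"".join(bytes([i]) * 4096 for i in range(4))
tuesday = monday[:-1] + b"\xff"  # only the final byte differs

_, sent_monday = backup(monday, store)
recipe, sent_tuesday = backup(tuesday, store)
restored = restore(recipe, store)

print(f"Monday sent {sent_monday} bytes, Tuesday sent {sent_tuesday} bytes")
print(f"Restoring Tuesday's backup pulls back {len(restored)} bytes")
```

The second backup sends only the one changed chunk, but restoring it still means retrieving the full data set, which is why deduplication helps with getting data to the cloud and not with getting it back.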
There are now a number of companies with software and hardware products that support backing up to the cloud (see "Data backup software plus the cloud" below). The first backup application vendor to announce support was Zmanda Inc., a commercial firm that offers its version of Amanda, an open source backup program. Amanda Enterprise 3.1 is capable of backing up directly to Amazon's Simple Storage Service (S3) cloud storage service.
CommVault Systems Inc.'s Simpana supports backing up to any cloud vendor that offers a Representational State Transfer (REST) interface. So you can use cloud storage services such as Amazon, Iron Mountain, Microsoft Azure, Nirvanix or Rackspace as a target for CommVault Simpana backups or archives. Archiving may actually be a more appropriate data protection application for cloud storage because archivers don't perform repeated fulls and they have object-level dedupe built in.
EMC Corp. and Symantec Corp. took a similar step, each adding the capability to back up to its own cloud platform. EMC NetWorker backs up to any cloud vendor using EMC Atmos-based storage, while Symantec Backup Exec backs up to the Symantec Protection Network.
If your company uses a backup application that doesn't yet support backing up to the cloud, you might want to consider Nasuni Corp.'s Filer, which provides an NFS/CIFS NAS gateway to cloud storage. Any decent backup software package can back up to an NFS or CIFS mount.
Although deduplication has limitations, you should still consider whether it's available when exploring backup applications that integrate with the cloud. Neither EMC NetWorker nor Amanda has any deduplication built in. CommVault Simpana and Symantec Backup Exec can both deduplicate data before it's sent to the backup target: Simpana offers target deduplication, which dedupes the data once it reaches the media agent, while Backup Exec uses source deduplication, which dedupes at the client before the data is sent across the network. This makes both products much more attractive companions to cloud storage. IBM Tivoli Storage Manager (TSM) customers also have an interesting option with Nasuni because TSM has deduplication built in.
Some backup applications and other products have developed ways to integrate with cloud backup services. Here's a sampler of some currently available products.
| VENDOR | PRODUCT | WHAT IT DOES |
|---|---|---|
| CommVault Systems Inc. | Simpana | Can use any cloud storage service that supports REST as the target for backups and archiving |
| EMC Corp. | NetWorker | Can send backup data to any cloud storage service built on the EMC Atmos platform |
| Nasuni Corp. | Nasuni Cloud File Server | On-site storage appliance that stores data before sending it to the cloud; can be used as a backup target |
| Symantec Corp. | Backup Exec | Can send backup data to the Symantec Protection Network |
| Zmanda Inc. | Amanda | Can use Amazon's S3 service as the target for backups |
Cloud-based backup services can be great complements to traditional backup systems, especially when those systems provide some level of integration. Because a cloud backup service will require little if any hardware to be installed on your site, it's relatively easy to perform a full proof of concept using real data. This is especially important because implementation may require substantial investments for licenses and have a profound effect on your backup environment. As with any backup product or service, you should test everything and believe nothing.
About this author: W. Curtis Preston is an executive editor in TechTarget's Storage Media Group and an independent backup expert. Curtis has worked extensively with data deduplication and other data reduction systems.
This article was previously published in Storage magazine.
This was first published in January 2011