Backup was the first broad use case for cloud storage. What started as an ideal way to back up consumer desktops spread to enterprise endpoint protection, and now many cloud backup providers offer protection of physical and virtual servers as well. Cloud backup is a crowded market, with vendors bringing new software and services to market almost every week, so IT professionals establishing a cloud-based backup strategy face an almost overwhelming number of options.
Organizations use the cloud for data protection in a variety of ways. Some opt for native cloud providers, in which cloud is the central point of the data protection process. Other companies, especially legacy vendors, have treated the cloud as an add-on to the data protection process. For them, the cloud becomes another backup target.
There is also differentiation among service providers. Some providers are really software developers that use an existing, generic cloud service like Amazon, Azure or Google for cloud storage. Others have created their own cloud-based data centers, purpose-built to store customer data. There are also managed service providers or regional cloud providers that use a third-party software product to provide data protection to their customers.
The advantage of a generic service is that the provider's cost of entry is very low, and that may be reflected in the cost of the service. But these providers have limited control over the environment, so troubleshooting a technical support problem may be as difficult for them as it is for the customer.
The advantage of a purpose-built cloud is that the provider owns it all, software and hardware. While the cost of entry is higher for these vendors, their ability to scale specifically to their customers' needs, as well as to solve any support challenges, should be better.
The third type of provider is really a facilities specialist that has added cloud-based backup to its feature set. These providers tend to use off-the-shelf software products that have been broadly adopted in the market. This type of offering is ideal for a customer that wants to continue using its current backup software and move data off-site without the expense of creating a disaster recovery site.
In this model, backup data resides on-premises as well as in the cloud. Typically, an appliance receives backups from the data center's servers and then replicates that data to the cloud provider's facility. The on-site appliance can be used for rapid recovery of servers and the cloud copy can be used in case of a disaster. Also, many legacy backup products can back up data locally and to the cloud. Or, data can be backed up on-site and then replicated to the cloud.
Native cloud backup providers have raised the data protection bar significantly over the last few years. Most started by overcoming the obvious challenge that cloud-based backup presents -- the latency of an internet connection. The initial products focused on moving data more efficiently across that connection, using technologies like block-level incremental backup, deduplication, compression and WAN optimization. They have also significantly closed the gap in platform coverage, moving from endpoint protection to full server protection of Linux, Windows and VMware.
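To make the block-level incremental idea concrete, here is a minimal sketch of how changed blocks can be detected by hashing fixed-size blocks and comparing them against the previous backup's manifest. This is an illustration only, not any vendor's implementation; real products typically use variable-size chunking and combine it with deduplication and compression.

```python
import hashlib

BLOCK_SIZE = 4096  # fixed block size for illustration; products often use variable-size chunks

def block_hashes(data: bytes) -> list[str]:
    """Hash each fixed-size block so changed blocks can be identified later."""
    return [hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
            for i in range(0, len(data), BLOCK_SIZE)]

def changed_blocks(old_hashes: list[str], new_data: bytes) -> dict[int, bytes]:
    """Return only the blocks whose hashes differ from the previous backup."""
    changed = {}
    for i, h in enumerate(block_hashes(new_data)):
        if i >= len(old_hashes) or old_hashes[i] != h:
            changed[i] = new_data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
    return changed

# Only the modified block crosses the internet connection on the second backup.
v1 = b"A" * 8192                      # first full backup: two identical blocks
v2 = b"A" * 4096 + b"B" * 4096        # second backup: only block 1 changed
delta = changed_blocks(block_hashes(v1), v2)
print(sorted(delta))  # -> [1]
```

Because only the delta is transmitted, the amount of data sent shrinks to the change rate of the protected systems rather than their total size.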
Many more traditional backup software and hardware vendors have also added support for cloud storage. However, most of the time, these vendors view the cloud as a secondary storage repository; it often does not minimize on-site capacity or bring any additional capabilities.
One of the major challenges that cloud providers and users faced was time to recover. While block-level incremental, deduplication and compression all improved the backup process, they did little to help the recovery process. In the event of a storage system failure or site loss, all the data needed to be restored across a relatively low bandwidth connection. And, of course, a full system recovery is often time-critical, because application downtime is unacceptable for many businesses. These shortcomings led to one of the most significant advances in cloud backup: disaster recovery as a service.
Disaster recovery as a service
Disaster recovery as a service (DRaaS) allows users to run an instance of a virtual machine (VM) in the cloud backup provider's data center. Assuming that all the networking issues can be resolved, this means that users can be back online in a matter of minutes.
DRaaS does require some new considerations as well as some potential changes in an organization's data center. The first thing to understand is what performance will be like during the DR event. Given the length of the outage and time it will take to restore all the data, the application may be running in the cloud provider's data center for some time. It is important to consider whether the cloud provider will meet an SLA based on performance during that time.
Also, the cloud provider will need to demonstrate the ability to host many organizations' applications simultaneously in their data center. If a regional disaster such as a hurricane occurs, cloud providers can be overwhelmed with recovery requests. An SLA provides a way to make sure that performance is acceptable for long-term use.
You must also understand how failback will work. In the event that the organization's entire data center is destroyed, all this data needs to be recovered. Does the provider offer a bulk transfer capability, such as shipping hard drives or tape media to the organization, to facilitate a faster local recovery? If the entire data center is not destroyed -- a much more common outcome in a disaster -- does the provider have a way to intelligently recover data so that only the data that changed while the application was running in the cloud is restored?
DRaaS may also require infrastructure changes. First, most DRaaS offerings leverage virtualization to deliver the functionality. This means that physical servers may need to be virtualized or the provider may need to have physical-to-virtual conversion capability on their end.
Finally, for DRaaS to work, networking must be properly re-routed to provide a seamless transfer between the organization's data center and the cloud provider. Fortunately, most native cloud backup providers are more than willing to assist in making sure the various networking configurations are set up properly.
Another new cloud backup strategy is the replacement of the cloud backup appliance with a cloud backup accelerator. Many appliances execute a 1:1 translation to the cloud, meaning that 100% of the data being backed up needs to be stored both locally and in the cloud. To accommodate this, the appliance needs to have capacity added to it continually. There is no technical reason these backups could not be deleted or moved after a period of time; the software simply has not been updated or designed to support that capability. While some providers do have the ability to tier older copies of data for cloud storage only, most require at least 100% of the latest full backup to be stored on-site.
An accelerator changes this by utilizing more of a caching methodology. The local accelerator does not need to store all of the data, or even the most recent backup. Instead, data is removed from the appliance as soon as its arrival in the cloud is confirmed. These accelerators are generally controlled by policy so that certain mission-critical applications can be set to always have the most recent copy of their data reside both on-site and in the cloud.
This is ideal for companies with a few large databases and a lot of file data. Since most file data is not mission-critical and is recovered a file at a time, the latency of the cloud is not as much an issue for their recovery. A large database needs to be restored all at once, as quickly as possible, so having this copy backed up locally is often a requirement. The ability to control this by policy allows the on-site accelerator to be one-third the size of an on-site appliance, thereby reducing costs and simplifying the on-site operational requirement.
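The caching behavior described above can be sketched in a few lines. In this hypothetical model, data is evicted from the local accelerator once the cloud confirms receipt, unless policy pins the source application so its latest copy stays on-site; the class and method names are assumptions for illustration, not any vendor's API.

```python
class BackupAccelerator:
    """Sketch of a cache-style accelerator: data is evicted once the cloud
    confirms receipt, unless the source application is pinned by policy."""

    def __init__(self, pinned_apps):
        self.pinned = set(pinned_apps)   # mission-critical apps kept local
        self.cache = {}                  # (app, backup_id) -> payload

    def ingest(self, app, backup_id, payload):
        """Receive a backup from the data center and hold it locally."""
        self.cache[(app, backup_id)] = payload

    def on_cloud_confirmed(self, app, backup_id):
        """Evict from the local cache unless policy pins this application."""
        if app not in self.pinned:
            self.cache.pop((app, backup_id), None)

# The large database stays local for fast restores; file data is evicted.
acc = BackupAccelerator(pinned_apps={"erp-db"})
acc.ingest("erp-db", 1, b"db-image")
acc.ingest("file-share", 1, b"files")
acc.on_cloud_confirmed("erp-db", 1)      # pinned: stays in the cache
acc.on_cloud_confirmed("file-share", 1)  # not pinned: evicted
print(("erp-db", 1) in acc.cache, ("file-share", 1) in acc.cache)  # True False
```

Because only pinned data is retained, the local footprint tracks the handful of mission-critical applications rather than the full backup set, which is what allows the accelerator to be a fraction of an appliance's size.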
Another feature that cloud backup vendors are beginning to provide is cloud-to-cloud backup. By granting the backup provider access, an organization gives it a direct connection to its cloud services. This allows the provider to directly back up software as a service (SaaS) offerings such as Box.com, Dropbox, Google Drive, Office 365 and Salesforce.
Most users assume that because data is in the cloud, it is also protected. To some extent, this is true. Cloud-based data is backed up, but only to protect the cloud provider, not the organization. If, for example, an employee deletes files, there is little that can be done to recover them without paying restore fees. Cloud-to-cloud backup protects against this type of data loss.
The downside of cloud-based backup
While there is a lot to like about cloud backup, there are some downsides that need to be taken into consideration. The first is of course security, a common concern about any cloud service. It is important that the cloud backup product provides complete end-to-end encryption, where data is not only encrypted in transit to the provider, but also while at rest in the provider's facility. Organizations with highly sensitive data should also only consider products that allow them to hold the encryption keys, instead of the provider.
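The customer-held-key model above can be illustrated with a minimal sketch of client-side encryption: the organization generates and keeps the key, encrypts before upload, and the provider only ever stores ciphertext. The XOR keystream below is a deliberately simple stand-in for a real authenticated cipher such as AES-GCM; it exists only to show the key-custody flow and must never be used as actual encryption.

```python
import hashlib
import secrets

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy XOR keystream cipher -- a stand-in for a real cipher like AES-GCM,
    used here only to illustrate who holds the key. Not for production use."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

# The organization generates and holds the key; the provider never sees it.
customer_key = secrets.token_bytes(32)
plaintext = b"payroll records"
ciphertext = keystream_xor(customer_key, plaintext)  # encrypted before upload

# At rest in the provider's facility, only ciphertext exists.
restored = keystream_xor(customer_key, ciphertext)   # recovery re-applies the key
print(restored == plaintext)  # -> True
```

Because the provider stores only ciphertext and never the key, a breach of the provider's facility does not expose the organization's data, which is the argument for holding the encryption keys yourself.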
Another area of concern is flexibility. For example, some organizations may want to perform recovery in a secondary facility of their choice instead of being forced to recover to the provider's location. Also, cloud providers that supply appliances will force the organization to use the included disks, instead of potentially using what the organization already has in place.
Finally, and maybe most surprising, is the potential cost of the cloud product when considered over time. Cloud-based backup products capture the attention of organizations partly because of their low upfront costs. But when these monthly or quarterly costs are multiplied out over five years, the life expectancy for an on-site backup product, cloud backup products can actually be more expensive than traditional, on-site backup. This is especially true as backup capacities grow, since most providers charge by the capacity under protection.
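A quick back-of-the-envelope calculation shows how capacity-based pricing can overtake an upfront purchase. All figures below are hypothetical, chosen only to illustrate the shape of the comparison; real pricing varies widely by provider and capacity tier.

```python
# Hypothetical figures for illustration only; real pricing varies by provider.
onsite_upfront = 50_000       # on-site backup appliance and software purchase
onsite_annual = 5_000         # annual maintenance and support
cloud_per_tb_month = 100      # cloud backup charge per TB under protection
start_tb, growth = 20, 0.20   # 20 TB protected today, growing 20% per year

years = 5  # typical life expectancy of an on-site backup product
onsite_total = onsite_upfront + onsite_annual * years
cloud_total = sum(cloud_per_tb_month * 12 * start_tb * (1 + growth) ** y
                  for y in range(years))
print(round(onsite_total), round(cloud_total))  # -> 75000 178598
```

With these assumed numbers, the cloud service's low entry cost is overtaken well before year five because the monthly charge compounds with capacity growth, which is exactly the dynamic described above.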
Reverse incremental backup
Reverse incremental backup is another feature that reduces the time it takes for a server instance to be recovered. With traditional, block-level incremental backups, an initial full backup is completed as normal. Then, when the next backup is taken, just the changed blocks are copied to backup storage and stored in a separate file, which is linked to the original full backup. If there is a failure, the original file and each of these linked files need to be recovered in reverse order, which is a time-consuming process.
Reverse incremental backup merges each incremental into the full backup as it is completed, while each incremental is also maintained as a separate file. This keeps the full image up to date while still maintaining a rich version history. When the latest server image needs to be recovered in the shortest time possible, only that one file has to be restored. While the size of the VM will impact recovery time, in most cases a single VM with no additional incrementals to merge will restore relatively quickly over a business-class internet connection. And because the incrementals are retained as separate files, various versions of individual files can still be restored from those copies.
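The merge-forward process can be sketched with a full image modeled as a map of block numbers to block data. This is a conceptual illustration of the technique, not any product's implementation: each incremental is folded into the full image as it completes, and also kept on its own for version history.

```python
def apply_incremental(full_image: dict, incremental: dict) -> None:
    """Merge changed blocks into the full image as each backup completes,
    so the most recent full image is always one restore away."""
    full_image.update(incremental)

# Full image as block_number -> block contents (strings stand in for data).
full = {0: "A0", 1: "B0", 2: "C0"}
history = []  # each incremental is also retained as its own file for versioning

inc1 = {1: "B1"}            # first incremental: block 1 changed
history.append(inc1)
apply_incremental(full, inc1)

inc2 = {2: "C2"}            # second incremental: block 2 changed
history.append(inc2)
apply_incremental(full, inc2)

print(full)     # -> {0: 'A0', 1: 'B1', 2: 'C2'}  (latest image, one file)
print(history)  # older block versions remain recoverable from the incrementals
```

Contrast this with traditional incrementals, where a restore must replay the chain in reverse; here the latest image is already assembled, and the retained incrementals still provide point-in-time access to earlier file versions.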
Cloud backup has evolved over the past few years to offer a variety of important functions. There are now several products that offer platform coverage and features similar to traditional enterprise products. Add to that new capabilities like DRaaS, cloud-to-cloud backup and reverse incremental backup, and these offerings become very compelling. While not perfect (see "The downside of cloud-based backup"), a cloud-based backup strategy should be considered any time a data protection refresh project is underway.