Data protection, and backup specifically, is one of the most important components of a comprehensive data management strategy. Traditionally, backup has been seen as an onerous job. However, the introduction and evolution of integrated backup appliances have eased that pain for many organizations.
Deploying an appliance is relatively simple when compared with building out a backup infrastructure. Integrated backup appliances use standard hardware and combine it with backup software to create an integrated product. The integration reduces or eliminates some of the major issues experienced in dealing with disk backup systems, namely coping with scale and capacity. Legacy backup design requires the management of each component, including the network, server and storage. As a disk backup system grows, so does complexity, and issues can go unnoticed until the backups start failing.
The appliance model makes scaling easier because you can simply deploy more nodes or units, and it simplifies license management (licensing per TB or per appliance). The most recent products have also started to take advantage of the data in the disk backup system for other uses, including improved disaster recovery capabilities and the ability to repurpose backup data for test and development work.
However, many organizations have already deployed and are satisfied with the performance of their "traditional" disk backup targets. These disk backup systems allow you to use whatever backup software you prefer as well as the server hardware of your choice.
Regardless of whether you use an integrated backup software appliance or a traditional disk backup target, data protection is evolving to the point where it can drive business efficiency and protect data assets.
Backup and virtualization
In a pre-virtualization world, backup agent software was deployed on physical servers and directed data from the server to the backup software and down onto disk or tape. Each server had abundant network connectivity and storage bandwidth to feed the backup infrastructure, so throughput issues resided mostly with the backup hardware. Fast-forward to today and infrastructure looks very different, with the majority of server workloads now virtualized. Agent-based software doesn't work well for virtual backups because physical resources are oversubscribed among virtual machines. Hypervisors, however, offer APIs for improved integration with backup products, such as VMware vSphere's VADP (vStorage APIs for Data Protection).
Choosing the right backup target
Looking across the market, we see a core set of attributes common to both integrated backup appliances and traditional backup targets.
- Space efficiency. Typically, this means data deduplication (dedupe), which reduces the physical data stored on an appliance by retaining only unique copies of data. Dedupe uses metadata to track the logical-to-physical relationship between the content backed up and the data stored. Vendors typically quote both physical and logical capacity for their offerings. Deduplication can be performed inline, as data is ingested, or as a post-process task after the data has been written to disk. It can operate at the block or file level, but today, most dedupe is block-based. Some vendor implementations also provide client-side deduplication to reduce the amount of data transmitted across the network.
- Scalability. Enterprise-capable products must scale to store large volumes of data. Vendors have taken both scale-out and scale-up approaches to this challenge. Scale-up products are rated on maximum physical capacity and the throughput capabilities, with multiple model types to choose from. Scale-out products use a cluster or node-based architecture to increase both capacity and performance by adding more nodes to a disk backup system.
- Performance. Effectively, this is a measure of the throughput of the system and needs to be viewed from both a backup and restore perspective. Some systems provide high backup throughput, but don't do as well at the most critical time, when a restore is required.
- Security. Although disk-based backup systems are not going to be physically moved around like tape, there is a need to protect data within the appliance to avoid data theft or corruption. Also, if you are sending data over a WAN to a system located offsite (in a colocation facility, for example), you will require encryption over the network.
- Resiliency. Backup data may be retained for a few days or for many months or years. The resilience of an appliance, and in particular how it protects data on disk, therefore becomes a critical consideration. In some ways, resilience is more important for a disk backup system than for production storage systems, because deduplication allows the backup appliance to hold hundreds or thousands of backup images on the same physical disk space.
- Vaulting. Vaulting is the ability to move data off of the appliance to tertiary media. Traditionally, this meant moving data to tape, but we are increasingly seeing the ability to move data to and from public cloud storage, providing scalable capacity and off-site protection.
- Protocols. Many integrated backup appliances offer support for common protocols such as NFS and SMB.
- Form factor. Some products are now available in both physical and virtual form factors. Virtual offerings provide a cost-effective way to support branch or smaller offices and consolidate backup data into a central location.
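To make the space-efficiency attribute above more concrete, here is a minimal sketch of block-level deduplication in Python. It is an illustration under stated assumptions, not how any particular vendor implements dedupe: it uses fixed-size blocks, SHA-256 fingerprints and an in-memory store, and the `DedupeStore` class and its method names are hypothetical. Commercial products may use variable-length chunking and persistent on-disk indexes instead. The sketch shows the metadata tracking the logical-to-physical relationship and the difference between logical and physical capacity.

```python
import hashlib


class DedupeStore:
    """Toy fixed-size block deduplication store (hypothetical, for illustration)."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}   # fingerprint -> unique block bytes (physical data)
        self.catalog = {}  # backup name -> ordered fingerprints (metadata)
        self.sizes = {}    # backup name -> logical size in bytes

    def backup(self, name, data):
        """Split data into blocks and store only blocks not seen before."""
        fingerprints = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            fp = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(fp, block)  # keep the first copy only
            fingerprints.append(fp)
        self.catalog[name] = fingerprints
        self.sizes[name] = len(data)

    def restore(self, name):
        """Rebuild a backup image by following its fingerprint list."""
        return b"".join(self.blocks[fp] for fp in self.catalog[name])

    @property
    def logical_bytes(self):
        """Total size of all backups as the clients see them."""
        return sum(self.sizes.values())

    @property
    def physical_bytes(self):
        """Total size of the unique blocks actually stored."""
        return sum(len(block) for block in self.blocks.values())

    def dedupe_ratio(self):
        return self.logical_bytes / self.physical_bytes


# Two nightly backups that differ only in their last block.
store = DedupeStore(block_size=4)
monday = b"AAAABBBBCCCC"
tuesday = b"AAAABBBBDDDD"
store.backup("monday", monday)
store.backup("tuesday", tuesday)

print(store.logical_bytes, store.physical_bytes, store.dedupe_ratio())
```

In this toy run, the two backups share two of their three blocks, so 24 logical bytes occupy 16 physical bytes. The sketch also hints at why resiliency matters so much for a deduplicating target: a single shared block is referenced by every backup image that contains it.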
Backup targets today
A variety of vendors offer integrated backup appliances and traditional disk backup targets today. This is not meant to be a comprehensive list, but instead examples of the products available. As you will see, systems are available in a wide range of capacities and with different native capabilities, depending on your specific needs.
- Veritas offers the NetBackup 5230 and 5330 series. Capacities scale from 4 TB to 229 TB, depending on the model, and the systems can be deployed as either traditional master/media server configurations or self-contained appliances.
- Arcserve's UDP 7000 is available in 11 models, 5 of which are capable of being used to deploy "virtual standby" recoverable instances of virtual machines. The largest UDP systems are capable of protecting up to 90 TB of source data.
- Barracuda Networks offers nine versions of its Barracuda Backup appliance. Systems scale from the desktop model at 500 GB to 4U rackmount systems capable of supporting up to 112 TB. Data can be vaulted or replicated to public cloud storage and systems also support backup of data created with software as a service (SaaS) offerings like Microsoft Office 365.
- Unitrends offers a range of products supporting backup capacities up to 118 TB. Appliance models above entry-level also incorporate solid-state disks in order to improve performance of both backup and restore. Features include encryption, replication and instant recovery of virtual machines.
- StorServer offers a range of backup appliances that scale up to 100 TB of data and are packaged with either IBM's Tivoli Storage Manager or Commvault Simpana 10.
- EMC leads the pack in terms of market share with EMC Avamar (integrated appliance) and EMC Data Domain (deduplicating backup target). Both platforms are comprehensive in their client support, including virtual server environments.
- IBM has a line of backup systems based on the company's ProtecTIER deduplication software. They are packaged as either the IBM TS7650G ProtecTIER Deduplication Gateway (which can connect to external storage) or the TS7620 Appliance Express (with integrated storage). Both products provide support for virtual tape library, Symantec OST and SMB/NFS protocols.
- HP offers a range of backup targets under the StoreOnce brand name, scaling from the entry level HP StoreOnce 2700 (5.5 TB usable) to the 6500 at a quoted 1,728 TB of usable capacity. There is a virtual version of the appliance that can be used to replicate remote office data into a central physical StoreOnce deployment.
A number of the aforementioned products provide capabilities to extend backup data into public cloud storage. There are also other examples in the market, including Datto, which offers instant recovery of applications running in the company's private cloud for DR. Microsoft acquired StorSimple and sells it as a gateway appliance for moving backups into its Azure Cloud Storage platform.
Scale-out data management
Finally, we should mention a range of scale-out products such as those from Cohesity, ExaGrid and Rubrik. These products are aimed at providing high levels of scalability for enterprise environments. Cohesity has developed a platform that allows the backup data to be used for other purposes, such as spinning up test/development instances of production systems. ExaGrid and Rubrik are focused on bringing "Google-like" search capabilities into their platform, making it easier to locate individual content.
The last three examples point to a growing trend in backup: making more use of an organization's backup data, either through improved search for archiving or by seeding other environments such as disaster recovery or test/dev. These products are sometimes called converged backup systems.