In a virtual server environment, servers that once relied on direct-attached storage (DAS) are likely to run as virtual machines stored on a storage area network (SAN) or on a network-attached storage (NAS) filer. When planning a virtual server environment, you cannot afford oversights related to the storage architecture; they can have disruptive and even disastrous consequences. The critical aspects of backup in virtual environments are storage availability, configuration and management. But if you can overcome these storage challenges, your storage infrastructure may be more efficient than before.
Organizations planning and managing backups in virtual environments need to address three storage considerations:
- Emphasis on Fibre Channel (FC) or iSCSI SAN-based storage
- Storage I/O bottlenecks
- Storage virtualization integration
Most large enterprises rely on FC SANs for storage of virtual machines (VMs). Some organizations also rely on iSCSI storage or NAS as a complement to, or in place of, FC SANs. Most integrators (myself included) view fully redundant shared storage as a requirement in virtual server deployments, since placing several VMs on a local DAS disk would result in significant downtime if the VMs' physical host failed. Storage administrators can prevent a physical host from
Storage I/O a critical concern
With shared storage, storage I/O should be of paramount concern. For example, storing VMs on an iSCSI or NAS array with 1 Gb/sec network connectivity can degrade backup performance significantly. If you break 1 Gb/sec down into bytes per second, you're looking at a maximum throughput of 125 MB/sec. Realistically, you'll see around 85% of that, so the expected maximum throughput of a single 1 Gb Ethernet link would be around 106 MB/sec. Considering that the SATA 1.0 interface offers throughput of 150 MB/sec, your Ethernet-based storage could potentially be slower than first-generation SATA.
Granted, you can increase throughput by teaming NICs, as many companies do, or by deploying a 10GbE-based storage solution. (Few choices exist today, but several are coming in 2008.) So far I've been describing storage I/O to a physical server. Available I/O will be divided among all VMs on that server. Assuming that six VMs share a team of two 1GbE NICs, they'll have to share 2 Gb/sec (250 MB/sec) of throughput.
Assuming 15% overhead, the effective combined throughput would be 212.5 MB/sec. If you divide 212.5 MB/sec by six (the number of VMs sharing the storage I/O channel), the effective storage throughput per VM drops to 35.42 MB/sec. Again, these numbers assume that all six VMs require simultaneous and consistent access to the shared storage I/O channel, as would be the case if all VMs were backed up at the same time, each at the same throughput, via locally installed backup agents.
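The arithmetic above can be reproduced with a short Python sketch. It uses only the article's own figures (1 Gb/sec links, 15% overhead, two teamed links, six VMs); the function names are hypothetical, and the even split across VMs is the same simplifying assumption the text makes.

```python
def effective_mb_per_sec(link_gbps, links=1, overhead=0.15):
    """Usable throughput in MB/sec for `links` teamed links of `link_gbps` each.

    Assumes 1 Gb/sec = 125 MB/sec and a flat overhead fraction.
    """
    raw_mb_per_sec = link_gbps * 1000 / 8 * links
    return raw_mb_per_sec * (1 - overhead)


def per_vm_mb_per_sec(link_gbps, links, vm_count, overhead=0.15):
    """Even share of the channel when every VM is active at once."""
    return effective_mb_per_sec(link_gbps, links, overhead) / vm_count


single_link = effective_mb_per_sec(1)          # about 106 MB/sec
teamed = effective_mb_per_sec(1, links=2)      # about 212.5 MB/sec
per_vm = per_vm_mb_per_sec(1, links=2, vm_count=6)

print(f"single link: {single_link:.2f} MB/sec")
print(f"two-link team: {teamed:.2f} MB/sec")
print(f"per VM (six VMs): {per_vm:.2f} MB/sec")
```

Changing `vm_count` or `links` shows how quickly the per-VM share shrinks or grows as VMs are consolidated onto one host.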
Obviously, the throughput numbers would be much higher with 4 Gb Fibre Channel storage, but you still must consider the consequences of shared I/O on backup. Servers that were backed up simultaneously before being virtualized may need their backup jobs staggered afterward.
Working around potential bottlenecks that result from VMs accessing shared I/O channels requires organizations to restructure how they approach backup. For example, servers that were typically backed up directly to tape may not be able to meet their backup window should they continue to back up to tape in a virtual environment.
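To see how a shared channel can push a job past its backup window, here is a rough Python sketch. The 500 GB data size and three-hour window are hypothetical values chosen for illustration; the two throughput figures are the single-link and per-VM numbers worked out earlier in the article. It assumes sustained throughput and ignores tape drive and backup software limits.

```python
def backup_hours(data_gb, throughput_mb_per_sec):
    """Hours needed to move data_gb at a sustained rate (1 GB = 1000 MB)."""
    return data_gb * 1000 / throughput_mb_per_sec / 3600


def fits_window(data_gb, throughput_mb_per_sec, window_hours):
    """True if the job finishes within the backup window."""
    return backup_hours(data_gb, throughput_mb_per_sec) <= window_hours


DATA_GB = 500        # hypothetical amount of data on one server
WINDOW_HOURS = 3     # hypothetical backup window

# Physical server with a dedicated 1GbE link vs. a VM sharing a two-link team
for label, rate in [("dedicated link", 106.25), ("shared, six VMs", 35.42)]:
    hours = backup_hours(DATA_GB, rate)
    ok = fits_window(DATA_GB, rate, WINDOW_HOURS)
    print(f"{label}: {hours:.2f} h, fits window: {ok}")
```

With these assumed inputs the same job takes roughly three times as long once the channel is shared, which is the kind of gap that staggering jobs or backing up to disk first is meant to close.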
In this case, a disk-to-disk-to-tape (D2D2T) backup approach may be preferable to meet backup window demands and long-term archiving requirements. Since D2D2T backups require additional storage arrays to store backup files (usually one or two backup cycles worth of data), further storage planning will be required as part of any virtualization migration project. In addition, new backup approaches such as data deduplication may also drive the need to increase online disk storage on the SAN.
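As a starting point for that storage planning, the disk needed for the D2D stage can be estimated from the size of one backup cycle and the number of cycles kept on disk. The sketch below is a simplification: the 3,000 GB cycle size and the 20% headroom are hypothetical, and it ignores compression and deduplication.

```python
def staging_capacity_gb(cycle_gb, cycles_on_disk=2, headroom=0.2):
    """Disk capacity to hold `cycles_on_disk` backup cycles plus headroom.

    headroom is a hypothetical safety margin for growth, not a vendor figure.
    """
    if cycles_on_disk < 1:
        raise ValueError("at least one backup cycle must be kept on disk")
    return cycle_gb * cycles_on_disk * (1 + headroom)


# One cycle vs. two cycles of a hypothetical 3,000 GB backup set
for cycles in (1, 2):
    needed = staging_capacity_gb(3000, cycles_on_disk=cycles)
    print(f"{cycles} cycle(s) on disk: {needed:.0f} GB")
```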
On new server virtualization projects, storage virtualization is often a concern. Adding virtualization intelligence to storage resources makes sense for virtualization deployments, because today's storage virtualization technologies can break physical storage dependencies between VMs and their storage resources. Removing physical storage dependencies allows storage administrators to perform storage maintenance (such as decommissioning an older array) transparently to the VMs that use the storage. Storage virtualization appliances provide additional data protection flexibility through their ability to perform synchronous or asynchronous replication of LUNs on the shared storage arrays.
One challenge in writing about storage and backup in virtual environments is that every organization's requirements are unique. No one-size-fits-all storage solution exists. Instead, I've tried to highlight some of the common storage pitfalls and trends that impact backup in x86 server virtualization environments today. Expecting potential storage I/O bottlenecks and architecting around them is a critical first step in managing storage in support of virtualized systems. You should also expect to see new storage requirements in order to support disk-to-disk (D2D) or D2D2T backups.
Storage virtualization has been around for a number of years, but today the use case for it is stronger than ever. Storage virtualization will not only give you more backup options and flexibility, but will also increase the availability of all storage resources in your environment.
Note: This article focuses on VM backup considerations from a storage perspective. The author has also written about general architectural considerations that impact VM backup and recovery.
About the author: Chris Wolf is a senior analyst for Burton Group and author of several IT books, including Virtualization: From the Desktop to the Enterprise.