Published: 14 Jun 2004
|RPO and RTO|
The recovery of critical data in an environment is measured by two key parameters: recovery point objective (RPO) and recovery time objective (RTO). RPO measures the maximum acceptable age of data at the time of an outage. RTO is the maximum acceptable length of time to resume operation following an outage.
Continuous backup, or more accurately continuous data protection (CDP), changes the rules by using disk technology to continuously capture updates to data in real time or near real time. The result: Backup windows become irrelevant because backup is occurring all the time.
But the main benefit of this technology is not on the backup side of the equation. Rather than the single point-in-time access to data that traditional products provide, this technology promises any point-in-time continuous access. Reducing the time to recovery to near zero will be the real driver for adopting CDP products. As with any new technology, it is important to understand what it does, how it works and where it fits within an overall data management and protection strategy.
Approaching instant restore
The recovery of critical data in an environment is measured by two key parameters: recovery point objective (RPO) and recovery time objective (RTO). These concepts are critical to an understanding of CDP.
- RPO measures the maximum acceptable age of data at the time of an outage. A backup performed nightly would represent an RPO of approximately 24 hours because the worst-case scenario would be an outage during the backup.
- RTO is the maximum acceptable length of time to resume operations following an outage. This includes the time to restore from the backup media plus all additional time required for data integrity validation, system or application preparation, rolling forward data and so on (see "RPO and RTO"). For example, a database outage that requires four hours to restore, plus two hours of analysis and validation, plus one hour to roll forward logs results in a seven-hour recovery time. If your RTO is four hours, then you've missed your objective by three hours.
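The recovery-time arithmetic in the database example can be written out as a small sketch. The class and function names here are illustrative only and do not come from any product.

```python
from dataclasses import dataclass


@dataclass
class RecoveryPlan:
    """Hours spent in each phase of a recovery (illustrative model)."""
    restore_hours: float
    validation_hours: float
    roll_forward_hours: float

    def total_hours(self) -> float:
        # Recovery time is the sum of every phase, not just the media restore.
        return self.restore_hours + self.validation_hours + self.roll_forward_hours


def rto_shortfall(plan: RecoveryPlan, rto_hours: float) -> float:
    """Hours by which the plan overshoots the RTO (0.0 if the RTO is met)."""
    return max(0.0, plan.total_hours() - rto_hours)


# The database example: 4 h restore + 2 h validation + 1 h log roll-forward.
plan = RecoveryPlan(restore_hours=4.0, validation_hours=2.0, roll_forward_hours=1.0)
print(plan.total_hours())        # 7.0
print(rto_shortfall(plan, 4.0))  # 3.0
```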
Tolerance for such an outage varies substantially, depending on the nature of the business and the criticality of the data within that business. For most data, the traditional backup approach may be sufficient, but there are key applications and critical data that warrant special consideration. Much time and effort has been invested to ensure that split mirroring techniques provide this additional degree of protection.
What happens if a shorter RPO is required? Meeting a four-hour RPO with split mirrors requires a two- to six-fold increase in storage, depending on the number of fall-back split mirror copies desired. A two-hour RPO could potentially require up to a 12-fold increase (assuming a full 24-hour history is maintained), and may not even be achievable, given the required recovery activities and management complexity involved. This is where CDP can help.
CDP to the rescue
CDP adds the dimension of time to storage mirroring. Conceptually, a CDP solution comprises a base-level image or mirror of an existing volume augmented by a time-stamped log of each subsequent write operation. To be technically accurate, in most solutions the mirror is updated with each write transaction and the log maintains a history of the data blocks that have been modified, similar to the copy-on-write technique used during a snapshot. In any case, most CDP products have the ability to present a virtualized volume representing the contents as they existed--I/O by I/O--at any point in time from the beginning of the log.
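As a rough illustration of the mirror-plus-log idea, the sketch below keeps a current mirror and a time-stamped log of the block contents each write replaced, then reconstructs a virtual view for any earlier moment. It is a toy in-memory model, not any vendor's implementation; the `CdpVolume` name and its methods are invented for this example, and writes are assumed to arrive in timestamp order.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class CdpVolume:
    # Current contents of the mirror: block number -> data.
    mirror: Dict[int, bytes] = field(default_factory=dict)
    # Time-stamped log of (timestamp, block, data that the write replaced).
    log: List[Tuple[float, int, Optional[bytes]]] = field(default_factory=list)

    def write(self, timestamp: float, block: int, data: bytes) -> None:
        # Save the before-image first, then update the mirror.
        self.log.append((timestamp, block, self.mirror.get(block)))
        self.mirror[block] = data

    def view_at(self, timestamp: float) -> Dict[int, bytes]:
        """Return the volume as it existed at `timestamp`; the mirror is untouched."""
        view = dict(self.mirror)
        # Walk the log newest-first, undoing every write made after `timestamp`.
        for ts, block, before in reversed(self.log):
            if ts <= timestamp:
                break
            if before is None:
                view.pop(block, None)   # block did not exist before this write
            else:
                view[block] = before
        return view


vol = CdpVolume()
vol.write(1.0, 0, b"A")
vol.write(2.0, 1, b"B")
vol.write(3.0, 0, b"C")
print(vol.view_at(2.5))  # {0: b'A', 1: b'B'}
```

The live mirror always reflects the latest write, while `view_at` answers "what did the volume look like then?" without modifying it.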
The effect is to provide a much finer degree of granularity than traditional split mirror or snapshot technology. It's possible to view a volume as it existed minutes, or even seconds, earlier and to recover that volume almost instantly, dramatically reducing both RPO and RTO (see "CDP recovery"). To make this even more compelling, the disk requirement for this approach is also far less than that of traditional split mirroring. Instead of the six times the capacity needed for our four-hour split mirror with the ability to roll back 24 hours, a continuous backup solution requires a base mirror plus enough space to maintain the log, typically resulting in less than two times the primary capacity for a 24-hour period. Additionally, adopters of this technology may choose to retain the log information to allow rollback for greater periods of time, up to several days or possibly even weeks, depending on the rate of change.
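The capacity comparison can be made concrete with a back-of-the-envelope sketch. It assumes one full split-mirror copy per RPO interval across the retained history, and it models the CDP log as a fraction of primary capacity rewritten per day. Both function names and the 0.5 to 1.0 change fractions are assumptions chosen to match the figures quoted in the text, not measured values.

```python
import math


def split_mirror_copies(history_hours: float, rpo_hours: float) -> int:
    """Full copies needed to keep `history_hours` of fall-back points,
    assuming one split mirror per RPO interval."""
    return math.ceil(history_hours / rpo_hours)


def cdp_multiplier(change_fraction_per_day: float, history_days: float = 1.0) -> float:
    """Capacity relative to primary: one base mirror plus the log.
    `change_fraction_per_day` is a hypothetical rate-of-change input."""
    return 1.0 + change_fraction_per_day * history_days


print(split_mirror_copies(24, 4))  # 6
print(split_mirror_copies(24, 2))  # 12
print(cdp_multiplier(0.5))         # 1.5
print(cdp_multiplier(1.0))         # 2.0
```

Stretching `history_days` shows why log retention of several days or weeks depends so heavily on the rate of change: the log term grows linearly with both.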
|Revivio CPS appliance logical architecture|
The Revivio Continuous Protection System (CPS), a SAN-based appliance, creates an additional set of mirrored volumes that can be restored back to any point in time.
CDP products target different company needs. Some products are software-based, while others are appliances; solutions may operate at the application, file system or volume level; some are agentless, while others require host-based agents and/or drivers. The degree of recovery granularity can vary significantly among the products as well (see "CDP products"). Here are some examples of companies with CDP products:
Revivio Inc. The Revivio Continuous Protection System (CPS) is a storage area network (SAN)-based appliance designed to operate in parallel with an application's primary data storage. Designed to meet enterprise-class data requirements, it incorporates high levels of fault tolerance, including redundant, hot-swappable components and fault-tolerant cache. It also doesn't require software applications, agents or drivers to be installed on host servers. Instead, the Revivio appliance presents storage LUNs to host servers and leverages the (presumably) already existing host-based volume management software--such as Veritas Volume Manager--to operate as an additional set of mirrored volumes for an application (see "Revivio CPS appliance logical architecture").
Optimized to support a high I/O transaction rate, the Revivio CPS appears to the host as an additional write-only mirrored volume. However, as data is subsequently written to the CPS, the data that is replaced is maintained in a time-stamped log. The appliance then provides the ability to present to any host a virtual view, or TimeImage, of any set of protected volumes exactly as they appeared at a specific point in time. The disk storage that actually holds this data can be any SAN-based storage available in the environment. Depending on desired data protection policies, it can range from mirrored EMC Symmetrix volumes to low-cost JBOD.
In the event of data loss or corruption, recovery consists of "dialing back" in time to identify the point in time immediately prior to the loss, verifying the validity of the data and presenting this TimeImage volume back to the host.
The result is a solution that provides a level of protection similar to split mirror technology, but typically requires less storage while providing far more granularity and faster recovery times. Revivio is best suited for enterprise-class applications running on databases that have significant RPO and RTO restrictions.
FilesX Inc. FilesX Xpress Restore and Xchange Restore are two software products that address a different portion of the market. Technically, these could be considered next-generation snapshot products, rather than continuous backup, but they share many CDP-type characteristics.
Built upon snapshot technology, FilesX doesn't capture every change that takes place, but instead performs regularly scheduled snapshots, sending changed data to a central repository. The granularity of these snapshots can range from a few minutes to hours, according to Jacob Herbst, CEO of FilesX, with the sweet spot being one to two hours. While FilesX is not a true continuous backup solution when compared to the other products, it offers significantly finer granularity for Microsoft Exchange and Windows file server environments than traditional backup achieves.
The real value is to be found in the product's recovery capability. FilesX captures block-level changes, transfers them to a repository and then can map those changes to a file-level view. This provides multiple generations of virtual images similar to a CDP application. Imagine being able to open an Explorer-like window showing one or more generations of Exchange Information Stores, and being able to quickly navigate through mailboxes and messages, select those to be recovered and drag and drop them to the current active information store. Or imagine an e-mail virus attack, where you are able to view the history of snapshots, locate the most recent prior to the attack and then right-click and restore the information store.
FilesX offers the same recovery capabilities for file system data as well. In addition, it offers an interesting capability for system-level recovery. A significant problem in disaster recovery scenarios is bare metal restore. Maintaining system images that are current with security patches, drivers, application changes, etc., is an enormous effort. With FilesX, system disks can be regularly snapshot along with other file systems. To recover, boot from a CD, connect to the FilesX repository and re-paint your system disk as it appeared at the point of the last snapshot.
XOsoft Inc. XOsoft's Data Rewinder is another example of a software-based continuous backup product. At the core of Data Rewinder is a filtering file system layer, known as XOFS, that sits directly above the standard OS file system in the data stack. All file system writes pass through XOFS. As data passes through to the file system and to primary storage, Data Rewinder simultaneously creates a "counter event," which is essentially an "undo" command that's sequentially written to a journal, presumably located on separate storage volumes.
If data corruption occurs, an administrator can view the journal and select a point in time to "rewind" the data. This is accomplished by sequentially executing the undo operations to return the data to its exact state at that point in time. The time to perform these operations will vary, depending on how far back in time you need to go, but it will be much quicker than restoring an entire image from a backup.
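The "counter event" journal and the rewind operation can be sketched roughly as follows. This is a simplified in-memory model and not XOsoft's XOFS: the `RewindableStore` class is hypothetical, files are plain dictionary entries, and each counter event simply records what a path held before the change.

```python
class RewindableStore:
    def __init__(self):
        self.files = {}    # path -> content (stands in for primary storage)
        self.journal = []  # counter events: (timestamp, path, previous content or None)

    def write(self, timestamp, path, content):
        # Record how to undo the write before letting it through.
        self.journal.append((timestamp, path, self.files.get(path)))
        self.files[path] = content

    def delete(self, timestamp, path):
        self.journal.append((timestamp, path, self.files.get(path)))
        self.files.pop(path, None)

    def rewind(self, timestamp):
        """Undo, newest first, every change made after `timestamp`.
        Returns the number of undo operations executed."""
        undone = 0
        while self.journal and self.journal[-1][0] > timestamp:
            _, path, previous = self.journal.pop()
            if previous is None:
                self.files.pop(path, None)   # path did not exist before
            else:
                self.files[path] = previous
            undone += 1
        return undone


store = RewindableStore()
store.write(1, "a.txt", "v1")
store.write(2, "a.txt", "v2")
store.delete(3, "a.txt")
store.write(4, "b.txt", "x")
print(store.rewind(2))  # 2
print(store.files)      # {'a.txt': 'v2'}
```

The return value makes the cost visible: the further back the target time, the more undo operations have to run. Unlike a virtual view, this rewind changes the data in place.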
Unlike the other products discussed here, Data Rewinder resides on the host system it's protecting. This is because much of the functionality in Data Rewinder is derived from XOsoft's data replication product, WANsync, and the products are intended to complement one another. One of the key features of the product is support for specific applications, including Exchange, SQL Server and Oracle. Also, Data Rewinder restores the original volume to a specific point in time, so the product can't be used to restore individual files or e-mail messages.
Mendocino Software. Mendocino is one of the newest and one of the oldest entrants into the continuous data protection arena. Mendocino acquired the technology assets of Vyant, including their CDP software product RealTime, which has been shipping since early last year. RealTime is architected for an application host server to be paired with a dedicated backup server, making it most appropriate for environments where a fixed number of selected servers require CDP-class protection.
On the host side, an Intercept Agent continuously captures write operations on a server to a serial buffer, and then transmits them asynchronously via TCP/IP to the backup server. On the backup server, a backup agent maintains a "nearly current" replica of the primary storage while an archive agent maintains forward and reverse incremental journals that enable quick movement forward and back through time to view and access data. This is accomplished via a "time slider" interface that can quickly present virtual volumes at any point in time based on the capacity of the backup server.
If data loss or corruption occurs, an administrator can slide to any desired point in time and mount a virtual image. A copy of the production application would be run on the backup server, and testing would determine if the corruption was still present. This can be repeated until the optimal point is determined, and the image can then be committed to primary storage via a restore agent. Virtual images can also be used for other purposes such as nightly backups and problem analysis.
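The repeated mount-and-test loop lends itself to a binary search over the available recovery points. The sketch below assumes that once corruption appears it persists in every later image, so the images divide into a clean prefix and a corrupt suffix. `last_good_time` and the `is_clean` callback are hypothetical names; in practice `is_clean` would stand for mounting the virtual image and running the application's own validation.

```python
def last_good_time(timestamps, is_clean):
    """Return the latest timestamp whose image passes `is_clean`, or None.

    `timestamps` must be sorted ascending. Assumes every image before the
    first corrupt one is clean and every image after it is corrupt.
    """
    lo, hi = 0, len(timestamps)  # indices below lo are clean, hi and above are corrupt
    while lo < hi:
        mid = (lo + hi) // 2
        if is_clean(timestamps[mid]):
            lo = mid + 1
        else:
            hi = mid
    return timestamps[lo - 1] if lo > 0 else None


# 100 candidate recovery points; corruption was introduced at point 37.
points = list(range(100))
checks = []

def is_clean(t):
    checks.append(t)  # each call stands for one mount-and-validate cycle
    return t < 37

print(last_good_time(points, is_clean))  # 36
print(len(checks))                       # 7
```

Each validation cycle can be expensive, so halving the search range on every step keeps the number of mounts small even with very fine recovery granularity.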
TimeSpring Software Corp. TimeSpring Protector is a software-based CDP product that will be introduced later this year. Focused on Windows environments, Protector operates at the file level.
According to Sunil Bagai, VP of product management for TimeSpring, file-level data capture provides some benefits not available from block-level protection products, including the ability to better select which data to protect and more flexibility in assigning policies to this data. The initial product focus will be on providing application protection for SQL Server as well as for general file system data.
TimeSpring employs a client-server architecture with an agent installed on each protected server that--similar to XOsoft--sits above the file system and selectively captures write operations, based on desired protection policies. The captured data is then transferred to a Continuous Protection Server where the data is stored in a journal while related metadata is stored within a relational database. Data recovery is performed in one of two ways: First, a read-only virtual shared volume can be presented to a server to mount and access data; alternatively, a recovery wizard can be launched through the management console to perform a guided recovery. Recovery is done at the file level. To further protect data, TimeSpring will also be offering the ability to perform asynchronous replication of the Continuous Protection Server across the WAN.
Alacritus Software. Another CDP product on the horizon is Chronospan from Alacritus. The initial Chronospan offering is an appliance-based product. However, Alacritus recently demonstrated the product running on intelligent switch platforms from Cisco and Brocade, offering an alternative for deploying the product. In some ways, Alacritus breaks the mold of the CDP market. Rather than focusing solely on short-term data protection and recovery, Chronospan takes a holistic approach to data protection, addressing both short- and long-term requirements and completely eliminating the need for traditional backup.
As a non-host-based solution, Chronospan depends on a host-based volume manager to provide an additional mirror or possibly a switch-level mirroring facility to provide block-level data to Chronospan. Although data is being captured at the block level, Alacritus says that Chronospan is content-aware and can intelligently map and present file system contents for easy file-level recovery through an Explorer-like interface. This enables instant retrieval of data at any point in time. Additionally, for longer term policy management of data, Chronospan can prune its retention granularity as data ages. For example, a policy could be established to keep every block of data for one week, maintain hourly granularity from one week to one month, then daily for the next six months, and so on. Also included in the product is an image-cloning capability for data protection, and built-in replication is in the works.
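A tiered retention policy like the one in the example could be sketched as below. This is not Chronospan's policy engine: the `TIERS` table and `prune` function are invented for illustration, "one month" and "six months" are approximated as 30 and 182 days, and points older than the last tier are dropped, since the text leaves anything beyond that open.

```python
from datetime import datetime, timedelta

# (maximum age, granularity); None means keep every recovery point.
TIERS = [
    (timedelta(weeks=1), None),                     # first week: keep everything
    (timedelta(days=30), timedelta(hours=1)),       # up to ~1 month: one per hour
    (timedelta(days=30 + 182), timedelta(days=1)),  # next ~6 months: one per day
]


def prune(points, now):
    """Return the recovery points retained under TIERS, oldest first."""
    kept = []
    seen_buckets = set()
    # Newest first, so the newest point in each hour or day bucket survives.
    for point in sorted(points, reverse=True):
        age = now - point
        for max_age, granularity in TIERS:
            if age < max_age:
                if granularity is None:
                    kept.append(point)
                else:
                    bucket = (granularity, (point - datetime.min) // granularity)
                    if bucket not in seen_buckets:
                        seen_buckets.add(bucket)
                        kept.append(point)
                break
        # No tier matched: the point is older than the policy covers and is dropped.
    return sorted(kept)


now = datetime(2004, 6, 14, 12, 0)
points = [
    datetime(2004, 6, 14, 11, 0), datetime(2004, 6, 14, 10, 59),  # under a week old
    datetime(2004, 6, 4, 8, 10), datetime(2004, 6, 4, 8, 50),     # same hour, 10 days old
    datetime(2004, 4, 15, 3, 0), datetime(2004, 4, 15, 21, 0),    # same day, 60 days old
    datetime(2003, 5, 1, 0, 0),                                   # beyond every tier
]
for kept_point in prune(points, now):
    print(kept_point)
```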
Chronospan differs architecturally from most other CDP products because it doesn't maintain a physical mirror image of data and doesn't utilize copy-on-write block replacement. Instead, each block is streamed to a serial write log, and a map of the current view of the data is always maintained. For point-in-time images, the map is manipulated to point to appropriate data blocks in the log. Alacritus claims this approach ensures that the appliance will not become a bottleneck to production applications.
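The log-plus-map design can be contrasted with copy-on-write in a short sketch. It is a toy model rather than Alacritus's implementation: the `LogStructuredStore` class is hypothetical, writes are assumed to arrive in timestamp order, and the point-in-time map is rebuilt by scanning the log, where a real product would presumably index it.

```python
from bisect import bisect_right


class LogStructuredStore:
    def __init__(self):
        self.log = []      # append-only serial log of (timestamp, block, data)
        self.times = []    # timestamps only, kept parallel to `log` for searching
        self.current = {}  # map of the current view: block -> index into `log`

    def write(self, timestamp, block, data):
        # Nothing is overwritten or copied aside; the block is simply appended.
        self.log.append((timestamp, block, data))
        self.times.append(timestamp)
        self.current[block] = len(self.log) - 1

    def read(self, block):
        return self.log[self.current[block]][2]

    def map_at(self, timestamp):
        """Build the map as of `timestamp`: block -> index of its newest entry."""
        end = bisect_right(self.times, timestamp)  # entries written at or before timestamp
        point_in_time = {}
        for index in range(end):
            point_in_time[self.log[index][1]] = index
        return point_in_time

    def read_at(self, timestamp, block):
        index = self.map_at(timestamp).get(block)
        return None if index is None else self.log[index][2]


store = LogStructuredStore()
store.write(1, 0, b"A")
store.write(2, 1, b"B")
store.write(3, 0, b"C")
print(store.read(0))        # b'C'
print(store.read_at(2, 0))  # b'A'
print(store.map_at(2))      # {0: 0, 1: 1}
```

Each incoming write costs one sequential append and one map update, with no read of the old block first, which is the property behind the vendor's claim about not slowing production I/O.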
|Where to deploy CDP today|
Where does continuous backup fit in an overall storage strategy? Consider this scenario:
A data center with a centralized backup application, using some combination of tape and disk, provides the foundation for near-term data protection. This is supplemented by several other solutions. Microsoft Exchange e-mail is protected by an Exchange-specific CDP application that provides the ability to recover messages, mailboxes and information stores. It also provides point-in-time images of the information store for nightly backup. Key production database applications might be protected by a CDP appliance to provide instant recovery and also virtual images for backup and other purposes. File and print servers may be protected via snapshot software to provide appropriate levels of recoverability for that data.
Is CDP a good fit?
The thought of eliminating backup windows and providing nearly instant restores sounds great, but where does CDP fit within an overall strategy? Of course, not every product will fulfill all data protection needs, and some storage environments may need to deploy a combination of products to achieve the preferred level of protection. (See "Major categories of data protection.")
To determine if CDP is an appropriate tool for your storage environment, some basic questions must be answered:
- What problems can it solve today?
- What tools or technology does it enhance or replace?
- What is the value proposition?
Also, most CDP products by themselves don't address disaster recovery, meaning they do not provide protection in the event of the loss of a site. To address this need, CDP can be supplemented either by replication or by traditional backup with off-site tape production. Many CDP vendors plan to add replication functionality to their products, but it's not available today.
At this time, the area that CDP best addresses is the backup and recovery of data associated with a particular set of applications. The characteristics of these applications are:
- They usually run continuously
- Their data changes frequently
- Their data is stored in large containers making activities like backup difficult and time-consuming
- There is a significant impact to the organization when they go down
|Major categories of data protection|
CDP provides many of the same benefits as split mirroring and snapshot technologies. Can it replace these solutions? In many cases the answer is yes, particularly if the requirement exists for improved RPO over what those technologies can practically provide. When evaluating a CDP solution vs. a disk mirroring or snapshot product, it's necessary to consider which features of each product are most important for your needs (see "Where to deploy CDP today"). Split-mirror and snapshot technologies--while often requiring some level of application integration--are for the most part application-agnostic, while a number of the CDP products are application-specific. Also, technologies such as storage-based mirroring can be used by multiple servers accessing the storage system. Many of the CDP products are designed around protecting an individual host or application.
Ultimately, the question of value will determine if a CDP tool makes sense. The cost and benefits of the CDP products and their storage requirements must be evaluated against alternatives. Potentially, the most compelling cost comparison for CDP is against storage-based split-mirror solutions. In our original example of an environment that requires a four-hour RPO with the ability to retain 24 hours online, the disk requirement for a traditional split-mirror technology would be six times primary disk or greater, depending on the level of data protection required for those disks. The CDP disk requirement would be approximately one and a half to two times the primary storage, depending on rate of change (number of write transactions) and level of protection. While this is a substantial difference, it should be further noted that most split-mirror solutions require the use of the same high-cost storage as the primary volumes. CDP can, if desired, take advantage of lower-cost secondary disk storage.
Continuous data protection will continue to evolve to provide stronger integration with data intensive applications. As replication technologies are adopted or built into products, CDP solutions should be able to provide robust and cost-effective disaster recovery scenarios, as well. Who knows? Maybe someday we'll look back at the idea of nightly backups, scratch our heads and wonder why anyone would ever have done that.