Published: 03 Jun 2002
In the world of global business, the demand for information never sleeps. With continuous availability requirements, snapshot techniques that provide an instant data copy help speed up routine maintenance procedures to back up, archive and protect data. Snapshots are also useful in development environments, for content distribution and other information repurposing needs, saving time and resources that are already stretched too thin.
A snapshot is an image or copy of a defined collection of data created instantly at a point-in-time. Copies are made almost immediately within the disk subsystem, regardless of the size of the volume.
A primary use for a snapshot is to facilitate non-disruptive backups. Essentially, the snapshot image becomes the source of the backup. After quiescing the application, the copy only takes a moment to create, so the user shouldn't notice any delay.
Traditional backups require the application to be shut down during the backup routine. This process typically occurs at night or off-hours. As more data has to be copied to tape, the operations staff runs a race against sunrise each night. Systems and processes must periodically be adjusted to meet the morning production deadline.
Since the snapshot provides a near-line, or additional disk-based copy of the data, the snapshot can be used as a source for restoring information. The most common reason to restore information is user error. For example, a user may inadvertently delete a file or make changes that need to be reversed. The ability to have another copy of the data readily available on disk provides a quick and easy way to locate and reinstate selected files.
Snapshot images also provide a convenient source for testing and training environments and for data mining purposes. Traditional methods of duplicating large amounts of data can be expensive and time-consuming, so the efficiency of snapshots is becoming increasingly valuable.
Know what you're getting
"Not all snapshots are the same - in fact, they vary from product to product," says Dennis Martin, storage management software analyst for the Evaluator Group. "The best advice is to make sure you know what you're trying to accomplish, and then investigate how each product meets your objectives so you can select the best fit."
Implementations of snapshot vary from vendor to vendor. Some implementations allow the snapshot image to be written or updated, while others may be tightly integrated with the backup software. Additionally, some techniques require less disk space for the copy. The two primary techniques are copy-on-write and split-mirror.
When a copy of data is requested using the copy-on-write technique, the disk subsystem simply sets up a second pointer - a snapshot index - and represents it as a new copy. Just as a Windows shortcut to an application appears to be a complete copy of the application, the snapshot volume appears to be a full copy of the data. To the user, it's the same.
Here's how it works: A snapshot is a logical copy of the data, created by saving the original data to a snapshot index whenever data in the base volume is updated. The snapshot process starts by creating an empty snapshot index, which then holds the original values of any data that changes in the base volume after the snapshot is created. Taking the snapshot only takes as long as needed to build the snapshot index - again, a nearly instantaneous operation. It's recommended that the base volume be quiesced during the snapshot, so a stable image of the moment in time is available.
The snapshot is actually seen by combining the base volume data with the snapshot index containing original data changed in the base volume. Thus, the snapshot presents an exact image of the data as it was at the moment the snapshot was taken. This copy-on-write technology enables the instantaneous nature of the snapshot, while only requiring a fraction of the base volume disk space (see "Taking a copy-on-write snapshot").
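The mechanism described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any vendor's implementation: the `Volume` class, its method names and the block-list layout are all hypothetical, and only a single active snapshot is modeled.

```python
class Volume:
    """A toy block volume with at most one active copy-on-write snapshot."""

    def __init__(self, blocks):
        self.blocks = list(blocks)   # base volume data
        self.snapshot_index = None   # block number -> original value

    def take_snapshot(self):
        # Creating the snapshot only allocates an empty index; no data is copied.
        self.snapshot_index = {}

    def write(self, block_no, value):
        # On the first write to a block after the snapshot, save the original
        # value in the snapshot index before changing the base volume.
        if self.snapshot_index is not None and block_no not in self.snapshot_index:
            self.snapshot_index[block_no] = self.blocks[block_no]
        self.blocks[block_no] = value

    def read_snapshot(self, block_no):
        # The snapshot view is the saved original if the block has changed,
        # otherwise the unchanged block in the base volume.
        if self.snapshot_index is None:
            raise RuntimeError("no active snapshot")
        return self.snapshot_index.get(block_no, self.blocks[block_no])


vol = Volume(["a", "b", "c", "d"])
vol.take_snapshot()
vol.write(1, "B")    # original "b" is saved to the index
vol.write(1, "B2")   # block already saved; nothing more is preserved
snapshot_view = [vol.read_snapshot(i) for i in range(4)]
```

After these writes the base volume holds the new data, while `snapshot_view` still shows the volume as it was when the snapshot was taken, and the index holds only the one block that changed.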
Beyond being nearly instantaneous, copy-on-write is space-efficient. The average disk space requirement for a snapshot copy is 10% to 20% of the base volume space. The actual space depends on how long the snapshot is active and how many writes are made to the base volume in that time, since changed data adds entries to the snapshot index. Except in a heavy write environment or when the copy is required to be active for a long time, copy-on-write is efficient.
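As a rough illustration of why write activity drives the space requirement, the sketch below estimates snapshot space as a fraction of the base volume. It assumes a simplified model that is not taken from any product: one preserved block per distinct block changed, with index metadata ignored. The function name and the sample workload are hypothetical.

```python
def snapshot_space_fraction(written_blocks, total_blocks):
    """Estimate snapshot space as a fraction of the base volume.

    Assumes each distinct block written after the snapshot preserves
    exactly one original block; repeated writes to a block cost nothing more.
    """
    unique_blocks = set(written_blocks)
    return len(unique_blocks) / total_blocks


# Seven writes touching four distinct blocks of a 1,000-block volume.
writes = [3, 7, 3, 42, 7, 99, 3]
fraction = snapshot_space_fraction(writes, 1000)  # 4 / 1000 = 0.004
```

Under this model, a workload that keeps rewriting the same blocks consumes little snapshot space, while one that spreads writes across the volume approaches a full copy.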
A copy-on-write snapshot is effective as a backup source image. Since the disk space requirements are less than a full volume copy, periodic snapshots can be made throughout the day as copy points to reference in the event a restore is needed - for example, to restore a file that was inadvertently corrupted or deleted. If only one hour of lost productivity can be tolerated, a snapshot could be taken every hour and copied at night to tape for archival or disaster recovery purposes, using the snapshot as the source of the backup.
Managing multiple snapshots
Checkpointing is a tool to manage multiple snapshot images in aggregate. This is beneficial because only one index update is required for the group. This is similar to notifying a family doctor of an address change and having the files of all family members updated: only one address change needs to be entered, and every associated family member reflects the change. Sophisticated roll-back capabilities are being developed to instantly roll back a complete volume. Currently, a snapshot is often reset on a routine basis or disabled upon completion of the backup to minimize the capacity and performance impact. However, with sophisticated roll-back and efficient checkpointing, users will soon be able to select the time of the restore point and instantly revert to that moment.
Note that with copy-on-write, two writes now occur when production data gets updated. First, the original value must be saved in the snapshot index, and then the change is made to the production base volume. There are variations of copy-on-write that don't require the original value to be saved in a different location; however, these variations require the disk to be compacted. Such variations include the redirect-on-write and log-structured file methods.
Copy-on-write can be implemented with server-based software, in the storage subsystem, and more recently in intelligent switch or virtualization devices. Each has pros and cons associated with it (see "Copy-on-write snapshots: pros and cons").
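For comparison, here is a minimal sketch of the redirect-on-write idea, assuming a simplified model in which new data is written to a separate location and the original blocks are left in place. The class, its method names and the `release_snapshot` compaction step are hypothetical and meant only to show where the single write and the later compaction come from.

```python
class RedirectOnWriteVolume:
    """Toy redirect-on-write volume: original blocks are never overwritten."""

    def __init__(self, blocks):
        self.frozen = list(blocks)  # blocks as of the snapshot
        self.redirected = {}        # block number -> value written afterwards
        self.physical_writes = 0

    def write(self, block_no, value):
        # One physical write: the new value goes to a new location and the
        # original value stays where it is.
        self.redirected[block_no] = value
        self.physical_writes += 1

    def read_live(self, block_no):
        # Production reads prefer the redirected block if one exists.
        return self.redirected.get(block_no, self.frozen[block_no])

    def read_snapshot(self, block_no):
        return self.frozen[block_no]

    def release_snapshot(self):
        # Compaction: fold the redirected blocks back into the base layout.
        for block_no, value in self.redirected.items():
            self.frozen[block_no] = value
        self.redirected.clear()


row = RedirectOnWriteVolume(["a", "b", "c"])
row.write(0, "A")
live_view = [row.read_live(i) for i in range(3)]
snap_view = [row.read_snapshot(i) for i in range(3)]
row.release_snapshot()
```

The update costs one write instead of two, but the production data ends up scattered across two locations until `release_snapshot` merges it back, which is the compaction cost noted above.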
Disk mirroring has long been used to maintain two or more up-to-date full copies of the data. Every write request to the original data is automatically duplicated to other mirrors or copies of that data. The mirror may be contained in the same subsystem or be between different subsystems, although these typically must be of the same subsystem model.
The primary purpose for a mirror is disaster recovery. To survive the failure of an entire subsystem, mirrors must be written between two subsystems, and the subsystems must be far enough apart that a disaster doesn't affect both. Often two subsystems will be mirrored and sit in the same data center. This will guard against hardware failure, but not against a site catastrophe such as a tornado. The greater the distance, the greater the performance delay; thus, asynchronous modes of data transfer are available to accommodate wide-area distances. The cost must be weighed against the business risk.
A mirror provides real-time redundancy, and when it's active, isn't a frozen image or snapshot. The mirror can be temporarily suspended - also referred to as a broken or split-mirror - to create a snapshot or point-in-time copy. The disk subsystem is told to temporarily stop making updates to the mirrored copy so the data is frozen at the point of the suspension. The split-mirror can then be used for the backup process or other purposes.
Mirrors create an instant copy, or snapshot, of the data with the split capability. Unlike copy-on-write, a full data copy is available. In order to keep the disaster recovery copy available, a third mirror is usually established for the purpose of splitting. This requires three entire copies of the data volume to provide the protection and support continuous processing for backup and other development needs. In this setup, there is a primary and secondary real-time copy, and a tertiary point-in-time copy of the data. Since the split-mirror or snapshot is an entire set of the data, the data can be updated for development or training purposes. In contrast, only some copy-on-write products allow writes to occur to the snapshot copy. Note that once the split-mirror has been modified, the entire mirror must be rewritten to establish an active mirror with the original volume.
When the backup is complete, the mirror is resumed. In more sophisticated systems, the writes made since the point of the suspension have been saved; they are applied to the mirror, and normal mirror operations resume. However, in some products, the entire mirror must be rewritten once the mirror is broken. In either case, the original data volume is not affected by breaking the mirror.
Products that utilize the split-mirror to provide an instant copy or snapshot include:
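The split-and-resume cycle can be sketched as follows. This is an illustrative model only: the `MirroredVolume` class and its methods are hypothetical, and instead of saving the writes themselves it records which blocks changed while the mirror was split and copies their current values on resume, which leaves the mirror in the same final state.

```python
class MirroredVolume:
    """Toy two-way mirror that can be split and later resynchronized."""

    def __init__(self, blocks):
        self.primary = list(blocks)
        self.mirror = list(blocks)
        self.split = False
        self.pending = set()   # blocks changed on the primary while split

    def write(self, block_no, value):
        self.primary[block_no] = value
        if self.split:
            # The mirror is frozen; remember what must be copied later.
            self.pending.add(block_no)
        else:
            # Active mirror: every write is duplicated.
            self.mirror[block_no] = value

    def split_mirror(self):
        self.split = True
        self.pending.clear()

    def resume(self):
        # Copy only the blocks that changed during the split, then resume
        # normal mirroring. Returns the number of blocks copied.
        copied = len(self.pending)
        for block_no in self.pending:
            self.mirror[block_no] = self.primary[block_no]
        self.pending.clear()
        self.split = False
        return copied


m = MirroredVolume(["a", "b", "c", "d"])
m.write(0, "A")            # duplicated to the mirror
m.split_mirror()           # mirror is now a point-in-time copy
m.write(1, "B")
m.write(1, "B2")
m.write(3, "D")
frozen_copy = list(m.mirror)   # what a backup would read
blocks_copied = m.resume()
```

Here only two blocks are copied on resume. A product without this tracking would have to rewrite all four blocks, which is the full-mirror rewrite described above.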
- EMC TimeFinder (Symmetrix)
- Hitachi InstantSplit for ShadowImage (Lightning series)
- HP SureStore Business Copy (XP)
- Sun StorEdge Instant Image (9900 systems)
- Xiotech REDI (Magnitude)
- LSI Logic ContinuStor Director
- Veritas Volume Manager and Volume Replicator
Mirrors have a write penalty while the mirror is active. The write penalty ceases once the mirror is split, that is, when the snapshot is created. With copy-on-write, in contrast, the write penalty begins after the snapshot is taken. In either case, the snapshot is available in an instant and doesn't require the user to wait for the copy to get created.
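A simple count of physical writes makes the difference in timing concrete. The model below is an assumption for illustration, not a measurement: copy-on-write is charged an extra write only the first time each block changes after the snapshot, an active mirror is charged one write per copy, and a split mirror is charged a single write. All function names and the sample workload are hypothetical.

```python
def cow_physical_writes(writes_after_snapshot):
    """Writes to disk under copy-on-write once a snapshot is active."""
    saved = set()
    total = 0
    for block_no in writes_after_snapshot:
        if block_no not in saved:
            saved.add(block_no)
            total += 2   # save the original, then write the new value
        else:
            total += 1   # original already preserved
    return total


def active_mirror_physical_writes(writes, copies=2):
    """Writes to disk while the mirror is active (before the split)."""
    return len(writes) * copies


def split_mirror_physical_writes(writes_after_split):
    """Writes to disk once the mirror has been split."""
    return len(writes_after_split)


workload = [5, 5, 9, 5, 12]   # block numbers written, in order
cow_cost = cow_physical_writes(workload)                # 8
mirror_cost = active_mirror_physical_writes(workload)   # 10
split_cost = split_mirror_physical_writes(workload)     # 5
```

Under these assumptions the mirror pays its penalty before the snapshot exists and none afterwards, while copy-on-write pays nothing beforehand and a penalty on changed blocks afterwards.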
When to use copy-on-write vs. split-mirror
Copy-on-write is an efficient method of creating an image. Since it utilizes much less disk space, multiple copies can be kept as restore points or for other purposes, such as testing and training. Continued development of this technology will provide simpler recovery points that are closer in time to the problem. Eventually, the user will select the source restore time and a file or volume will be available from that point-in-time. Although disk space is becoming much more economical and scalable, replicating large data sets is still expensive. Copy-on-write snapshots minimize disk space requirements, and multiple roll-back copies save time and data. Some copy-on-write implementations don't allow the snapshot to be used as a source image that can be written to, so these implementations are limited to backup and recovery purposes.
Split-mirror leverages mirroring technology that's increasingly being deployed for disaster recovery purposes. Mirroring can be costly, but may be appropriate protection from disk subsystem failures and disasters. The ability to split the mirror in environments already utilizing mirroring is a natural extension. Another option is to take a copy-on-write snapshot of an active mirror in order to save on disk space and allow more point-in-time images. The split-mirror is effective for rapidly churning data, since there isn't a write penalty after the mirror is split. Since the write penalty has already been paid before the snapshot is taken, the split-mirror is also effective for copies that will be utilized for a long time - for example, an extensive data mining deployment.