Snapshot backup software can be a cheaper and more trouble-free way to manage your data backup process vs. traditional
data backup and recovery software. Also called near-continuous data protection (or near-CDP), snapshot backups are good if you need recovery point objectives (RPOs) measured in seconds or minutes.
In this tutorial on snapshot backup software, learn about near-CDP and snapshots, redirect-on-write snapshots, and how to restore snapshot data.
SNAPSHOT BACKUP SOFTWARE TUTORIAL: TABLE OF CONTENTS
Near-CDP systems, which do snapshot and replication-based backups, are very efficient because, like CDP systems, they only transfer and store new blocks of data. The changes (or "deltas") can be easily stored on the primary system and replicated to a secondary system for backup. Snapshots must be replicated or backed up to tape because they depend on the primary volume for their data.
The true value of a near-CDP system is demonstrated during operational recovery. Their RPO can be measured in minutes, and their recovery time objective (RTO) is measured by how long it takes you to point the application from the primary storage system to the secondary storage system.
Just when is "continuous" not really continuous? In the real world, the definition is pretty straightforward, but when it comes to CDP and near-CDP, things get a little hazy. The Storage Networking Industry Association (SNIA), a leading storage vendors' trade organization, offers its own definition of CDP:
"Continuous data protection [CDP] is a methodology that continuously captures or tracks data modifications and stores changes independent of the primary data, enabling recovery points from any point in the past."
A true CDP system captures every change or new piece of data as soon as it's committed to disk and immediately replicates that change to another system. Near-CDP performs a similar function but does it periodically -- every 15 minutes, 30 minutes, hour, etc. -- so it's truly not "continuous" at all. Because it offers RTOs and recovery point objectives (RPOs) that are very close to what CDP offers, many people refer to it as near-CDP; however, the term "near-CDP" isn't officially recognized by SNIA.
If you're using one of the leading backup software products, you should ask the vendor how they accomplish near-CDP functionality. Some of them provide it completely within their product, but most accomplish it by controlling and reporting on the snapshot replication capabilities of a storage or virtualization system.
If you're going to rely entirely on snapshots for historical preservation of data, you need to ensure that the existence of dozens or hundreds of snapshots doesn't negatively impact the performance of your storage system. Therefore, the feasibility of a snapshot-based backup system depends highly on what type of snapshots your storage system provides.
The most common type of snapshot is the copy-on-write snapshot. A copy-on-write snapshot system copies a block before it overwrites it with a new block. Typically, the previous version of the block is copied to another volume, which has the advantage of leaving the structure of the source volume unchanged. One would think this would have performance advantages, but the opposite is true. That's because each write requires three separate I/O operations: a read of the previous block, a write of the previous block and a write of the new block. Over time, this can create quite a performance degradation on the primary storage system, which is why it's extremely rare to use a copy-on-write snapshot system for this purpose. Typically, copy-on-write snapshots are only used to create a stable image as a source for another backup system. This can be a traditional backup system that copies the snapshotted volume to a backup system or a more advanced system that replicates the snapshot to another storage system. If the snapshots are replicated, this allows you to leave very few snapshots on the primary system with all previous snapshots stored on the secondary system. This has the effect of minimizing the performance impact of the snapshots on the primary system while maintaining historical versions for operational recovery. If you're using a copy-on-write storage system and wish to move to a near-CDP-style backup, you'll need to adopt one of the approaches that allows you to limit the number of snapshots on the primary volume.
Redirect-on-write is a less common type of snapshot that writes the new block in a new location, leaving the previous version of the block in its place. The advantage of this approach is that it requires only one I/O operation to update a block (as opposed to three I/O operations with copy-on-write). This is why storage systems using this style of snapshot can store dozens or hundreds of snapshots without a significant degradation in performance. And it's precisely that feature that makes redirect-on-write-style snapshots the preferred snapshot method to use for a near-CDP backup system.
There are two disadvantages to the redirect-on-write-style snapshots. The first is that all blocks -- both current and all previous versions -- are stored in the same volume. Over time, this can cause the current versions of the blocks that comprise a given volume to become fragmented. Be sure to consult with the vendor whose product you're considering to see how they deal with this fundamental design issue of redirect-on-write volumes.
The second, and much more dangerous, disadvantage of redirect-on-write snapshots is that the historical versions of blocks can cause the volume to become full, stopping all further writes to the volume until the issue is corrected. Copy-on-write systems avoid this issue by storing the historical versions of blocks in a different volume. If the history volume becomes full, it only stops updating the snapshots -- the current version of the volume is unaffected. However, redirect-on-write snapshots must keep the current and historical blocks in the same volume, creating the risk of filling up the volume with historical blocks. This is why users who opt for this approach to snapshots on their volumes must keep extra space in reserve, and must constantly monitor the volumes to ensure there's enough reserve space to keep up with the level of changes of any given volume. The more blocks change and the more frequently they're changed, the more space you're going to need for snapshots.
Vendors that don't offer redirect-on-write snapshots often use these disadvantages as FUD (fear, uncertainty and doubt) when talking to potential customers. Don't believe the FUD, but consider it a source of information that must be verified.
The first potential disadvantage (fragmentation) is easy to test for in a proof-of-concept test: Test the performance before/after the creation of dozens or hundreds of snapshots -- after updating thousands of blocks, of course.
The second potential disadvantage is a very real one and simply must be monitored. If you run out of space because of your snapshot data, your volume will stop updating and your application will crash. If you're not experienced with this type of snapshot, follow the vendor's most conservative estimates on how much space to keep in reserve. Over time, monitoring how much space is taken up by snapshots should allow you to develop a much better estimate that's more appropriate for your environment. If you monitor things properly, the worst that should ever happen is that you have to delete more snapshot history than you would to make sure your volume doesn't stop functioning (see "Sampler: Storage systems with redirect-on-write snapshots.").
Any type of structured data requires special treatment before creating a snapshot of the volume it's stored on. At a minimum, without this special attention, your app will go into crash recovery mode after recovery and possibly cause a given snapshot to be completely worthless for restore. Therefore, be sure to research the proper way to prepare your application prior to creating a snapshot.
Windows solves this problem using Volume Shadow Copy Service (VSS). A backup system that's about to create a snapshot of a volume simply needs to communicate its intention to VSS. (To do this, it must be capable of being a VSS requestor.) VSS then provides the requestor a list of applications for which it requires VSS intervention prior to taking snapshots. The requestor then communicates with each application's VSS writer. Once an application has been prepared for the snapshot, the requestor asks VSS to create the snapshot. VSS then informs the VSS snapshot provider to create the snapshot. (The snapshot provider can be Windows itself, or a storage or virtualization system like those discussed earlier.) Once the snapshot has been successfully created, the requestor can inform the supported applications (via its VSS writer) that they have been backed up, which allows them to do things like truncate their transaction logs.
Unfortunately, VSS functionality (or any meaningful equivalent) doesn't exist for Unix- or Linux-based operating systems. So if you plan to use snapshots with Unix systems, you'll need to use an application that can accomplish the same steps, or you'll have to write a script that communicates directly with the applications.
There are a number of ways to do restores with near-CDP systems. The most common is to make the historical versions of files available as a subdirectory underneath the originating directory. When a previous version of a file is needed, a user can simply point their file browser to the appropriate directory, locate the file, and then copy and paste it.
Another type of restore happens when a user is looking for a file and isn't quite sure where or when it was last seen. This type of restore is very easy to do in traditional backup products because they have a database that tracks the location of all files and all versions of those files. However, most near-CDP storage systems don't have a similar capability. It's one reason why many companies use their traditional backup product to configure, schedule and report on their near-CDP backups. Depending on the capabilities of your backup product, it can create a catalog of all snapshots it's controlling, allowing you to search this catalog during restores.
The most valuable type of restore a near-CDP system can perform is when you lose an entire volume or a directory containing the virtual disk volumes that comprise a virtual machine (VM). While in most cases it must be performed manually, it's a relatively simple process to point NFS or CIFS clients to a different server, or to mount a VM from a different location. This is when a near-CDP system truly pays for itself because it allows you to perform this "restore" in a few moments, rather than several hours. Once the problem with the production volume has been corrected, you can do a reverse restore from the backup system to the primary system and revert back to the primary system once that restore has been completed. After you've done this type of restore once, you won't want to go back to the "old days."
Finally, it's critical to monitor and report on the success/failure of your near-CDP backups. This functionality may be provided by your storage vendor, but it's most likely provided by your backup software vendor and their partners. This is another reason why you should consider controlling your near-CDP backups via your backup software product, even if all it's doing is acting as a traffic cop. Having all your backup functionality in one place is a good thing.
Don't embark on a near-CDP backup project hastily. Check out your vendor's capabilities and perform a proof-of-concept test before signing any purchase orders. And make sure all the good things about your current backup system -- centralized scheduling, cataloging, monitoring and reporting -- don't disappear when you deploy your shiny new near-CDP system.
About this author: W. Curtis Preston is an independent backup expert. Curtis has worked extensively with data deduplication and other data reduction systems.
This article was previously published in Storage magazine.