Using storage snapshot technologies as part of your data backup strategy

Storage snapshots are a fast way to keep a copy of your data on hand in case a file is lost or accidentally deleted. For that reason, they are becoming increasingly popular for first-tier data backups and as an important part of a backup strategy.

Modern backups fall into two main categories. The first is the copy kept for fast recovery of lost or corrupted files, which is where snapshots excel. The second is the true backup, which keeps multiple copies of important files, taken at different times, for longer periods, sometimes several months. Archival copies form a separate, longer-term tier. Snapshots are a good solution for the first, fast-recovery tier.

Snapshots work by tracking the blocks of data that have changed and applying those changes to another copy of the data at user-determined intervals. Data can be updated with every write (called continuous data protection, or CDP) or at periodic intervals, depending on the needs of the user. A desktop may be adequately protected if it's updated every hour or even every day. A busy transactional database may require updating every few minutes or even continuously.
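The changed-block tracking described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class name TrackedVolume, its methods and the in-memory dictionaries are hypothetical and do not correspond to any vendor's product or API.

```python
from typing import Dict, Set


class TrackedVolume:
    """Toy volume that remembers which blocks changed since the last sync."""

    def __init__(self, num_blocks: int) -> None:
        # Hypothetical in-memory stand-ins for the primary volume and its copy.
        self.blocks: Dict[int, bytes] = {i: b"" for i in range(num_blocks)}
        self.copy: Dict[int, bytes] = dict(self.blocks)
        self.dirty: Set[int] = set()  # blocks written since the last sync

    def write(self, block_no: int, data: bytes) -> None:
        self.blocks[block_no] = data
        self.dirty.add(block_no)

    def sync(self) -> int:
        """Push only the changed blocks to the copy; return how many moved."""
        moved = len(self.dirty)
        for block_no in self.dirty:
            self.copy[block_no] = self.blocks[block_no]
        self.dirty.clear()
        return moved


vol = TrackedVolume(num_blocks=8)
vol.write(2, b"invoice")
vol.write(5, b"ledger")
vol.write(2, b"invoice-v2")  # a block written twice is still copied once
moved = vol.sync()           # a real system would call this on a timer
```

Calling sync() after every write would approximate the continuous case, while calling it hourly or daily corresponds to the periodic case.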

Popular snapshot technologies today

Some of the more popular snapshot technologies include:

Copy on write: When a block is written, copy on write first copies the block about to be overwritten to the snapshot storage area, then writes the new block in place of the old one. But this is really more than just snapshotting; it's CDP.

Redirect on write: When a block is written, it goes to a new location on the disk, and the snapshot table is updated to show the old block as part of the snapshot. The old block isn't moved, which saves a write operation. This is also CDP.

Split mirror: In a split mirror, all transactions are mirrored by being written at least twice. To recover a block, the mirror is split into two identical copies and the block is read from the split mirror. This is typically CDP.

Copy on write with background copy: This works like copy-on-write snapshots, with the addition that the system copies the entire contents of the disk -- everything that hasn't changed -- to the snapshot area in the background. Because the unchanged data is copied in the background, the effect on foreground performance is limited.
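The difference between the first two approaches in the list above comes down to what happens on a write. The Python sketch below is a simplified illustration, not a description of how any vendor implements it; the class names, the physical_writes counter and the in-memory structures are all hypothetical.

```python
from typing import Dict, List, Optional


class CopyOnWriteVolume:
    """Old data is copied to a snapshot area before it is overwritten."""

    def __init__(self, blocks: Dict[int, bytes]) -> None:
        self.live = dict(blocks)
        self.snap_area: Optional[Dict[int, bytes]] = None
        self.physical_writes = 0  # hypothetical counter, for comparison only

    def take_snapshot(self) -> None:
        self.snap_area = {}  # stays empty until blocks start changing

    def write(self, block_no: int, data: bytes) -> None:
        # Assumes block_no already exists on the volume.
        if self.snap_area is not None and block_no not in self.snap_area:
            self.snap_area[block_no] = self.live[block_no]  # preserve old block
            self.physical_writes += 1
        self.live[block_no] = data  # new data replaces the old block in place
        self.physical_writes += 1

    def read_snapshot(self, block_no: int) -> bytes:
        if self.snap_area is None:
            raise RuntimeError("no snapshot taken")
        # Unchanged blocks are still read from the original volume.
        return self.snap_area.get(block_no, self.live[block_no])


class RedirectOnWriteVolume:
    """New data goes to a new location; the old block stays where it is."""

    def __init__(self, blocks: Dict[int, bytes]) -> None:
        self.store: List[bytes] = []        # append-only physical locations
        self.live_map: Dict[int, int] = {}  # logical block -> location
        for block_no, data in blocks.items():
            self.live_map[block_no] = len(self.store)
            self.store.append(data)
        self.snap_map: Optional[Dict[int, int]] = None
        self.physical_writes = 0

    def take_snapshot(self) -> None:
        self.snap_map = dict(self.live_map)  # freeze the current block table

    def write(self, block_no: int, data: bytes) -> None:
        # This toy never reclaims old locations, even without a snapshot.
        self.store.append(data)
        self.live_map[block_no] = len(self.store) - 1
        self.physical_writes += 1

    def read_snapshot(self, block_no: int) -> bytes:
        if self.snap_map is None:
            raise RuntimeError("no snapshot taken")
        return self.store[self.snap_map[block_no]]


initial = {0: b"A", 1: b"B"}
cow = CopyOnWriteVolume(initial)
cow.take_snapshot()
cow.write(0, b"A2")

row = RedirectOnWriteVolume(initial)
row.take_snapshot()
row.write(0, b"A2")
```

In this model the copy-on-write volume performs two physical writes for the first change to a block and the redirect-on-write volume performs one, which is the saved write operation mentioned above. Note also that read_snapshot on the copy-on-write volume falls back to the live data for unchanged blocks.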

Timing your snapshots

How often you take a snapshot tends to be a balancing act between the need for up-to-the-second copies and the overhead of making each copy. (Overhead in this case refers mainly to resources such as network bandwidth and storage space.) The main technical limits, however, are how often the snapshot application can take snapshots and how many snapshots can be maintained at one time.

A point-in-time (PIT) snapshot captures the data as it stood at a specific moment; for recovery purposes, the relevant point is the time of the last snapshot before the failure. Most systems let you keep a number of snapshots in case the last one is corrupted.

Any point in time (APIT) refers to the ability to recover to any point, or nearly any point. The term generally refers to a system that can recover to any arbitrary point in the past, although it is sometimes stretched to include systems that can recover to within minutes of any given point (CDP).
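As a rough illustration of point-in-time recovery with several retained snapshots, the following Python sketch picks the newest usable snapshot taken before a failure. The Snapshot type, its corrupted flag and the function name are hypothetical; real products expose this choice through their own tools.

```python
from datetime import datetime
from typing import List, NamedTuple, Optional


class Snapshot(NamedTuple):
    taken_at: datetime
    corrupted: bool = False  # hypothetical flag set by some integrity check


def pick_recovery_snapshot(
    snaps: List[Snapshot], failure: datetime
) -> Optional[Snapshot]:
    """Return the newest uncorrupted snapshot taken at or before the failure."""
    usable = [s for s in snaps if s.taken_at <= failure and not s.corrupted]
    return max(usable, key=lambda s: s.taken_at, default=None)


snaps = [
    Snapshot(datetime(2024, 1, 1, 9, 0)),
    Snapshot(datetime(2024, 1, 1, 10, 0)),
    Snapshot(datetime(2024, 1, 1, 11, 0), corrupted=True),
]
failure = datetime(2024, 1, 1, 11, 30)

chosen = pick_recovery_snapshot(snaps, failure)
data_lost = failure - chosen.taken_at  # work done since the chosen snapshot
```

Here the 11:00 snapshot is skipped because it is marked corrupted, so recovery falls back to 10:00 and an hour and a half of changes is lost. That is the case for keeping more than one snapshot.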

Products using snapshot technology today

Vendors use several different strategies to produce snapshots. IBM Corp.'s FlashCopy and the Linux Logical Volume Manager use copy-on-write snapshots. NetApp Filer uses redirect on write. The AIX Logical Volume Manager and EMC Corp.'s Symmetrix support split-mirror snapshots. FlashCopy also supports copy-on-write snapshots with background copy.

One of the key questions with any snapshot or continuous data protection system is "how much granularity do you need?" In other words, how close to the point of failure does your recovery have to be? This is your recovery point objective (RPO), and is a critical factor in designing your backup operation.

It's an important question because the finer the granularity, the more expensive the solution. Modern systems can recover to any point in time (APIT), but the truth is that very few businesses need that kind of recovery. The cost difference between APIT and, say, a snapshot once an hour, or even every five minutes, can be significant.
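The trade-off between granularity and overhead can be made concrete with some simple arithmetic. The Python sketch below assumes evenly spaced snapshots and a fixed retention window; the function name and the returned fields are hypothetical, and it counts snapshots only, without modeling storage consumption or price.

```python
from datetime import timedelta
from typing import Dict, Union


def snapshot_plan(
    interval: timedelta, retention: timedelta
) -> Dict[str, Union[timedelta, int]]:
    """Estimate worst-case data loss and snapshot count for a schedule."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    return {
        # A failure just before the next snapshot loses one full interval.
        "worst_case_loss": interval,
        # Number of snapshots kept if every one in the window is retained.
        "snapshots_retained": retention // interval,
    }


one_day = timedelta(days=1)
hourly = snapshot_plan(timedelta(hours=1), one_day)
five_min = snapshot_plan(timedelta(minutes=5), one_day)
```

Under these assumptions, moving from hourly to five-minute snapshots cuts the worst-case loss from an hour to five minutes, but raises the number of snapshots to manage per day from 24 to 288.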

It's also important to realize that while snapshots are a good way to provide fast recovery for things like accidental file deletions, they are no substitute for true backups. For one thing, methods like copy-on-write snapshots and redirect-on-write snapshots need access to the original copy of the data to reconstruct missing or damaged files, so they offer no protection if the original volume is lost. Split mirror doesn't, and copy on write with background copy may or may not need the original data, depending on whether the background copy has been completed. You should also keep in mind that while you can mitigate these problems by keeping multiple point-in-time copies of the data, you still need multiple conventional backups for strong data protection.

About the author:
Rick Cook specializes in writing about issues related to storage and storage management.

