Backup in a snap: A guide to snapshot technologies


Snapshot technologies are commonly used to enhance data backup systems and dramatically shorten recovery time objectives (RTOs) and recovery point objectives (RPOs). But you need to know how snapshot implementations can vary, and what those differences could mean to your environment.

A snapshot is commonly defined as a copy of a set of files, directories and/or volumes as they were at a particular point in time. As its name suggests, a snapshot is very much like a photograph: it captures an image of a certain set of data at a specific point in time.

Snapshot technology was originally architected to solve several data backup problems, including:

  • Data sets too large to back up within the allocated time
  • Failing to back up data because it has moved from a directory that hasn't been backed up yet to one that already has
  • Corruption of backed-up data that can occur when files are written to while they're being backed up
  • The effect on application performance while a backup is in progress

All of these backup problems can be resolved with snapshots. But snapshots shouldn't be considered a backup panacea. There are some issues with snapshots that require workarounds (see "Snapshot snafus," below).

Snapshot snafus

Snapshot problems can occur when structured data is involved, such as databases and applications built around databases like email, enterprise resource planning (ERP) or customer relationship management (CRM). Most snapshot technologies aren't integrated with structured data applications, so when a snapshot is executed, it doesn't wait for the database to be quiesced, the cache to be flushed, writes to be completed, and indexes and metadata to be updated. If the snapshot is taken while data is still in the cache, or before all of the updates are completed, the snapshot is at best crash consistent (equivalent to the state after a sudden power loss) rather than application consistent, and the database may need recovery, or may even be corrupted, when it's restored.

This is less of an issue for structured data applications running on a Windows server if the snapshot technology takes advantage of Windows Volume Shadow Copy Service (VSS) through its API. VSS is designed to work specifically with structured data applications: through each application's VSS writer, it handles the heavy lifting of quiescing the database, flushing the cache, and completing the writes and updates before the snapshot is taken.

Unfortunately, there isn't an equivalent service or API in Linux or Unix operating systems. VMware Inc. has a partial solution via its vCenter storage API, which allows a snapshot technology to send a command to vCenter telling it to quiesce the virtual machine and then take a snapshot. At this time, however, the API isn't aware of structured data applications, so the snapshot may not be application consistent.

But there's an excellent workaround for taking application-consistent snapshots of structured data applications without using Windows VSS. The workaround requires backup software that integrates with the snapshot technology's API, so that the backup software's structured data application agents can be used. The agent quiesces the application, flushes the cache, completes the writes and updates, and then tells the backup software to notify the snapshot technology to take the snapshot. This workaround is a relatively effective solution.

A series of steps is required to initiate a snapshot:

  1. It starts with a command that a backup is about to occur.
  2. This command tells the system to quiesce the file system and apps running at that point in time.
  3. The file system is then flushed so that any pending file transactions are completed.
  4. The snapshot is then created.
  5. Afterwards, the file system and applications are released to resume normal operations.
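As a rough illustration of the five steps above, the following minimal Python sketch models them with a context manager. The `Host` class and its `quiesce`, `flush` and `release` methods are hypothetical stand-ins for whatever hooks a real snapshot product or backup agent exposes; here they only record events. The key design choice is the `try`/`finally`: the file system and applications are released even if snapshot creation fails, so nothing is left frozen.

```python
from contextlib import contextmanager


class Host:
    """Hypothetical server whose file system and apps can be paused and resumed."""

    def __init__(self):
        self.events = []  # records the order in which the steps run

    def quiesce(self):
        self.events.append("quiesce")  # step 2: pause apps and file system I/O

    def flush(self):
        self.events.append("flush")  # step 3: complete pending file transactions

    def release(self):
        self.events.append("release")  # step 5: resume normal operations


@contextmanager
def quiesced(host):
    """Keep the host quiesced and flushed; always release it afterwards."""
    host.quiesce()
    try:
        host.flush()
        yield host
    finally:
        host.release()  # runs even if the snapshot step raises


def take_snapshot(host, name):
    host.events.append("backup-starting")  # step 1: announce the backup
    with quiesced(host):
        host.events.append(f"snapshot:{name}")  # step 4: create the snapshot
    return host.events
```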

Snapshot technology has also moved beyond just data protection. Snapshots are an efficient and non-disruptive way to test application software against real data without endangering live production data. They're also ideal for data mining and e-discovery. And snapshots have evolved into a very effective, even preferred, disaster recovery methodology that protects against malware, human errors and data corruption.

Where snapshot technology lives

The common perception may be that snapshotting is a storage system feature, but that's only one place that the technology may reside. Snapshot technologies are generally available in seven different types of implementations:

  1. File systems of servers, desktops and laptops
  2. Logical volume managers (LVMs)
  3. Network-attached storage (NAS)
  4. Storage arrays
  5. Storage virtualization appliances
  6. Server virtualization hypervisors
  7. SQL databases

File system-based snapshots

File system-based snapshots are available in Microsoft Corp.'s Windows NTFS via Volume Shadow Copy Service (Shadow Copy in Vista); Novell Storage Services (NSS) on NetWare 4.11 or later; Novell's OES-Linux in SUSE Linux; and the Zettabyte File System (ZFS) on Sun Microsystems Inc.'s Solaris.

One of the advantages of file system-based snapshots is that they tend to be "free" because they come with the file system. They also work well, and the latest file systems make them pretty easy to use. On the downside, each file system must be managed separately, which can become onerous as systems proliferate. It also means that if snapshot replication is required, each file system must be set up to replicate its own snapshots. In addition, different file systems will likely vary in the kinds of snapshots they provide; snapshot frequency; the amount of capacity that must be reserved (if capacity must be reserved at all); and snapshot setup, operations and manageability. The complexity increases as more servers and file systems must be managed.
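To make the per-file-system management burden concrete, here's a minimal Python sketch that builds one ZFS `zfs snapshot dataset@name` command per dataset on each server. The host names, dataset names and the `backup-YYYYMMDD-HHMM` naming scheme are hypothetical. The sketch only builds the commands (each could be run on its own host, for example with `subprocess.run(cmd, check=True)`) and doesn't cover replication, which would need its own per-file-system setup.

```python
from datetime import datetime, timezone


def zfs_snapshot_commands(datasets, when):
    """Build one `zfs snapshot <dataset>@<name>` command per dataset."""
    stamp = when.strftime("%Y%m%d-%H%M")  # hypothetical naming scheme
    return [["zfs", "snapshot", f"{ds}@backup-{stamp}"] for ds in datasets]


def plan_per_host(hosts, when):
    """Each server is a separate touchpoint with its own command list."""
    return {host: zfs_snapshot_commands(ds, when) for host, ds in hosts.items()}


if __name__ == "__main__":
    hosts = {"db01": ["tank/db"], "web01": ["tank/www", "tank/logs"]}  # hypothetical
    plan = plan_per_host(hosts, datetime.now(timezone.utc))
    for host, cmds in plan.items():
        for cmd in cmds:
            print(host, " ".join(cmd))
```

Even in this tiny example, two servers mean two separate places where snapshots have to be scheduled, named and checked.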

LVM snapshots

LVM snapshot technology is available with Hewlett-Packard (HP) Co.'s HP-UX Logical Volume Manager, Linux Logical Volume Manager and Linux Enterprise Volume Management System; Microsoft's Logical Disk Manager for Windows 2000 and later; Sun Solaris 10 ZFS; and Symantec Corp.'s Veritas Volume Manager (part of Symantec Veritas Storage Foundation).

Logical volume manager snapshot technology can sometimes span a number of file systems and operating systems; for example, Symantec's Veritas Volume Manager works with most common operating systems. LVMs also usually include storage multipathing and storage virtualization features.

When using LVMs, there are typically additional costs per server for license/maintenance fees. You may also confront the same issues of coordination and complicated implementations found with file system-based snapshots.
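Reserved capacity is a practical concern with classic Linux LVM snapshots, which are copy-on-write: `lvcreate --snapshot` needs a size for the space that will hold the original copies of changed blocks, and if that space fills up, the snapshot becomes invalid. The Python sketch below only builds the command. The sizing rule (origin size times an expected change fraction times a safety headroom) and the volume group and volume names are assumptions for illustration, not an LVM recommendation.

```python
import math


def lvm_snapshot_command(vg, lv, snap_name, origin_gib,
                         expected_change_fraction, headroom=1.5):
    """Build an lvcreate command for a classic (copy-on-write) LVM snapshot.

    The reserve must hold every origin block that changes while the
    snapshot exists. The sizing rule here is a hypothetical rule of thumb.
    """
    if not 0 < expected_change_fraction <= 1:
        raise ValueError("expected_change_fraction must be in (0, 1]")
    reserve_gib = math.ceil(origin_gib * expected_change_fraction * headroom)
    return ["lvcreate", "--snapshot", "--size", f"{reserve_gib}G",
            "--name", snap_name, f"/dev/{vg}/{lv}"]
```

For a 200 GiB volume expected to change by a quarter while the snapshot is kept, this reserves 75 GiB. The command still has to be run and monitored on each server, which is part of the per-server cost and coordination effort.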

NAS snapshots

Network-attached storage is essentially an optimized or specialized file system running on an appliance or an appliance integrated with storage. Most midrange and enterprise-class NAS systems provide snapshot capabilities, including those with proprietary operating systems and the wide variety of NAS systems that are based on Microsoft Windows Storage Server.

There's a lot to like about NAS-based snapshotting, including a common standard for all of the physical and virtual servers, desktops and laptops that connect to the NAS device. It's also very easy to implement, operate and manage. NAS-based snapshot technology tends to be integrated with Windows Volume Shadow Copy Service (VSS), as well as with backup servers and their agents. Some NAS vendors have their own agents for non-Windows structured data applications. Other NAS snapshot offerings include data deduplication (EMC Corp., FalconStor Software Inc. and NetApp), and some even offer thin snapshot provisioning that minimizes the amount of storage reserved for snapshots.

But there's a price to pay for the convenience and added features: fairly hefty software licensing and maintenance charges that are often system or capacity based. NAS systems tend to proliferate in most companies and, as they do, the number of touchpoints required for snapshots will also increase, making operations and management more complex.

Storage array-based snapshots

Storage array-based snapshots are included with the operating systems of most block storage arrays.

The advantages of using snapshotting that comes with the storage array operating system are similar to those of NAS-based snapshots. They provide a common standard and touchpoint for all of the physical and virtual servers, desktops and laptops connected to the array, and are easy to implement, operate and manage. And, like NAS, many storage arrays integrate their snapshot technology with Windows VSS, as well as with backup servers and their agents. Some vendors even provide their own agents for non-Windows structured data applications.

The drawbacks include hefty license and maintenance fees, lack of integration with non-Windows-based structured data applications and increasing complexity as the number of storage systems increases.

Snapshots with storage virtualization appliances

Storage virtualization appliances are primarily SAN based, with the exception of F5 Networks Inc.'s Acopia ARX, which is file (NFS) based. Other examples of virtualization appliances (or storage systems that incorporate virtualization) include Cloverleaf Communication Inc.'s Intelligent Storage Networking System (iSN), DataCore Software Corp.'s SANsymphony and SANmelody, EMC's Celerra Gateway blades, FalconStor's IPStor, Hewlett-Packard's XP series, Hitachi Data Systems' Universal Storage Platform V/VM, IBM's SAN Volume Controller, LSI Corp.'s StoreAge Storage Virtualization Manager (SVM) and NetApp's V-Series storage controllers.

Storage virtualization approaches to snapshots have the same advantages as storage array- and NAS-based snapshots, but offer others as well. They provide a common standard and point of management for multiple storage systems from a single or several vendors, aggregating them into fewer or just one image. This greatly simplifies snapshot management, operations and training.

The negatives related to storage virtualization-based snapshots are a bit different. These devices add some transaction latency, even those with split-path architectures, which ultimately affects application response time. The extra layer also complicates troubleshooting and has the potential to exacerbate multivendor finger-pointing. And while the additional hardware or software comes with a price, it may be offset by lower software license or maintenance fees for the virtualized storage.

Snapshots with server virtualization hypervisors

The ascendancy of server virtualization has made hypervisor-based snapshot technology progressively more popular. This technology is available with virtualization software such as Citrix Systems Inc.'s XenServer, Microsoft's Hyper-V, Sun's xVM Ops Center, and VMware's ESX and vSphere 4.

The advantages of using hypervisor-based snapshots are straightforward. The technology comes bundled with the hypervisor; it provides the same snapshot methodology for all virtual machines (VMs); it's integrated with Microsoft's VSS; and it's easy to implement, use and manage.

What's not to like about this approach? Snapshots must be managed separately for each hypervisor, and when snapshots are used for any OS other than Windows, only the entire VM will be imaged. That means restores are coarse-grained and time consuming, and because the snapshots aren't structured-data-aware outside of Windows, they may not be application consistent.

Snapshots with SQL databases

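In SQL databases, the equivalent concept is called "snapshot isolation": each transaction reads from a consistent, point-in-time view of the database, even while other transactions continue to write. Databases such as Oracle and PostgreSQL rely on snapshot isolation (implemented through multiversion concurrency control) for their transaction isolation, although snapshot isolation by itself doesn't guarantee that transactions are fully serializable, because it permits anomalies such as write skew. Other SQL databases, such as Microsoft SQL Server, support snapshot isolation as an option. In general, SQL database backup features take advantage of snapshot isolation to produce transactionally consistent dumps of tables.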
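The core idea, a reader seeing a frozen point-in-time view while writers carry on, can be demonstrated without Oracle or PostgreSQL. The Python sketch below swaps in SQLite, which ships with Python, running in write-ahead logging (WAL) mode, where a read transaction keeps the snapshot it started with. The `orders` table and `demo.db` file name are hypothetical. A dump tool reading inside such a transaction would get a consistent copy even though writes continue.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def snapshot_read_demo():
    """Return row counts seen (before, during, after) a reader's snapshot."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        # isolation_level=None means autocommit; BEGIN/COMMIT are issued explicitly.
        with closing(sqlite3.connect(path, isolation_level=None)) as writer:
            # WAL mode lets a writer commit while a reader keeps its snapshot.
            writer.execute("PRAGMA journal_mode=WAL")
            writer.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
            writer.execute("INSERT INTO orders DEFAULT VALUES")

            with closing(sqlite3.connect(path, isolation_level=None)) as reader:
                reader.execute("BEGIN")
                # The first read in the transaction fixes the point-in-time view.
                before = reader.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

                # A concurrent write commits while the read transaction is open.
                writer.execute("INSERT INTO orders DEFAULT VALUES")

                # Still inside the snapshot, so the new row is invisible.
                during = reader.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
                reader.execute("COMMIT")

                # A new read transaction sees the committed row.
                after = reader.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    return before, during, after
```

The reader sees one row both before and during the concurrent insert, and two rows only after its transaction ends.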

The main advantage of using SQL database snapshot technology is that snapshots of the database, and any applications based on the database, will be transactionally consistent.

But there are some significant disadvantages. The snapshot technology is very limited and it only works with that particular database and the apps tied to it. It doesn't work with the file system, any other application on the server, or with other databases or servers. So you'll need other snapshot technologies or data protection, thus complicating operation and management.

Different types of snapshots and how they work

There are six general types of snapshot technologies:

  1. Copy-on-write
  2. Redirect-on-write
  3. Clone or split-mirror
  4. Copy-on-write with background copy
  5. Incremental
  6. Continuous data protection

Copy-on-write (COW) snapshot

COW requires storage capacity to be provisioned for snapshots, and then a snapshot of a volume has to be initiated using the reserved capacity. The COW snapshot stores only the metadata about where the original data is located, but doesn't copy the actual data at the initial creation. This makes snapshot creation virtually instantaneous, with little impact on the system taking the snapshot.

The snapshot then tracks the original volume, paying attention to changed blocks as writes are performed. Before a block is overwritten, its original data is copied into the storage capacity reserved for the snapshot. Each original block is copied only once, at the first write request to that block after the snapshot is taken. This process ensures the snapshot data is consistent with the exact time the snapshot was taken, and it's why the process is called "copy on write."

Read requests to unchanged data are directed to the original volume. Read requests to changed data are directed to the copied blocks in the snapshot. Each snapshot contains metadata describing the data blocks that have changed since the snapshot was first created.

The major advantage of copy-on-write is that it's very space efficient, because the reserved snapshot storage only has to be large enough to capture the data that changes. But the well-known downside to copy-on-write snapshots is that they reduce performance on the original volume. That's because a write request to the original volume can't complete until the original data is "copied out" to the snapshot. Also note that a copy-on-write snapshot isn't an independent copy: unchanged blocks exist only on the original volume, so each snapshot requires a valid original copy of the data.
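The copy-on-write mechanics described above can be modeled in a few lines. This Python sketch is a toy, in-memory block device (a list of strings standing in for disk blocks), not any vendor's implementation. Creating a snapshot records only empty metadata, the first write to each block copies the old data out, and snapshot reads fall back to the original volume for unchanged blocks. The hypothetical `copy_writes` counter makes the extra write visible.

```python
class CowVolume:
    """Toy block device with copy-on-write snapshots (in-memory model)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)  # the live, original volume
        self.snapshots = []         # per snapshot: {block_index: preserved_data}
        self.copy_writes = 0        # extra writes caused by copy-outs

    def snapshot(self):
        # Creation stores only (empty) metadata, so it is effectively instant.
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, i, data):
        # Copy the old block into every snapshot that hasn't preserved it yet,
        # then overwrite: the "copy" in copy-on-write happens once per block.
        for snap in self.snapshots:
            if i not in snap:
                snap[i] = self.blocks[i]
                self.copy_writes += 1
        self.blocks[i] = data

    def read_snapshot(self, sid, i):
        # Changed blocks come from the snapshot area; unchanged ones from the
        # original volume, which is why the snapshot depends on the original.
        return self.snapshots[sid].get(i, self.blocks[i])
```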

Redirect-on-write (ROW) snapshot

Redirect-on-write is comparable to copy-on-write, but it eliminates the double write performance penalty. ROW also provides storage space-efficient snapshots like copy-on-write. What allows ROW to eliminate the write performance penalty is that the new writes to the original volume are redirected to the storage provisioned for snapshots. ROW redirection of new writes reduces the number of writes from two to one. So instead of writing one copy of the original data to the storage space plus a copy of the changed data required with COW, ROW writes only the changed data.

With redirect-on-write, the original copy contains the point-in-time snapshot data, and it's the changed data that ends up residing on the snapshot storage. That creates some complexity when a snapshot is deleted: the changed data held in snapshot storage must be copied back and made consistent on the original volume. The complexity grows rapidly as more snapshots are created, complicating original data access, the tracking of snapshot data and original volume data, and data reconciliation when snapshots are deleted. Serious problems can also occur when the original data set (upon which the snapshot is dependent) becomes fragmented.
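For comparison, here's a toy Python model of redirect-on-write as this article describes it: the original blocks stay frozen as the snapshot, new writes are redirected to the snapshot storage area, and deleting the snapshot requires copying the redirected blocks back. To stay short, it supports only one snapshot at a time, and the class and method names are hypothetical. Each write request costs one physical write; the reconciliation cost is deferred until deletion.

```python
class RowVolume:
    """Toy redirect-on-write volume supporting a single snapshot."""

    def __init__(self, blocks):
        self.original = list(blocks)  # frozen as the snapshot once one exists
        self.redirect = None          # {block_index: new_data} while snapshotted
        self.writes = 0               # physical block writes issued

    def snapshot(self):
        self.redirect = {}  # metadata only, so creation is instant

    def write(self, i, data):
        self.writes += 1  # one write per request: no copy-out first
        if self.redirect is None:
            self.original[i] = data
        else:
            self.redirect[i] = data

    def read_live(self, i):
        # Current data: redirected block if present, otherwise the original.
        if self.redirect and i in self.redirect:
            return self.redirect[i]
        return self.original[i]

    def read_snapshot(self, i):
        return self.original[i]

    def delete_snapshot(self):
        # Reconciliation: copy redirected blocks back onto the original volume.
        for i, data in self.redirect.items():
            self.original[i] = data
            self.writes += 1
        self.redirect = None
```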

Clone or split-mirror snapshot

A clone or split-mirror snapshot creates an identical copy of the data. The clone or split-mirror can be of a storage volume, file system or logical unit number (LUN). The good thing about clones is that they're highly available: because a clone is a complete, independent copy, it remains usable even if the original volume fails. The bad thing is that because all of the data has to be copied, a clone can't be created instantaneously. A clone can be made instantaneously available by splitting a pre-existing synchronous volume mirror in two. However, when a split-mirror is used as a clone, the original volume loses its synchronized mirror.

A very significant downside to this snapshot methodology is that each snapshot requires as much storage capacity as the original data. This can be expensive, especially if more than one snapshot clone must be kept live at any given time. Another downside is the impact on system performance from the overhead of writing synchronously to the mirror copy.
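The trade-offs of clones and split mirrors show up clearly in a small model. In this hypothetical Python sketch, a synchronously mirrored volume writes every block twice; splitting the mirror yields an instant, full-size clone but leaves the primary without a mirror; and a fresh clone has to copy every block.

```python
class MirroredVolume:
    """Toy volume with a synchronous mirror (in-memory model)."""

    def __init__(self, blocks):
        self.primary = list(blocks)
        self.mirror = list(blocks)  # a full second copy: 100% extra capacity
        self.physical_writes = 0

    def write(self, i, data):
        self.primary[i] = data
        self.physical_writes += 1
        if self.mirror is not None:
            self.mirror[i] = data  # synchronous: every write happens twice
            self.physical_writes += 1

    def split_mirror(self):
        """Instant clone: hand over the mirror, leaving the primary unmirrored."""
        clone, self.mirror = self.mirror, None
        return clone

    def full_clone(self):
        """Copy every block; takes time proportional to the volume size."""
        return list(self.primary)
```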

Copy-on-write with background copy snapshot

Copy-on-write with background copy starts with an instantaneous COW snapshot and then uses a background process to copy the data from its original location to the snapshot storage location. The result is a clone or mirror of the original data.

Copy-on-write with background copy attempts to take the best aspects of copy-on-write while minimizing its downsides. It's often described as a hybrid between COW and cloning.
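Extending the toy copy-on-write idea, this Python sketch adds a hypothetical `background_step` method that copies not-yet-preserved blocks into the snapshot a few at a time. Ordinary COW copy-outs still happen on writes, so the snapshot stays consistent throughout. Once every block has been copied, the snapshot is a full clone that no longer depends on the original volume.

```python
class CowBackgroundVolume:
    """Toy COW snapshot that a background copier turns into a full clone."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snap = None   # {block_index: point-in-time data}
        self.next_bg = 0   # next block the background copier will examine

    def snapshot(self):
        self.snap = {}     # instant, metadata-only creation as with COW
        self.next_bg = 0

    def write(self, i, data):
        if self.snap is not None and i not in self.snap:
            self.snap[i] = self.blocks[i]  # ordinary copy-on-write copy-out
        self.blocks[i] = data

    def background_step(self, n=1):
        """Copy up to n unpreserved blocks; return True once the clone is complete."""
        if self.snap is None:
            raise RuntimeError("no snapshot to copy")
        while n > 0 and self.next_bg < len(self.blocks):
            i = self.next_bg
            self.next_bg += 1
            if i not in self.snap:
                # Unchanged since the snapshot, so this is still point-in-time data.
                self.snap[i] = self.blocks[i]
                n -= 1
        return len(self.snap) == len(self.blocks)

    def read_snapshot(self, i):
        return self.snap.get(i, self.blocks[i])
```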

Incremental snapshot

An incremental snapshot tracks the changes made to the source data since the previous snapshot was generated. When an incremental snapshot is generated, the original snapshot data is updated, or refreshed, with those changes. There's a time stamp on the original snapshot data and on each subsequent incremental snapshot, which provides the capability to roll back to any point-in-time snapshot. After the first snapshot, incremental snapshots are faster to take, and they use only nominally more storage space than the original data. This enables more frequent snapshots and longer snapshot retention.

The downside to incremental snapshots is that they're dependent on the underlying baseline technology used in the first snapshot (copy-on-write, redirect-on-write, clone/split-mirror or copy-on-write with background copy). If cloned, the first snapshot will take a while; if COW, there will be a performance penalty on writes to the original data, etc.
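One way to model the incremental approach is a baseline image plus a time-stamped list of changed blocks, as in this Python sketch. It assumes a full-copy baseline and increasing integer time stamps, both simplifications; as noted above, real products build the baseline with COW, ROW, cloning or COW with background copy. Restoring to a point in time means applying every increment up to that time stamp.

```python
class IncrementalSnapshots:
    """Toy baseline-plus-increments snapshot chain (in-memory model)."""

    def __init__(self, blocks, t0):
        self.baseline = (t0, list(blocks))  # first snapshot: full point-in-time copy
        self.increments = []                # (timestamp, {index: data}) per snapshot
        self._live = list(blocks)
        self._dirty = {}                    # blocks changed since the last snapshot

    def write(self, i, data):
        self._live[i] = data
        self._dirty[i] = data

    def take(self, t):
        # Only blocks changed since the previous snapshot are stored.
        self.increments.append((t, dict(self._dirty)))
        self._dirty.clear()

    def restore(self, t):
        """Rebuild the image as of time t from the baseline and increments."""
        t0, image = self.baseline
        if t < t0:
            raise ValueError("no snapshot at or before t")
        image = list(image)
        for ts, delta in self.increments:
            if ts > t:
                break
            for i, data in delta.items():
                image[i] = data
        return image
```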

Continuous data protection (CDP)

CDP was developed to provide zero data loss recovery point objectives (RPOs) and instantaneous recovery time objectives (RTOs). It's similar to synchronous data mirroring except that it eliminates the rolling disaster (a problem in the primary data is automatically a problem with the mirrored data long before human intervention can stop it) and protects against human errors, malware, accidental deletions and data corruption.

Continuous data protection is like incremental snapshots on steroids. It captures and copies any changes to the original data whenever they occur and time stamps them. It essentially creates an incremental snapshot for every moment in time, providing very fine-grained recoveries. Some CDP implementations are both time and event based, so you can also recover to a point marked by an event such as an application upgrade. A good way to think of CDP is as a journal of complete storage snapshots.
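The "journal" view of CDP translates directly into code. In this hypothetical Python sketch, every write is appended to a time-stamped journal, named bookmarks stand in for event-based recovery points such as an application upgrade, and a restore replays the journal up to any chosen moment. Real CDP products add persistence, indexing and space management, all omitted here.

```python
import bisect


class CdpJournal:
    """Toy continuous-data-protection journal over a block image."""

    def __init__(self, blocks):
        self.base = list(blocks)
        self.times = []      # write time stamps, non-decreasing
        self.entries = []    # (block_index, data), parallel to self.times
        self.bookmarks = {}  # event name -> time stamp

    def write(self, t, i, data):
        if self.times and t < self.times[-1]:
            raise ValueError("time stamps must not go backwards")
        self.times.append(t)
        self.entries.append((i, data))

    def mark(self, name, t):
        # Event-based recovery point, e.g. "before application upgrade".
        self.bookmarks[name] = t

    def restore(self, t):
        """Replay every journaled write at or before time t."""
        image = list(self.base)
        n = bisect.bisect_right(self.times, t)
        for i, data in self.entries[:n]:
            image[i] = data
        return image

    def restore_event(self, name):
        return self.restore(self.bookmarks[name])
```

Because every change is kept, recovering from a corrupting write is a matter of choosing a moment just before it.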

CDP is an excellent form of data protection for email, databases and applications that are based on databases. The ability to roll back to any point in time makes recoveries simple and fast. FalconStor's IPStor is an example of a storage system and/or virtualization appliance that provides CDP.

With more and more data to protect and often less time to do it, snapshots will play a bigger role in data protection and daily storage operations. Although the differences among snapshot technologies may seem subtle, how they operate in your environment could have a significant effect on the level of protection provided and how quickly recoveries can occur.

BIO: Marc Staimer is president of Dragon Slayer Consulting.
