How to implement disk-to-disk backups

Disk is being used more frequently in the data center because it offers better performance than tape systems, but there are limitations you should be aware of.

As capacities grow and costs per gigabyte shrink, disk is being used as a backup storage target more frequently. Disk offers far better performance than tape, allowing for faster data transfers as well as greater long-term reliability. Disk can accept complete backup volumes as well as simple file copies. For example, backup software can send a full backup volume to a virtual tape library (VTL) that emulates the behavior of a traditional tape system. But disk platforms don't just emulate tape: files can just as easily be copied to a disk array in the same data center or to one on the other side of the world. Storage administrators can then recover individual files as needed without opening and searching an entire backup volume.

Of course, disk platforms also have limitations as backup targets. Storage space is finite, which can limit the allowable retention period for data. Data reduction technologies like data deduplication and delta differencing can dramatically extend the effective storage capacity. Indexing and search can also be valuable when files must be organized and located quickly. Below is a series of best practices that can ease the implementation of disk-to-disk (D2D) backup.
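Delta differencing, one of the data reduction techniques mentioned above, works by storing only the blocks that changed between backup generations. The sketch below is a minimal Python illustration under an assumed fixed block size; the function names and sample data are hypothetical, not any vendor's implementation (real products typically operate on much larger blocks and track changes incrementally).

```python
# Sketch of delta differencing: keep a baseline backup, then store only
# the blocks that changed in each later generation.

BLOCK_SIZE = 4  # illustrative; real systems use far larger blocks

def split_blocks(data, size=BLOCK_SIZE):
    return [data[i:i + size] for i in range(0, len(data), size)]

def delta(previous, current):
    """Return {block_index: new_block} for blocks that differ."""
    prev = split_blocks(previous)
    curr = split_blocks(current)
    changes = {}
    for i, block in enumerate(curr):
        if i >= len(prev) or prev[i] != block:
            changes[i] = block
    return changes

def apply_delta(previous, changes):
    """Rebuild the newer generation from the baseline plus the delta."""
    blocks = split_blocks(previous)
    for i, block in changes.items():
        if i < len(blocks):
            blocks[i] = block
        else:
            blocks.append(block)
    return b"".join(blocks)

gen1 = b"AAAABBBBCCCC"          # baseline full backup
gen2 = b"AAAAXXXXCCCC"          # next night: only the middle block changed
changes = delta(gen1, gen2)     # stores 4 bytes instead of 12
restored = apply_delta(gen1, changes)
```

Only the changed block is written to the backup target, which is where the capacity savings come from.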

Preprocess files before moving

Processing a backup job takes time and demands significant work from the backup server. Some storage or backup administrators choose to process the entire backup job on the local server first, and then move the completed backup volume to the final storage platform. "Look for solutions that can reduce the amount of time pre-scanning your file systems before actually moving data like those from EMC Corp. [Avamar], Asigra Inc. and others," says Greg Schulz, founder and senior analyst at the StorageIO Group in Stillwater, Minn. The implications are mixed: processing the backup job first requires additional storage on the backup server to hold the completed job until it can be moved, but the finished volume can then be moved across the network to another storage platform in a fraction of the time that might otherwise be required.
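The trade-off described above can be sketched as a simple cost model: streaming each file to the remote target as it is processed incurs one network transfer per file, while staging locally incurs a single bulk transfer at the cost of extra local storage. This Python sketch is purely illustrative; the file names, contents, and counters are hypothetical.

```python
# Illustrative cost model: per-file streaming vs. stage-then-move.

source_files = {
    "db/orders.dat": b"order rows",
    "db/customers.dat": b"customer rows",
    "logs/app.log": b"events",
}

def stream_per_file(files):
    """Send each file to the remote target as it is processed:
    one network transfer per file, no extra local storage."""
    transfers = len(files)
    return transfers

def stage_then_move(files):
    """Process the whole job locally first, then ship the finished
    volume in one transfer. Costs local staging space equal to the
    size of the job."""
    staging_bytes = sum(len(d) for d in files.values())
    transfers = 1
    return transfers, staging_bytes

streamed = stream_per_file(source_files)
moved, staged = stage_then_move(source_files)
```

On high-latency WAN links, collapsing many small transfers into one sequential move is usually where the time savings come from.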

Avoid the use of compression or backup sets

When dealing with D2D backups, it's often preferable to back up in native file-system formats and avoid vendor-specific backup save sets or compression-centric formats like .zip or .tar. This may demand additional disk space, but it eliminates the need for backup or compression software when retrieving those files later on. For example, if you need to restore a lost Word file from a disk backup, the file should be readily accessible in its native form. Otherwise, you'd need to decompress the .zip or .tar package or restore the entire backup volume to locate the one item you need -- a troublesome and error-prone process.
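The difference in restore effort can be illustrated with Python's standard tarfile module. The file names and contents here are hypothetical; the point is that the native copy is a direct lookup, while the archive container must be opened and searched before the one file can be extracted.

```python
# Native-format copies vs. an archive container, side by side.
import io
import tarfile

files = {"docs/letter.docx": b"dear team", "docs/memo.docx": b"fyi"}

# Native-format D2D backup: each file sits on the disk target as-is,
# so restoring one file is a direct lookup.
native_backup = dict(files)
restored_native = native_backup["docs/letter.docx"]

# Archive-based backup: the same files packed into a single .tar
# container (built in memory here for illustration).
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name, data in files.items():
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

# To recover one file, the whole container must be opened and searched.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    restored_archived = tar.extractfile("docs/letter.docx").read()
```

Both paths recover the same bytes, but only the archive path requires extra tooling at restore time.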

Consider the use of removable hard drives

While disk capacity is finite, a tape backup can be almost any size (within the limits of the operating system and backup software) -- just add more tape cartridges. Tapes can also be transported for offsite storage. A new generation of removable hard drives from vendors like Imation Corp., Prostor Systems Inc., Quantum Corp. and Iomega Corp. brings some of those traditional tape advantages to disk storage. For example, a disk filled with archival data can be removed from the storage platform, moved to a secure location if desired, and replaced with a fresh hard drive for virtually unlimited storage capacity.

Plan for scalability and limit the total number of systems

Disk storage should be implemented with a strong focus on long-term scalability in both performance and capacity. Storage volumes are constantly increasing, and all types of disk storage (not just disk backup) will need to grow over time -- not just to store more files, but also to meet increasing storage traffic demands. Scalability should also tie into any storage consolidation plans within the organization, using fewer and larger storage boxes to limit power consumption and stay within a reasonable energy footprint.

Include data deduplication technologies

Data deduplication has become an indispensable element of disk-based backup, and should be an integrated part of the storage platform itself. Deduplication systematically locates and eliminates redundant data stored within the disk array. This can effectively compress storage as much as 50:1 over time -- not only easing capital expenditures for new storage purchases, but allowing far greater retention periods for backup data.
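A minimal sketch of how block-level deduplication achieves this, assuming fixed-size blocks and SHA-256 fingerprints (real products typically use variable-size chunking and much larger blocks); the backup names and data are illustrative:

```python
# Sketch of block-level deduplication: identical blocks are stored
# once and referenced by hash in each backup's "recipe".
import hashlib

BLOCK_SIZE = 4  # illustrative only

def dedup_store(backups):
    """Store unique blocks once; each backup becomes a list of hashes."""
    chunks = {}    # hash -> block bytes (the single stored copy)
    recipes = {}   # backup name -> ordered list of block hashes
    for name, data in backups.items():
        recipe = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            chunks.setdefault(digest, block)   # store new blocks only
            recipe.append(digest)
        recipes[name] = recipe
    return chunks, recipes

def rehydrate(chunks, recipe):
    """Reassemble a full backup from its recipe of block hashes."""
    return b"".join(chunks[h] for h in recipe)

# Nightly fulls of mostly unchanged data deduplicate heavily.
backups = {"mon": b"AAAABBBBCCCC", "tue": b"AAAABBBBDDDD"}
chunks, recipes = dedup_store(backups)
raw_bytes = sum(len(d) for d in backups.values())     # what was backed up
stored_bytes = sum(len(b) for b in chunks.values())   # what hit the disk
restored_tue = rehydrate(chunks, recipes["tue"])
```

The shared AAAA and BBBB blocks are stored only once, and the savings compound as more near-identical backup generations accumulate.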

Consider multi-purpose disk storage systems

Position the storage infrastructure for flexibility. Select and deploy storage platforms that can support multiple tiers internally, or that can provide support for nearline or archive storage. Implement the tools needed to classify and move data across tiers and storage systems. This allows organizations to prioritize backup efforts based on storage tiers or the relative importance of data to the organization. As an example, the data on Tier 1 storage may be backed up to local disk on a nightly basis, but data on lower tiers may only be backed up every few days. The result is shorter backup windows and less disk capacity committed to backup storage.
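The tier-based schedule described above can be sketched as a simple policy. The tier numbers, backup intervals, and dataset names are illustrative assumptions, not a vendor's defaults:

```python
# Sketch of tier-aware backup scheduling: Tier 1 data is backed up
# nightly, lower tiers less often.

BACKUP_INTERVAL_DAYS = {1: 1, 2: 3, 3: 7}   # tier -> days between runs

def due_for_backup(tier, days_since_last):
    """True if data on this tier is due for another backup run."""
    return days_since_last >= BACKUP_INTERVAL_DAYS[tier]

datasets = [
    {"name": "orders-db",    "tier": 1, "days_since_last": 1},
    {"name": "hr-archive",   "tier": 2, "days_since_last": 2},
    {"name": "old-projects", "tier": 3, "days_since_last": 7},
]

tonight = [d["name"] for d in datasets
           if due_for_backup(d["tier"], d["days_since_last"])]
```

Only the Tier 1 database and the long-overdue Tier 3 data are queued tonight, which is how tiering shrinks the nightly backup window.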

Expand D2D backup operation in phases

Deploy D2D platforms that allow your organization to phase in disk-based backup easily and cost-effectively. For example, start by backing up non-production data to disk, then systematically expand D2D coverage to archive and nearline data, and then finally use D2D to protect more critical production data over time. A phased approach allows backup administrators to become accustomed to the D2D paradigm, and work out any operational or procedural issues before dealing with mission-critical data.

Plan for data classification, indexing, and search features

Disk-based backup platforms should embrace technologies such as data classification, indexing, and search. Data classification is crucial to understanding your data and its relative importance to your enterprise. Indexing and search features allow you to locate data long after it's written -- often in response to compliance regulations or e-discovery requirements. "Integration of these technologies with D2D solutions will enable organizations to better meet SLAs and to apply certain features to data," says Heidi Biggar, analyst with Enterprise Strategy Group.
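The indexing-and-search idea can be sketched with a toy inverted index over file contents; real e-discovery tools are far more sophisticated, and the file paths and text below are hypothetical.

```python
# Sketch of an inverted index over backed-up files, supporting the
# keyword lookups used for compliance and e-discovery requests.
from collections import defaultdict

documents = {
    "mail/2019-q3.txt":  "merger discussion with acme",
    "mail/2019-q4.txt":  "holiday schedule",
    "contracts/acme.txt": "acme supply agreement",
}

def build_index(docs):
    """Map each word to the set of files containing it."""
    index = defaultdict(set)
    for path, text in docs.items():
        for word in text.lower().split():
            index[word].add(path)
    return index

def search(index, word):
    """Return the matching file paths, sorted for stable output."""
    return sorted(index.get(word.lower(), set()))

index = build_index(documents)
hits = search(index, "acme")
```

Because the index is built when data is written, the search at discovery time is a lookup rather than a scan of every backup volume.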

Be sure to include remote replication support

As organizations grow and remote offices proliferate across regions and around the world, the need to replicate data from the data center and remote offices will become more pronounced. Consider the need for remote replication in your D2D backup platforms and examine vendors' support for remote replication across one or more locations. Replication may provide one-to-one, many-to-one, or one-to-many support.
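The three replication topologies can be sketched as mappings from source sites to target sites. The site names and data are illustrative; a one-to-one arrangement is simply a topology with a single source and a single target.

```python
# Sketch of replication topologies modeled as source -> targets maps.

def replicate(data_by_site, topology):
    """Copy each source site's data to its configured target sites."""
    replicas = {}
    for source, targets in topology.items():
        for target in targets:
            replicas.setdefault(target, {}).update(data_by_site[source])
    return replicas

branch_data = {
    "nyc": {"nyc/sales.db": b"east"},
    "sfo": {"sfo/sales.db": b"west"},
}

# Many-to-one: remote offices replicate into the central data center.
central = replicate(branch_data, {"nyc": ["hq"], "sfo": ["hq"]})

# One-to-many: the data center fans out to two disaster-recovery sites.
fanout = replicate({"hq": central["hq"]}, {"hq": ["dr1", "dr2"]})
```

The many-to-one pass consolidates branch-office data centrally, and the one-to-many pass then protects that consolidated copy at multiple recovery sites.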
