Backup refers to the copying of physical or virtual files or databases to a secondary site for preservation in case of equipment failure or other catastrophe. The process of backing up data is pivotal to a successful disaster recovery (DR) plan.
What is backup and recovery?
Enterprises back up data they deem to be vulnerable in the event of buggy software, data corruption, hardware failure, malicious hacking, user error or other unforeseen events. Backups capture and synchronize a point-in-time snapshot that is then used to return data to its previous state.
Backup and recovery testing examines an organization's practices and technologies for data security and data replication. The goal is to ensure rapid and reliable data retrieval should the need arise. The process of retrieving backed-up data files is known as file restoration.
The terms data backup and data protection are often used interchangeably, although data protection encompasses the broader goals of business continuity, data security, information lifecycle management, and the prevention of malware and computer viruses.
What data should be backed up and how frequently?
A backup process is applied to critical databases or related line-of-business applications. The process is governed by predefined backup policies that specify how frequently the data is backed up and how many duplicate copies (known as replicas) are required, as well as by service-level agreements (SLAs) that stipulate how quickly data must be restored.
Best practices suggest a full data backup should be scheduled to occur at least once a week, often during weekends or off-business hours. To supplement weekly full backups, enterprises typically schedule a series of differential or incremental data backup jobs. A differential job backs up only data that has changed since the last full backup, while an incremental job backs up only data that has changed since the most recent backup of any kind.
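The weekly schedule described above can be sketched in Python. This is only an illustration under stated assumptions: the function names `backup_kind_for` and `restore_chain` are hypothetical, the full backup is assumed to run on Sunday, and every other day is assumed to run an incremental job.

```python
import datetime
from typing import List, Tuple

def backup_kind_for(day: datetime.date, full_weekday: int = 6) -> str:
    """Return the job type scheduled for a day (weekday 6 is Sunday)."""
    return "full" if day.weekday() == full_weekday else "incremental"

def restore_chain(target: datetime.date,
                  full_weekday: int = 6) -> List[Tuple[datetime.date, str]]:
    """List the backups needed to restore data as of the target day.

    Walks backward from the target until it reaches a full backup, then
    returns the chain oldest first, which is the order a restore applies it.
    """
    chain = []
    day = target
    while True:
        kind = backup_kind_for(day, full_weekday)
        chain.append((day, kind))
        if kind == "full":
            break
        day -= datetime.timedelta(days=1)
    return list(reversed(chain))

# Hypothetical restore target: Wednesday, 10 January 2024.
chain = restore_chain(datetime.date(2024, 1, 10))
```

Under these assumptions, a Wednesday restore needs Sunday's full backup plus the incremental jobs from Monday, Tuesday and Wednesday. This is why a longer gap between full backups lengthens the restore.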
Backup storage media
Enterprises typically back up key data to dedicated backup appliances or magnetic tape systems. Data deduplication systems contain hard disk drives (HDDs) and are equipped with software for setting backup policies.
Disk-to-disk backup systems initially appeared as an alternative to magnetic backup tape drive libraries. Both disk and tape are still used today, and often in conjunction.
As file sizes have increased, some backup vendors have brought integrated data protection appliances to market in an effort to simplify the backup process. An integrated data appliance is essentially a file server outfitted with HDDs and vendor-developed backup software. These plug-and-play data storage devices often include automated features for monitoring disk capacity, expandable storage and preconfigured tape libraries.
Most disk-based backup appliances allow copies to be moved from spinning media to magnetic tape for long-term retention. Magnetic tape systems are still used as backup media due to increasing tape densities and the rise of the Linear Tape File System (LTFS).
Solid-state drives (SSDs) generally are not used for data backup because of endurance concerns. Some storage vendors include SSDs as a caching or tiering tool for managing writes with disk-based arrays. Data is initially cached in flash storage and then written to disk.
Local backup vs. offline backup for primary storage
Modern primary storage systems have evolved to feature stronger native capabilities for data backup. These features include advanced RAID protection schemes, unlimited snapshots, and tools for replicating snapshots to secondary backup or even tertiary off-site backup. Despite these advances, primary storage-based backup tends to be more expensive and lacks the indexing capabilities found in traditional backup products. Data deduplication, for example, first appeared in EMC Data Domain backup appliances but is gradually becoming a baseline feature of branded, primary storage arrays.
Local backups place data copies on external HDDs or magnetic tape systems, typically housed in or near an on-premises data center. The data is transmitted over a secure high-bandwidth network connection or corporate intranet.
One advantage of local backup is the ability to back up data behind a network firewall. Backing up locally is also much quicker than sending data off-site and provides greater control over who can access the data.
Offline or cold backup is similar to local backup, although it is most often associated with backing up a database. An offline backup incurs downtime since the backup process occurs while the database is disconnected from its network.
Backup and cloud storage
Conversely, off-site backup transmits data copies to a remote location, which can include a company's secondary data center or leased colocation facility. Increasingly, off-site data backup equates to subscription-based cloud storage as a service, which provides low-cost, scalable capacity and eliminates the customer's need to purchase and maintain backup hardware. Despite its growing popularity, electing backup as a service requires users to encrypt data and take other steps to safeguard data integrity.
Cloud backup is divided into the following:
- Public cloud storage: Users ship data to a cloud services provider, which charges them a monthly subscription fee based on consumed storage. Additional fees may apply for moving data in and out of the cloud, most notably for egress. Amazon Web Services, Google Cloud Platform and Microsoft Azure are currently the largest public cloud providers.
- Private cloud storage: Data is backed up to different servers within a company's firewall, typically between an on-premises data center and a secondary DR site. For this reason, private cloud storage is sometimes referred to as internal cloud storage.
- Hybrid cloud storage: A company uses both local and off-site storage. Enterprises customarily use public cloud storage selectively for data archiving and long-term retention. They use private storage for local access and backup for faster access to their most critical data.
Most backup vendors enable local applications to be backed up to a dedicated private cloud, effectively treating cloud-based data backup as an extension of a customer's physical data center. Also known as disaster recovery as a service, this maturing field allows an organization to lease space on a service provider's storage servers for centralized backup and management of lifeline data.
Cloud-to-cloud data backup is an alternative approach that has been gaining momentum. Using this method, a customer's data is copied from one cloud backup platform to another cloud. It also refers to cloud-based backups of data stored on software-as-a-service platforms.
Backup storage for PCs and mobile devices
PC users can consider local backup from a computer's internal hard disk to an attached external hard drive or to removable media such as a thumb drive. Another alternative for consumers is to back up data on smartphones and tablets to personal cloud storage, which is available from vendors such as Box, Carbonite, Dropbox, Google Drive, Microsoft OneDrive and others. These services commonly provide a certain capacity for free, giving consumers the option to purchase additional storage as needed. Unlike enterprise cloud storage as a service, these consumer-based cloud offerings generally do not provide the level of data security businesses require.
Backup software and hardware vendors
Vendors that sell backup hardware platforms include Barracuda Networks, Dell, Drobo, EMC (Data Domain), ExaGrid Systems, Hewlett Packard Enterprise, Hitachi Data Systems (including Sepaton), IBM, NEC Corp., NetApp, Oracle StorageTek (tape libraries), Quantum Corp., Spectra Logic, Unitrends and Veritas NetBackup (formerly Symantec NetBackup).
Leading enterprise backup software vendors include Acronis, Arcserve, Asigra, Commvault, Datto, Druva, EMC Data Protection Suite (Avamar, Data Protection Advisor, Mozy, NetWorker and SourceOne), EMC RecoverPoint replication manager, Nakivo and Veeam Software.
The Microsoft Windows Server operating system inherently features the Microsoft Resilient File System (Microsoft ReFS) to automatically detect and repair corrupted data. While not technically data backup, Microsoft ReFS is geared to be a preventive measure for safeguarding file system data against corruption.
VMware vSphere provides a suite of backup tools for data protection, high availability and replication. The VMware vStorage APIs for Data Protection (VADP) allow VMware or supported third-party backup software to safely take full and incremental backups of virtual machines (VMs). VADP implements backups via hypervisor-based snapshots. As an adjunct to data backup, VMware vSphere live migration allows VMs to be moved between different platforms to minimize the impact of a DR event. VMware Virtual Volumes also figure to aid VM backup.
A backup robot is an automated USB 2.0 external storage device that supports multiple removable Serial ATA hard drives. The first instance of a digital backup robot was introduced by Drobo, then operating as Data Robotics. Rather than use a robotic arm to manipulate hardware, the backup robot would automatically format and distribute data between the various hard drives inside of it, using storage virtualization technology to back up each drive to the other drives.
Software features have largely replaced the mechanical robotics in tape archive and backup systems.
Backup types defined
Full backup captures a copy of an entire data set. Although considered to be the most reliable backup method, performing a full backup is time-consuming and requires a large number of disks and/or tapes. Most organizations run full backups only periodically.
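Incremental backup offers an alternative to full backups by backing up only the data that has changed since the most recent backup of any kind, whether full or incremental. The drawback is that a full restore takes longer if an incremental-based data backup copy is used for recovery, because the last full backup and every incremental taken since must be applied in order.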
Differential backup copies data changed since the last full backup. This enables a full restore to occur more quickly by requiring only the last full backup and the last differential backup. For example, if you create a full backup on Monday, the Tuesday backup would, at that point, be similar to an incremental backup. Wednesday's backup would then back up everything that has changed since Monday's full backup. The downside is that progressive growth of differential backups tends to adversely affect your backup window.
Synthetic full backup is a variation of differential backup. In a synthetic full backup, the backup server produces an additional full copy, which is based on the original full backup and data gleaned from incremental copies. In other words, it spawns a file by combining an earlier complete copy of it with one or more incremental copies created at a later time. The assembled file is not a direct copy of any single current or previously created file, but rather is synthesized from the original file and any subsequent modifications to that file.
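The selection rules behind full, differential and incremental backups can be compared in a short Python sketch. It is an illustration rather than how any particular product works. The function name `select_files`, the use of modification timestamps and the sample data are all assumptions. Here, incremental means "changed since the most recent backup of any kind" and differential means "changed since the last full backup".

```python
from typing import Dict, List

def select_files(kind: str, mtimes: Dict[str, int],
                 last_full: int, last_backup: int) -> List[str]:
    """Return the files a backup job of the given kind would copy.

    mtimes maps a file name to its last-modified time (arbitrary units).
    last_full is the time of the most recent full backup; last_backup is
    the time of the most recent backup of any kind.
    """
    if kind == "full":
        cutoff = None             # copy everything
    elif kind == "differential":
        cutoff = last_full        # changed since the last full backup
    elif kind == "incremental":
        cutoff = last_backup      # changed since the last backup of any kind
    else:
        raise ValueError(f"unknown backup kind: {kind}")
    return sorted(name for name, mtime in mtimes.items()
                  if cutoff is None or mtime > cutoff)

# Hypothetical timeline: full backup at t=10, an incremental at t=20.
mtimes = {"a.db": 5, "b.log": 15, "c.cfg": 25}
full = select_files("full", mtimes, last_full=10, last_backup=20)
differential = select_files("differential", mtimes, last_full=10, last_backup=20)
incremental = select_files("incremental", mtimes, last_full=10, last_backup=20)
```

With this sample data, the differential job picks up both files changed since the full backup, while the incremental job picks up only the file changed since the previous job. That is the trade-off between backup size and restore complexity.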
Incremental-forever backups minimize the backup window while providing faster recovery access to data. An incremental-forever backup captures the full data set and then supplements it with incremental backups from that point forward. Backing up only changed blocks is also known as delta differencing. Full backups of data sets are typically stored on the backup server, which automates the restoration.
Reverse-incremental backups record the changes made between two instances of a mirror. Once an initial full backup is taken, each successive incremental backup applies any changes to the existing full. This essentially generates a new synthetic full backup copy each time an incremental change is applied, while also providing reversion to previous full backups.
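One way to picture the reverse-incremental approach is as a store that always holds an up-to-date full copy plus a list of "undo" deltas. The following Python sketch makes several assumptions for illustration: the class `ReverseIncrementalStore`, the helper `apply_delta`, and the convention that `None` marks an absent file are hypothetical, and real products operate on blocks rather than whole files.

```python
from typing import Dict, List, Optional

Files = Dict[str, bytes]
Delta = Dict[str, Optional[bytes]]   # None means "file absent"

def apply_delta(files: Files, delta: Delta) -> Files:
    """Return a copy of files with the delta applied."""
    out = dict(files)
    for name, data in delta.items():
        if data is None:
            out.pop(name, None)      # file removed
        else:
            out[name] = data         # file added or changed
    return out

class ReverseIncrementalStore:
    def __init__(self, initial_full: Files) -> None:
        self.latest_full: Files = dict(initial_full)
        self.reverse_deltas: List[Delta] = []    # oldest first

    def backup(self, changes: Delta) -> None:
        # Record how to undo the changes, then fold them into the full copy.
        undo = {name: self.latest_full.get(name) for name in changes}
        self.reverse_deltas.append(undo)
        self.latest_full = apply_delta(self.latest_full, changes)

    def restore(self, steps_back: int = 0) -> Files:
        """Return the latest full, or an earlier one by undoing recent jobs."""
        files = dict(self.latest_full)
        if steps_back:
            for undo in reversed(self.reverse_deltas[-steps_back:]):
                files = apply_delta(files, undo)
        return files

store = ReverseIncrementalStore({"a.txt": b"v1", "b.txt": b"v1"})
store.backup({"a.txt": b"v2"})
store.backup({"b.txt": None, "c.txt": b"v1"})
```

In this sketch the most recent state is always restorable without replaying any chain, and older states cost one undo step per backup job.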
Hot backup, also known as dynamic backup, is applied to data that remains available to users while the backup is in progress. This method sidesteps user downtime and productivity loss. The risk with hot backup is that, if the data is amended while the backup is under way, the resulting backup copy may not match the final state of the data.
Techniques and technologies to complement data backup
Continuous data protection (CDP) refers to layers of associated technologies designed to enhance data protection. A CDP-based storage system backs up all enterprise data whenever a change is made. CDP tools enable multiple copies of data to be created. Many CDP systems contain a built-in engine that replicates data from a primary to a secondary backup server and/or tape-based storage. Disk-to-disk-to-tape backup is a popular architecture for CDP systems.
Near-continuous CDP takes backup snapshots at set intervals, which are different from array-based vendor snapshots that are taken each time new data is written to storage.
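The difference between true CDP and near-continuous CDP can be illustrated with a simple write journal. This Python sketch is an assumption-laden model, not a description of any CDP product: the class `WriteJournal`, the helper `last_snapshot_time` and the fixed snapshot interval are all hypothetical, and journal entries are assumed to arrive in timestamp order.

```python
from typing import Dict, List, Tuple

class WriteJournal:
    """Records every write so that any past state can be rebuilt."""

    def __init__(self) -> None:
        self.entries: List[Tuple[int, str, bytes]] = []

    def record(self, timestamp: int, key: str, value: bytes) -> None:
        self.entries.append((timestamp, key, value))

    def state_at(self, timestamp: int) -> Dict[str, bytes]:
        """Replay all writes made at or before the given time."""
        state: Dict[str, bytes] = {}
        for ts, key, value in self.entries:
            if ts > timestamp:
                break                 # entries are in timestamp order
            state[key] = value
        return state

def last_snapshot_time(timestamp: int, interval: int) -> int:
    """Most recent snapshot time for a fixed-interval (near-continuous) scheme."""
    return (timestamp // interval) * interval

journal = WriteJournal()
journal.record(1, "a", b"1")
journal.record(7, "a", b"2")
journal.record(12, "b", b"1")

true_cdp = journal.state_at(8)                               # any point in time
near_cdp = journal.state_at(last_snapshot_time(8, interval=5))  # snapshot at t=5
```

In this model, recovering to time 8 with true CDP includes the write made at time 7, while the near-continuous scheme can only return to the snapshot taken at time 5.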
Data reduction lessens your storage footprint. There are two primary methods: data compression and data deduplication. These methods can be used singly, but vendors often combine the approaches. Reducing the size of data has implications on backup windows and restoration times.
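How the two data reduction methods combine can be sketched in Python using the standard `hashlib` and `zlib` modules. The function names `reduce_data` and `rebuild`, the fixed 4 KB chunk size and the sample data are assumptions for illustration. Commercial systems often use variable-size chunking and other refinements not shown here.

```python
import hashlib
import zlib
from typing import Dict, List, Tuple

def reduce_data(data: bytes,
                chunk_size: int = 4096) -> Tuple[List[str], Dict[str, bytes]]:
    """Deduplicate fixed-size chunks, then compress each unique chunk.

    Returns a recipe (ordered list of chunk digests) and a store that maps
    each digest to the compressed bytes of that chunk.
    """
    recipe: List[str] = []
    store: Dict[str, bytes] = {}
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:                  # deduplication
            store[digest] = zlib.compress(chunk)  # compression
        recipe.append(digest)
    return recipe, store

def rebuild(recipe: List[str], store: Dict[str, bytes]) -> bytes:
    """Reassemble the original data from the recipe and the chunk store."""
    return b"".join(zlib.decompress(store[d]) for d in recipe)

# Hypothetical data set: three identical chunks followed by a different one.
data = b"A" * 4096 * 3 + b"B" * 4096
recipe, store = reduce_data(data)
stored_bytes = sum(len(c) for c in store.values())
```

Here four chunks are referenced but only two are stored, and each stored chunk is compressed. A restore has to reverse both steps, which is one reason data reduction affects restoration times.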
Disk cloning involves copying the contents of a computer's hard drive, saving it as an image file and transferring it to storage media. Disk cloning can be used for system provisioning, system recovery, and rebooting or returning a system to its original configuration.
Erasure coding, also known as forward error correction, evolved as a scalable alternative to traditional RAID systems and is most often associated with object storage. RAID stripes data writes across multiple drives, using a parity drive to ensure redundancy and resilience. Erasure coding instead breaks data into fragments and encodes them with additional redundant data. These encoded fragments are stored across different storage media, nodes or geographic locations. The surviving fragments are used to reconstruct corrupted or lost data, using a technique known as oversampling.
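The idea of rebuilding lost data from surviving fragments can be shown with the simplest possible code: a single XOR parity fragment. This Python sketch is a deliberately reduced illustration and an assumption on our part. Production erasure coding typically uses schemes such as Reed-Solomon that tolerate multiple losses, and the function names `encode` and `reconstruct` are hypothetical.

```python
from functools import reduce
from typing import List, Optional

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal fragments and append one XOR parity fragment."""
    frag_len = -(-len(data) // k)                  # ceiling division
    padded = data.ljust(frag_len * k, b"\0")       # pad so fragments align
    frags = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = reduce(xor_bytes, frags)
    return frags + [parity]

def reconstruct(frags: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing fragment (marked None) from the others."""
    missing = [i for i, f in enumerate(frags) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity can rebuild only one lost fragment")
    repaired = list(frags)
    if missing:
        survivors = [f for f in frags if f is not None]
        repaired[missing[0]] = reduce(xor_bytes, survivors)
    return repaired

fragments = encode(b"backup data!", k=3)   # three data fragments + parity
damaged = list(fragments)
damaged[1] = None                          # simulate a lost node
repaired = reconstruct(damaged)
```

Because the XOR of all fragments, parity included, is zero, any one missing fragment equals the XOR of the rest. Stronger codes generalize this so that several fragments can be lost at once.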
Flat backup is a data protection scheme in which a direct copy of a snapshot is moved to low-cost storage without the use of traditional backup software. The original snapshot retains its native format and location; the flat backup replica gets mounted, should the original become unavailable or unusable.
Mirroring places data files on more than one computer server to ensure they remain accessible to users. In synchronous mirroring, data is written to local and remote disk simultaneously. Writes from local storage are not acknowledged until a confirmation is sent from remote storage, thus ensuring the two sites have an identical data copy. Conversely, in asynchronous mirroring, local writes are considered to be complete before confirmation is sent from the remote server.
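The acknowledgment behavior that separates synchronous from asynchronous mirroring can be modeled in a few lines of Python. This is a toy, in-memory sketch under stated assumptions: the `Mirror` class, its `drain` method and the dictionaries standing in for local and remote storage are all hypothetical, and no real network or disk I/O takes place.

```python
from collections import deque
from typing import Deque, Dict, Tuple

class Mirror:
    def __init__(self, synchronous: bool) -> None:
        self.synchronous = synchronous
        self.local: Dict[str, bytes] = {}
        self.remote: Dict[str, bytes] = {}
        self.pending: Deque[Tuple[str, bytes]] = deque()

    def write(self, key: str, data: bytes) -> str:
        """Write locally and return an acknowledgment to the caller."""
        self.local[key] = data
        if self.synchronous:
            # Remote copy is updated before the write is acknowledged.
            self.remote[key] = data
        else:
            # Acknowledge now; the remote copy is updated later.
            self.pending.append((key, data))
        return "ack"

    def drain(self) -> None:
        """Ship queued writes to the remote site (asynchronous mode)."""
        while self.pending:
            key, data = self.pending.popleft()
            self.remote[key] = data

sync_mirror = Mirror(synchronous=True)
sync_mirror.write("order-1", b"paid")

async_mirror = Mirror(synchronous=False)
async_mirror.write("order-1", b"paid")
lag_before_drain = dict(async_mirror.remote)   # remote is still empty here
async_mirror.drain()
```

In the asynchronous case, any write still sitting in the pending queue would be lost if the local site failed before it was shipped. That is the price paid for not waiting on the remote site.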
Replication enables users to select the required number of replicas, or copies, of data needed to sustain or resume business operations. Data replication copies data from one location to another, providing an up-to-date copy to hasten disaster recovery.
Recovery in-place, or instant recovery, allows users to temporarily run a production application directly from a backup VM instance, thus maintaining data availability while the primary VM is being restored. Mounting a physical or VM instance directly on a backup or media server can hasten system-level recovery to within minutes. Recovery from a mounted image does result in degraded performance, since backup servers are not sized for production workloads.
Storage snapshots capture a set of reference markers on disk for a given database, file or storage volume. Users refer to the markers, or pointers, to restore data from a selected point in time. Because it derives from an underlying source volume, an individual storage snapshot is an instance, not a full backup. As such, snapshots do not protect data against hardware failure.
Snapshots are generally grouped in three categories: changed block, clones and CDP. Snapshots first appeared as a management tool within a storage array. The advent of virtualization added hypervisor-based snapshots. Snapshots may also be implemented by backup software or even via a VM.
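The pointer-based nature of a storage snapshot, and the reason it is not a full backup, can be sketched in Python. This is a simplified redirect-on-write model built on assumptions: the `Volume` class and its methods are hypothetical, and real arrays track blocks with far more elaborate metadata.

```python
from typing import Dict, List

class Volume:
    def __init__(self, blocks: List[bytes]) -> None:
        # Physical block store, shared by the live volume and its snapshots.
        self.store: Dict[int, bytes] = dict(enumerate(blocks))
        # Logical block index -> physical block id.
        self.pointers: List[int] = list(range(len(blocks)))
        self._next_id = len(blocks)

    def snapshot(self) -> List[int]:
        """Capture the current pointers only; no data is copied."""
        return list(self.pointers)

    def write(self, index: int, data: bytes) -> None:
        """Redirect-on-write: new data lands in a new physical block."""
        self.store[self._next_id] = data
        self.pointers[index] = self._next_id
        self._next_id += 1

    def read(self, pointers: List[int], index: int) -> bytes:
        """Read a logical block through either live or snapshot pointers."""
        return self.store[pointers[index]]

volume = Volume([b"alpha", b"beta"])
snap = volume.snapshot()
volume.write(0, b"gamma")

live_value = volume.read(volume.pointers, 0)   # current data
old_value = volume.read(snap, 0)               # point-in-time data
```

Both the live volume and the snapshot read from the same underlying store in this sketch, so if that store is lost, the snapshot is lost with it.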
Copy data management and file sync and share
Tangentially related to backup is copy data management (CDM). This is software that provides insight into the multiple data copies an enterprise might create. It allows discrete groups of users to work from a common data copy. Although technically not a backup technology, CDM allows companies to efficiently manage data copies by identifying superfluous or underutilized copies, thus reducing backup storage capacity and backup windows.
File sync-and-share tools protect data on mobile devices used by employees. These tools basically copy modified user files between mobile devices. While this protects the data files, it does not enable users to roll back to a particular point in time should the device fail.
Data backup: Variations on a theme
When deciding which type of backup to use, you need to weigh several key considerations. It is not uncommon for an enterprise to mix various data backup approaches, as dictated by the primacy of the data. Your backup strategy should be governed by the SLAs that apply to an application, with respect to data access/availability, recovery time objectives and recovery point objectives. Your choice of backups also is influenced by the versatility of your backup application.
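A rough way to test a backup plan against recovery objectives is simple arithmetic. The Python sketch below rests on simplifying assumptions: worst-case data loss is taken to equal the interval between backups, restore time is taken to be data size divided by a constant restore throughput, and the function names and sample figures are hypothetical.

```python
def worst_case_data_loss_hours(backup_interval_hours: float) -> float:
    """Data written just after a backup is unprotected until the next one."""
    return backup_interval_hours

def estimated_restore_hours(data_gb: float, restore_gb_per_hour: float) -> float:
    """Estimate restore time from data size and sustained restore throughput."""
    return data_gb / restore_gb_per_hour

def meets_objectives(backup_interval_hours: float, data_gb: float,
                     restore_gb_per_hour: float,
                     rpo_hours: float, rto_hours: float) -> bool:
    """Check a plan against a recovery point and a recovery time objective."""
    loss_ok = worst_case_data_loss_hours(backup_interval_hours) <= rpo_hours
    time_ok = estimated_restore_hours(data_gb, restore_gb_per_hour) <= rto_hours
    return loss_ok and time_ok

# Hypothetical application: 2,000 GB, nightly backups, 500 GB/hour restores.
nightly_ok = meets_objectives(24, 2000, 500, rpo_hours=24, rto_hours=4)
strict_ok = meets_objectives(24, 2000, 500, rpo_hours=4, rto_hours=4)
```

With these figures, a nightly job satisfies a 24-hour recovery point objective but not a four-hour one. A tighter objective would call for more frequent incrementals, snapshots or CDP.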