In the first part of this multi-part primer on backup, we looked at tape drives, tape libraries and virtual tape libraries. But those aren't the only options available for enterprise users when backing up files. There are also deduplication arrays, integrated backup appliances and, of course, cloud backups.
This section looks at each of those options in more detail, starting with deduplication arrays.
Deduplication array
Although disk-based storage is superior to tape in many ways, disk-based backup targets are not without their disadvantages. One of the few advantages that tape has over disk is capacity. Tape is a removable medium and therefore provides nearly unlimited storage capacity. Any time an organization needs to store more data, it can simply purchase more tapes.
This isn't the case for storage arrays. Although it may sometimes be possible to increase a disk array's capacity by installing larger disks, doing so tends to be expensive and somewhat disruptive. Disk arrays will eventually reach the point at which their capacity can no longer be increased. This can be a big problem for organizations that must protect a data set that is growing exponentially. One of the most common ways to overcome this kind of capacity limitation is to use a deduplication array.
Deduplication allows these arrays to achieve far greater logical storage capacities than would otherwise be possible when backing up files. EMC says its Data Domain Global Deduplication Array, for example, provides up to 14.2 petabytes of logical backup capacity.
While the idea of having a backup storage array that is capable of storing such vast quantities of data probably sounds promising, there are a couple of things that must be considered prior to investing in a deduplication array.
First, a deduplication array's stated logical capacity is a best-case estimate. Real-world deployments often deliver capacities that are considerably lower than the stated maximum logical capacity. The reason for this has to do with the way that the deduplication process works.
Deduplication can be performed in a number of different ways, but the process is usually block-based (some arrays can also deal with partial blocks). Deduplication is based on the idea that each block of data only needs to be stored once. As such, the deduplication engine removes storage redundancy by eliminating duplicate copies of storage blocks.
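To make the block-based idea concrete, here is a minimal sketch in Python. It is not how any particular array implements deduplication. The fixed 4 KB block size, the use of SHA-256 to identify blocks, and the `DedupStore` class and file names are all assumptions made for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size in bytes (illustrative only)


class DedupStore:
    """Toy block store that keeps a single copy of each unique block."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.blocks = {}  # SHA-256 digest -> block contents
        self.files = {}   # file name -> ordered list of block digests

    def write(self, name, data):
        """Split data into blocks and store only blocks not seen before."""
        digests = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # duplicate blocks are skipped
            digests.append(digest)
        self.files[name] = digests

    def read(self, name):
        """Rebuild a file from its list of block digests."""
        return b"".join(self.blocks[d] for d in self.files[name])

    def physical_bytes(self):
        """Bytes actually stored after deduplication."""
        return sum(len(block) for block in self.blocks.values())


# Two hypothetical images that share three blocks and differ in one.
store = DedupStore()
common_os = b"".join(bytes([i]) * BLOCK_SIZE for i in range(3))
store.write("vm1.img", common_os + b"x" * BLOCK_SIZE)
store.write("vm2.img", common_os + b"y" * BLOCK_SIZE)

logical = sum(len(store.read(name)) for name in store.files)
print(logical, store.physical_bytes())  # 32768 20480
```

The two images add up to eight blocks of logical data, but only five unique blocks are kept, because the three shared blocks are stored once.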
This type of deduplication is very effective for virtual machine backups. For instance, if an organization has a large number of virtual machines that are all running a common operating system, then the deduplication engine will be able to eliminate blocks associated with duplicate operating system files.
However, some other types of data do not deduplicate very well. Scientific data, for example, tends not to have a lot of redundancy and therefore does not deduplicate well. Similarly, compressed files such as ZIP files and media files such as MP3 or MP4 tend not to deduplicate, because compression has already removed most of their redundancy. As such, it is the type of data being stored that ultimately determines the effectiveness of a deduplication array.
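The difference between data types can be illustrated with a rough sketch. It assumes fixed 4 KB blocks, uses ten copies of a made-up 64-block "operating system" as a stand-in for similar virtual machine backups, and uses random bytes as a stand-in for compressed or media files. The `dedup_ratio` helper is hypothetical, and real arrays will behave differently.

```python
import hashlib
import os


def dedup_ratio(data, block_size=4096):
    """Return logical bytes divided by bytes left after block-level dedup."""
    if not data:
        return 1.0
    unique = {}  # block digest -> block length
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        unique.setdefault(hashlib.sha256(block).digest(), len(block))
    return len(data) / sum(unique.values())


# Ten hypothetical VM backups that all contain the same 64 blocks.
os_blocks = b"".join(bytes([i]) * 4096 for i in range(64))
vm_backups = os_blocks * 10

# Random bytes of the same size, standing in for already-compressed data.
compressed_like = os.urandom(len(vm_backups))

print(dedup_ratio(vm_backups))       # 10.0
print(dedup_ratio(compressed_like))  # effectively 1.0: no repeated blocks
```

Under these assumptions, the same physical disk space holds ten times as much of the virtual machine data as of the compressed-like data, which is why a stated logical capacity is only reached with data that deduplicates well.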
Another factor to weigh before purchasing a deduplication array is redundancy. Like other types of storage arrays, deduplication arrays typically include redundant disks that protect data against disk failures. Even so, the appliance itself can become a single point of failure. To protect against an appliance-level failure, it may be necessary to purchase a second appliance and replicate data between the two, or to periodically back up an additional copy of the data to tape.
Integrated backup appliance
Integrated backup appliances are designed to act as an "all-in-one" backup solution. A single appliance might contain backup software, a backup server and target storage. Integrated backup appliances such as the NetBackup 5230 or the Backup Exec 3600 are designed to provide busy administrators with a turnkey backup solution. The appliances are designed to be simple to deploy and easy to use. Often, an integrated solution also costs less than purchasing all of the components and software licenses separately.
Integrated backup appliances also make troubleshooting easier when a backup goes wrong. Because all of the components are provided by a single vendor and are specifically designed to work together, there are fewer places for a problem to hide and only one vendor to call for support.
The biggest drawback to using integrated backup appliances is vendor lock-in. While there is nothing inherently wrong with using a single vendor's products, some administrators find that vendor lock-in reduces flexibility. For example, suppose you want to use System Center Data Protection Manager (DPM) to protect your Hyper-V servers. There would be nothing stopping you from running parallel backup solutions, but you probably would not be able to use an integrated backup appliance as a DPM target.
Another disadvantage to using integrated backup appliances is that they make it very difficult to transition to another vendor's appliance. If you outgrow your integrated backup appliance and want to use a backup product from another vendor, then there probably won't be a clear migration path.
Cloud backup
The term cloud backup gets used a lot, but in reality cloud backups fall into two main categories. The first is backup as a service (BaaS). BaaS is based around the use of a Web-based backup application.
Although there are enterprise-grade BaaS providers, the vast majority of BaaS software is geared toward consumers. Many BaaS products tend to suffer from sluggish performance and a lack of application awareness.
The other major form of cloud backup is based on the idea of using cloud storage as a backup target. These types of cloud backups tend to be supplementary. Most products create backups locally and then deduplicate and replicate the backup contents to cloud storage.
The primary disadvantage to cloud backups is that they provide a degree of separation between an organization and its backup data. In the event of a major natural disaster, an organization might not have Internet connectivity, which would be required to restore a backup from the cloud. Even if such connectivity did exist, its speed would directly impact the amount of time it takes to restore the backup.
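The effect of connection speed on restore time comes down to simple arithmetic. The sketch below assumes decimal units (1 GB = 10^9 bytes) and an `efficiency` factor of 0.8 to allow for protocol overhead. That factor, the `restore_hours` helper and the 10 TB example are illustrative assumptions, not measured figures.

```python
def restore_hours(backup_gb, link_mbps, efficiency=0.8):
    """Estimate the hours needed to download a backup over a network link.

    backup_gb  -- backup size in gigabytes (10**9 bytes)
    link_mbps  -- nominal link speed in megabits per second
    efficiency -- assumed fraction of the nominal speed actually achieved
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be > 0 and efficiency in (0, 1]")
    bits_to_move = backup_gb * 8 * 1000**3
    usable_bits_per_second = link_mbps * 1_000_000 * efficiency
    return bits_to_move / usable_bits_per_second / 3600


# Hypothetical 10 TB restore over two different connections.
print(round(restore_hours(10_000, 100), 1))    # 277.8 hours, about 11.6 days
print(round(restore_hours(10_000, 1_000), 1))  # 27.8 hours
```

Even with a fast link, a large restore from the cloud can take more than a day under these assumptions, so the estimate should be checked against the organization's recovery time objective.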
As a best practice, organizations that want to use cloud backups should treat their cloud backup as a secondary, off-premises backup copy rather than as a primary backup.
As you can see, backing up files today hardly resembles the process of the not-too-distant past. When choosing a modern way to back up files, remember that every kind of backup has advantages and disadvantages that must be considered.