Enterprise data backup software moves data from a primary storage device to a secondary storage device. In the past, that secondary device was a tape drive. Now, it is more typically a disk device designed specifically for backups. In the data center, backup software has evolved from an application that merely copies data to a highly sophisticated data protection application.
IT professionals should look for products that protect OSes or environments in their data centers and that have the ability to perform online backup of applications. They should also consider advanced recovery capabilities such as replication or rapid recovery.
It is tempting to take enterprise data backup software for granted and assume that all applications are basically the same. The reality is that the backup software market has matured, and there are stark differences between applications. Understanding those differences requires a grasp of the internals common to them.
The first component of a backup application is data movement, the method the application will use to get data from point A to point B. Before network backup, each server had a backup application running on it that copied data to a locally attached tape drive or hard drive, known as direct-attached storage (DAS). The approach of having enterprise backup software and backup hardware for every server was both expensive and time-intensive.
The shortcomings of local backup led to the evolution of network backup, which created a single, centralized backup server. After backing up the data from primary storage, the backup server would then copy that information to a locally attached disk or tape device. While network backup was less costly, it was more time-consuming, since all data was copied across the network. The advent of higher-bandwidth networks has alleviated most of these concerns.
Vendors have developed several technologies that improve network backup performance. Many vendors supply agents that reside on primary storage servers. Those agents push data to the backup server instead of the backup server having to pull the data. The result is that more data can be sent to the backup server at the same time. Many applications, OSes and environments also have APIs built into them that will send backup data to the backup server without requiring installation of a separate agent.
Another development is deduplication, which reduces the amount of data that needs to be sent across the network by not sending data the backup server has already stored. Software on the primary storage server compares the data it is about to send with the data that is on the backup server. If that data already exists, an additional pointer or reference to that data is established, but the actual data is not sent. The result is a dramatic reduction in data transfer, often 80% or more.
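The compare-before-send step can be sketched in a few lines. This is a minimal illustration, not any vendor's actual protocol: it assumes a hash-based lookup in which the client fingerprints each chunk, checks whether the server already holds it, and transmits the bytes only when they are new, recording a reference either way.

```python
import hashlib

class BackupServer:
    """Toy stand-in for the backup server's deduplicated store."""

    def __init__(self):
        self.chunks = {}  # hash -> bytes, each unique chunk stored once
        self.refs = []    # catalog of references, one per backed-up chunk

    def has_chunk(self, digest):
        return digest in self.chunks

    def store_chunk(self, digest, data):
        self.chunks[digest] = data

    def add_reference(self, digest):
        self.refs.append(digest)

def backup_chunk(server, data):
    """Send a chunk only if the server has never seen its hash."""
    digest = hashlib.sha256(data).hexdigest()
    if not server.has_chunk(digest):
        server.store_chunk(digest, data)  # actual bytes cross the network
    server.add_reference(digest)          # a pointer is always recorded

server = BackupServer()
for chunk in [b"block-A", b"block-B", b"block-A", b"block-A"]:
    backup_chunk(server, chunk)
# Four chunks were "backed up", but only two were actually transferred.
```

With three of the four chunks identical, only half the data crosses the wire; on real workloads with many repeated blocks, the savings are where the 80%-plus reductions come from.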
Finally, for situations where push technology and data reduction technologies aren't enough, most enterprise data backup software has the ability to deploy an intelligent agent that can back up to a local backup device instead of sending data across the network. The key difference between this technique and the DAS technique described earlier is that scheduling and data tracking are handled by the primary backup server.
The combination of these optimizations and the superior economics of network-based protection has made it the default method for protecting data centers.
Once data arrives at the backup server, it has to be managed. A significant component of enterprise data backup software is essentially a database that tracks all the data it has stored. The database not only has to track every copy of every version of each piece of data it has received, it must record the location of all those copies across a variety of media, including disk, tape and the cloud. The database may contain additional information, such as which user created the file and which user last modified the file. Depending on the frequency of backups and the rate of file changes, this database can grow to a massive scale.
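A toy version of such a catalog shows the bookkeeping involved. The record fields below (version number, who modified the file, where each copy lives) are illustrative assumptions, not a real product's schema.

```python
from dataclasses import dataclass

@dataclass
class VersionRecord:
    version: int
    modified_by: str
    locations: list  # media holding a copy, e.g. ["disk", "tape", "cloud"]

class Catalog:
    """Tracks every version of every file and where its copies live."""

    def __init__(self):
        self.files = {}  # path -> [VersionRecord, ...]

    def record(self, path, modified_by, locations):
        versions = self.files.setdefault(path, [])
        versions.append(VersionRecord(len(versions) + 1, modified_by, locations))

    def copies(self, path):
        """Total number of copies, across all media, of all versions."""
        return sum(len(v.locations) for v in self.files.get(path, []))

cat = Catalog()
cat.record("/data/report.doc", "alice", ["disk"])
cat.record("/data/report.doc", "bob", ["disk", "tape"])
# Two versions of one file, three physical copies to track.
```

Even this tiny example hints at the scale problem: every backup of every file adds rows, which is why catalog growth and performance matter so much in practice.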
Users will employ this database to find data when it is needed for restoration. Vendors often support search attributes such as partial file name match, change date and file owner. Backup software vendors invest significant time and money to ensure search queries return results quickly and accurately.
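The search attributes mentioned above can be modeled as simple filters over catalog entries. The field names here are assumptions made for the sketch; a real catalog would run these as indexed database queries rather than linear scans.

```python
from datetime import date

# Hypothetical catalog entries with the searchable attributes.
entries = [
    {"name": "report.doc",    "owner": "alice", "changed": date(2023, 5, 1)},
    {"name": "budget.xls",    "owner": "bob",   "changed": date(2023, 6, 2)},
    {"name": "report_v2.doc", "owner": "alice", "changed": date(2023, 6, 9)},
]

def search(entries, name_part=None, owner=None, changed_after=None):
    """Filter by partial file name, owner and change date."""
    results = entries
    if name_part is not None:
        results = [e for e in results if name_part in e["name"]]
    if owner is not None:
        results = [e for e in results if e["owner"] == owner]
    if changed_after is not None:
        results = [e for e in results if e["changed"] > changed_after]
    return results

hits = search(entries, name_part="report", owner="alice")
```

Combining criteria narrows results quickly, which is exactly what an administrator restoring "that report Alice changed last month" needs.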
Another important data management capability is version control. The software should allow IT professionals to specify the number of versions retained for each backup job and their retention times.
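A retention policy of this kind can be sketched as a pruning function. The two knobs below, a version-count cap and a maximum age, are common policy parameters but the names are assumptions for this example.

```python
from datetime import datetime, timedelta

def prune(versions, max_versions, max_age_days, now):
    """Keep only versions newer than the cutoff, capped at max_versions."""
    cutoff = now - timedelta(days=max_age_days)
    recent = [v for v in versions if v["created"] >= cutoff]
    # Of what survives the age check, keep the newest max_versions.
    recent.sort(key=lambda v: v["created"], reverse=True)
    return recent[:max_versions]

now = datetime(2024, 1, 10)
versions = [
    {"id": 1, "created": datetime(2023, 11, 1)},  # older than 30 days
    {"id": 2, "created": datetime(2024, 1, 1)},
    {"id": 3, "created": datetime(2024, 1, 5)},
    {"id": 4, "created": datetime(2024, 1, 9)},
]
kept = prune(versions, max_versions=2, max_age_days=30, now=now)
# Version 1 ages out; versions 4 and 3 are the two newest kept.
```

Note that the two rules interact: a version can be within the count limit yet still expire on age, so both checks must run.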
A key consideration for IT buyers is how well these databases will perform under heavy load. For example, if file counts are high and backups are frequent, IT administrators should ensure the backup application database can scale to meet those demands.
At a minimum, job management includes scheduling the backups of the various servers. IT administrators should look for backup applications that can help organize these jobs. For example, some applications can group servers by category or type: A data center with 50 servers requiring similar types of data protection can be backed up with one job. Newer applications can even measure the various bandwidth requirements of scheduled jobs and then reorder them to ensure optimal use of resources.
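Grouping servers into jobs and reordering them by resource demand can be sketched as follows. The server records and the largest-job-first heuristic are assumptions for illustration; real schedulers weigh live bandwidth measurements and backup windows.

```python
from collections import defaultdict

servers = [
    {"name": "web1",  "type": "web",  "gb": 50},
    {"name": "web2",  "type": "web",  "gb": 60},
    {"name": "db1",   "type": "db",   "gb": 500},
    {"name": "file1", "type": "file", "gb": 200},
]

def build_jobs(servers):
    """Group servers needing similar protection into one job each,
    then order jobs so the heaviest data transfer starts first."""
    groups = defaultdict(list)
    for s in servers:
        groups[s["type"]].append(s)
    jobs = [
        {
            "type": job_type,
            "servers": [s["name"] for s in members],
            "gb": sum(s["gb"] for s in members),
        }
        for job_type, members in groups.items()
    ]
    jobs.sort(key=lambda j: j["gb"], reverse=True)  # heaviest first
    return jobs

jobs = build_jobs(servers)
# Three jobs instead of four per-server jobs; the 500 GB database
# job is scheduled ahead of the smaller file and web jobs.
```

Grouping the two web servers into one job is the kind of consolidation that lets a 50-server data center run far fewer jobs than servers.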
Enterprise data backup software serves as the foundation of any data protection process. It should provide a broad baseline of protection for a variety of servers. But even though the category is very mature, there are specific differences between applications. IT professionals need to understand their requirements, not only in terms of performance, but in the amount of data to protect, the number of files represented by that data and the number of servers to protect.