Data backup technologies today provide avenues to merge with primary storage, as well as with disaster recover...
We've seen yesterday's RAID arrays repurposed as bulk secondary storage, with SSDs or all-flash arrays taking over the role of enterprise hard drives. In this process, SAS drives have been supplanted by high-capacity SATA drives, but this is seen as an interim step with either inexpensive, high-capacity SSDs replacing the hard drives altogether or the migration of secondary storage to the cloud as an alternative that makes hardware someone else's problem and avoids capital expenditures.
All of this platform migration has to be viewed in the context of modern backup software. The fact that public clouds provide multiple geographically dispersed creates the opportunity to merge disaster recovery (DR) with data backup technologies. The rise of data replication as a way to maintain high uptimes, as well as solid data integrity, makes us look beyond continuous backup systems with short recovery point objectives (RPOs) to snapshot systems.
Spot the differences in backup and recovery procedures
Backup to the cloud is now commonplace, and most enterprises are on board with the idea. On the flip side, this is a feature of all the viable backup software suites now marketed, leaving software vendors struggling for differentiation. Adding DR protection is an obvious step, but there are differences in the intent of data backup technologies and DR that need to be effectively addressed.
Adding a continuous backup mode to backup software is a powerful step forward. Incremental changes to data are logged to the backup system at very short intervals, typically a few minutes, keeping the backup close to current data levels. This makes system RPOs very small.
Continuous data protection is a good idea for organizations with tight RPOs.
A good DR strategy aims at minimizing both recovery time objective (RTO) and RPO, while backup is aimed at protecting against ransomware, failures of primary storage and user error. While some were initially optimistic that replication between private and public clouds in the hybrid cloud model would offer both protection modes, that optimism faded upon the realization that ransomware could sink all replicas of data and that performance was slow.
A more robust alternative surfaced: using replication methods in a perpetual storage model. Here, data is never deleted, allowing an administrator or app to roll back any object to any version level. Ransomware and operator errors are completely mitigated -- though, there is a potential performance hit as a result. Perpetual storage systems obviously use more storage than a single current version, but in the bigger picture, the minimum need for just a single replica of the snapshot uses much less capacity than all those weekly and monthly backups.
There is a difference in who supplies continuous backup versus snapshotting and perpetual storage. The is the domain of the traditional backup software suite, while snapshots have come from OSes and storage appliance vendors. This has created some debate about which is the better solution. When we look at recovery from ransomware and user error, snapshots have an edge. There is no need to translate data, instead just rolling back to a defined recovery point. When recovering in the cloud, the same is true. Spinning up cloud instances is one potent recovery mechanism if the primary data center is offline; if the snapshot is already available in the cloud of choice, this process can take just minutes.
Even with snapshots being ahead by a nose, there are still advantages in having a distinct backup/DR tool. This brings us to the latest evolution of data backup technologies. With backup today viewed as a possibly virtualized service, the reverse thought process is that backed-up data is a viable resource for analytical tasks and such, aimed at mining residual value from the content. After all, as we migrate to solid-state storage products, the available storage bandwidth far exceeds the needs for actually doing the backup and snapshot itself. This leaves plenty of bandwidth for value-added mining of data.
Copy data management is key to reducing the amount of storage space.
A couple of data backup technologies offer some internal analytical capabilities already. With copy data management (CDM), the aim is to mine residual value from the backup data. These tools are typically aimed at the storage pool and its optimization, with features like global deduplication, but importantly, they present the storage to the outside world without compromising backup integrity.
What do vendors have to offer?
CDM approaches make the backup data accessible to big data tools, for example, so that the data is no longer cold. Veritas has ambitious plans to add data visualization and machine learning (from Google) to NetBackup. Dell aims to automate storage administrator tasks as the industry has fewer personnel in those roles. These two major players, together with Commvault and IBM, all offer products for cloud and multi-cloud migration, while Veeam is linking up with Oracle and others on applying big data, AI and machine learning techniques to their data backups. CDM backups also have a place in DevOps, where a stagnant pool of real data can be a safe way to test new codes and processes. Remember, though, not to overwrite the backups or add data to them.
Other smaller players are pushing into the data management space with secondary storage technologies. There's CDM pioneer Actifio, with a current-generation code base and the agility to keep up with innovations in software outside backup. Rubrik has a different tack, with an metadata tagging system that can add a great deal of efficiency to a multi-cloud environment, including CDM. The agility of Veeam and Actifio will become important as the major cloud service providers -- Microsoft, Google and AWS -- all migrate or partner across from the public to the private cloud space.
As we move through 2018 and 2019, we will see secondary storage evolve into SSDs. For storage overall, this tends to blur the primary and secondary storage boundary, especially when very large drives come not only with a low price per terabyte but have nonvolatile express (NVMe) interfaces and high performance. The debate now becomes local versus cloud, and the merging of DR with backup and snapshotting suggests a hybrid or all-cloud platform for the cluster's persistent storage pool. Extending NVMe over Ethernet interfaces for storage across the WAN and into the public clouds will be standard practice to increase performance substantially.
For the systems administrator, the task is to unify data backup technologies, DR, snapshots and data mining into a cohesive mainstream strategy. None of this can work in a vacuum, however, since the key to low RTOs is to be able to create replacement instances in a cloud when an outage occurs and to perform data recovery of corrupted objects quickly, if not automatically. The tools still have some evolution ahead, but there are plenty now available to begin the journey.