This column picks up a few threads from last month's installment on disk-to-disk backup.
If you are responsible for data storage administration, you know that it is a challenge to look across the expanding range of data protection solutions and create an orderly spectrum of alternatives for your data protection scheme. This is also a problem confronting the Enhanced Backup Solutions Initiative (EBSI), on whose board of directors I play the (non-compensated) role of "consumer ombudsman."
Currently, EBSI is working to gain sufficient vendor membership to fund its core mission: to identify and certify data protection reference models that may mix and match hardware and software technologies of different member and non-member companies. A stumbling block is the use of the term "backup" in the name of the organization. Some vendors are reluctant to join the initiative because their products offer mirroring solutions or other approaches to the data protection problem that are, in their words, "the anathema of tape backup." Moreover, they do not want to confuse their value proposition by having their product offerings included on the same spectrum as tape backup offerings.
The fact is that all data protection schemes, whether disk- or tape-based, flow from the central concept of a backup. Not backup as a DOS command, interpreted as writing a copy of data to tape, but backup in the bigger sense of the English language, meaning redundancy. Data is irreplaceable and must be made redundant for its own protection.
In fact, whether the vendor marketing types like it or not, all data protection schemes involve copying. A copy of a primary dataset must be made as a protection against the loss of (or loss of access to) the primary. Differences in the approaches to realizing this goal usually have to do with the media targeted for the copy (tape or secondary disk), the parsing of the operation involved (full volume, incremental, changed blocks, metadata, etc.) and the copy "deltas" (whether the copy and the primary are synchronous or asynchronous). Speeds and feeds of the solutions are differentiators, providing additional criteria for sorting the various strategies, as is the price tag of each approach. From this perspective, the spectrum line begins to take shape between the extremes of full volume tape backup and synchronous mirroring.
In current marketecture, full volume tape backup is the whipping boy. It is slow (1 terabyte per hour at best), cumbersome (it involves lots of cartridges and management), labor intensive (more tapes, more monks) and untrustworthy (tapes may fail, or may be unreadable by any drive other than the one that wrote them). More than anything, tape, the Valley Girls tell us, is "so day before yesterday," so 20th Century.
By contrast, disk-to-disk (D2D) is fast, slick and automatic. It responds to the 21st Century need for speed in terms of business continuance. When array number one goes down, storage traffic simply fails over or switches to array number two. No muss, no fuss.
In fact, the major complaint about D2D -- its expense -- has all but evaporated, vendors say, with the advent of increasingly resilient and decreasingly expensive ATA and Serial ATA arrays. Further enhancing the D2D strategy are new algorithms that factor out data replicated across multiple primary disk platforms, copying only one occurrence of the data and thereby dramatically reducing the size of the dataset that needs protection. Another technology maintains a metadata repository of the data that is mirrored, establishing a "time machine" that enables you to dial back to the last moment before a logic bomb or software bug wrecked your mirror, and to return data to a usable form.
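To make the "copy only one occurrence" idea concrete, here is a minimal sketch of content-based deduplication in Python. It is an illustration under stated assumptions, not any vendor's algorithm: the function names (`deduplicate`, `restore`), the array names and the sample blocks are all hypothetical, and real products typically work on fixed or variable-size disk blocks rather than tidy byte strings.

```python
import hashlib


def deduplicate(datasets):
    """Keep one copy of each distinct block, keyed by its SHA-256 digest.

    `datasets` maps a (hypothetical) source name to a list of byte blocks.
    Returns (store, recipes): the unique blocks, plus the ordered list of
    digests needed to rebuild each source.
    """
    store = {}
    recipes = {}
    for name, blocks in datasets.items():
        recipe = []
        for block in blocks:
            digest = hashlib.sha256(block).hexdigest()
            # Only the first occurrence of a block is actually stored.
            store.setdefault(digest, block)
            recipe.append(digest)
        recipes[name] = recipe
    return store, recipes


def restore(store, recipe):
    """Rebuild one source from the shared store and its recipe."""
    return b"".join(store[digest] for digest in recipe)


# Illustrative data only: two primary arrays that share two blocks.
primaries = {
    "array_one": [b"payroll", b"orders", b"catalog"],
    "array_two": [b"orders", b"catalog", b"invoices"],
}

store, recipes = deduplicate(primaries)
total_blocks = sum(len(blocks) for blocks in primaries.values())
print(f"{len(store)} unique blocks protect {total_blocks} source blocks")
```

The savings depend entirely on how much the primaries overlap; with no shared blocks, the store is as large as the sources plus the overhead of the recipes.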
Still another D2D approach, advanced by Tacit Networks and others, uses some interesting technology to replicate data in caching appliances at multiple locations within the enterprise network. Gone is the concept of mirroring: the world is one big synchronized cache that affords fault tolerance to the entire enterprise environment.
These techniques are slowly coming to market and live somewhere on the data protection spectrum between the end points of traditional tape and traditional disk mirroring. Soon, vendors say, you won't need to buy a tape library for anything but long-term archive.
As for the mirroring solutions of today -- those that require eight redundant disk sets within the array, then two or more additional identically configured arrays for local/synchronous and remote/asynchronous multi-hop mirroring -- vendors claim they are rapidly becoming dinosaurs as smarter technology makes data copy and propagation more intelligent. Enabled by an increasingly networked infrastructure and increasingly commoditized disk, the veil of pain will soon be raised from the brow of the storage administrator.
No deus ex machina
As wistful and idyllic as this all sounds, there will need to be a lot of testing and validation before I believe any of it.
I won't take at face value claims of a disruptive technology waiting in the wings to fix the problem of data protection. From where I am sitting, there is no one-size-fits-all solution, no deus ex machina that will descend onto the storage stage suddenly and unexpectedly.
For one thing, it will need to be demonstrated to me that ATA or Serial ATA arrays can, in addition to serving as targets for D2D data copy, support the demands of production systems that are redirected to the data copies they store if and when a primary Fibre Channel/SCSI array fails. Despite the hype about ATA/SATA, high-end disk is made of sterner stuff than ATA, and for good reason. The question is, can these arrays carry the load in a recovery setting for however long they must be put into that role?
Another issue vendors will need to address before prying the tape cartridge from my hand: prove to me that I can't do what you propose just as effectively and with less retooling simply by changing my tape methodology. If I apply the logic of the D2D advocates, why can't I buy better performance from tape solutions?
Finally, I want to know what enabling technology is required to make D2D strategies work. The hidden cost of many D2D solutions is the FC fabric that you need to install to enable them. The cost of putting one of those beasties in place, the labor required to keep it running, the variables that must be considered and the frequency of fabric outages might add up to a data protection cure that hurts worse than the disease. Show me that you have all of the problems worked out and that there are no issues with firmware, software, HBAs or proprietary array controllers -- and do it all before I buy from you.
The proof of these strategies is, as the old saw says, in the pudding. One good way for vendors to start proving the value of their new data protection approaches to the user community is to submit their data protection schemes to EBSI for classification and measurement.
About the author:
Jon William Toigo has authored hundreds of articles on storage and technology along with his monthly SearchStorage.com "Toigo's Take on Storage" expert column and backup/recovery feature. He is also a frequent site contributor on the subjects of storage management, disaster recovery and enterprise storage. Toigo has authored a number of storage books, including "Disaster recovery planning: Preparing for the unthinkable, 3/e".