Published: 10 Jan 2007
It's best to store remote-office data and host remote-office applications in the central data center. Various products can eliminate TCP/IP transmission issues that might hinder those moves.
Several recent surveys by analyst firms document the perception that the amount of data within remote offices and branch offices (ROBOs) is greater than the amount of data within the data center. In an era of growing compliance regulation, and increased internal and external threats to data, managing and protecting ROBO data has become a big problem.
ROBO users require and expect the same level of services and support as users in the primary data center. This means ROBO data accessibility, data protection and application response times must be equivalent to those local to the data center. The requirements are simple. Delivering on those requirements isn't.
Storage managers have two main ways to protect ROBO data: create a mini data center for each ROBO, or centralize ROBO applications and data at the primary data center. The first option is very expensive and complex to manage, requires highly qualified IT staffers at each ROBO and rarely works well.
Centralizing ROBO applications and data in the primary data center offers the following potential advantages:
- Increased IT productivity by leveraging IT administrator assets within the data center
- Reduced investments in hardware, infrastructure, software, personnel, training and maintenance, and increased data center asset utilization
- Better, faster service and simpler management
- Consistent data protection that meets IT and regulatory compliance requirements
- Easier sharing of data among ROBOs
Three technologies enable the centralization of data by mitigating or eliminating most WAN problems (with the exception of speed-of-light latency, which depends on distance). They are:
- Fat pipe TCP WAN optimization (a.k.a. Data Replication Optimization or DRO)
- Skinny pipe TCP WAN optimization, plus Wide-Area File Systems or WAFS (a.k.a. Wide-Area Data Services [WADS], Wide-Area Application Services [WAAS] and Wide-Area Data Management [WADM])
- Distributed ROBO backup to disk
Fat pipe TCP WAN optimization
Fat pipe TCP WAN optimization (DRO) is designed to move large amounts of data in single or multiple streams over "fat pipes" at speeds between 5Mb/sec and 1Gb/sec. DRO minimizes the effect of distance by using a TCP protocol-enhancing proxy (typically a derivative of TCP or UDP) that allows it to do things that can't be done with standard TCP/IP v4 or v6. For example, the protocol-enhancing proxy allows DRO to terminate TCP at each end of the pipe, which completely eliminates TCP latency overhead. That, by itself, would greatly increase the effective data throughput, but there are other DRO tricks that increase throughput.
DRO deduplicates data on a block level, which requires a lot less data to traverse the fat pipe. It can also compress the deduplicated data, as well as adjust the window size and payload to decrease the number of roundtrips required to move that data between the primary data center and remote or branch office. (When the TCP window size and payloads are increased, it usually makes the effective data throughput far more sensitive to packet loss caused by high bit-error rates, network jitter and network congestion.) The net effect is a greatly increased data throughput rate.
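The block-level deduplication and compression described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular DRO product works: the fixed 4KB block size, the SHA-256 fingerprints and the names `dedupe_and_compress` and `wire_bytes` are all hypothetical choices made for the example.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # assumed fixed block size; real products may chunk differently


def dedupe_and_compress(data, seen):
    """Split data into blocks; emit a short reference or a compressed block."""
    messages = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha256(block).digest()
        if digest in seen:
            # The far end already holds this block: send only its 32-byte digest.
            messages.append(("ref", digest))
        else:
            # First sighting: remember the fingerprint and send the block compressed.
            seen.add(digest)
            messages.append(("data", zlib.compress(block)))
    return messages


def wire_bytes(messages):
    """Total payload bytes that would cross the WAN for these messages."""
    return sum(len(payload) for _, payload in messages)


# Eight identical blocks followed by one different block.
payload = (b"A" * BLOCK_SIZE) * 8 + bytes(range(256)) * 16
sent = dedupe_and_compress(payload, set())
print("original bytes:", len(payload))
print("bytes on the wire:", wire_bytes(sent))
```

In this toy run, only two of the nine blocks travel as data; the other seven are replaced by fingerprints the far end can resolve from blocks it has already received.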
The real throughput magic comes from DRO's capabilities in diminishing the throughput effects of packet loss. WAN packet loss is a fact of life. Most telcos will claim incredibly low packet-loss numbers of 0% to 0.001%, but those seem to be the exception, not the rule. Users, on the other hand, say their WAN packet-loss rates range between 0.1% and 6%, according to various surveys. Although that may not seem like much, it's devastating to TCP/IP throughput.
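To get a feel for why small loss rates hurt so much, one commonly cited rule of thumb is the Mathis approximation, which caps steady-state TCP throughput at roughly (MSS / RTT) x (C / sqrt(p)), where p is the packet-loss rate and C is about 1.22. The approximation is not part of the surveys cited above; it is brought in here purely as an illustrative model, and the MSS and roundtrip values below are assumed figures.

```python
import math


def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate, c=math.sqrt(1.5)):
    """Approximate ceiling on steady-state TCP throughput, in bits per second."""
    return (mss_bytes * 8 / rtt_s) * c / math.sqrt(loss_rate)


MSS = 1460   # assumed segment size in bytes
RTT = 0.050  # assumed 50 ms roundtrip time

for loss_pct in (0.001, 0.1, 6.0):
    bps = mathis_throughput_bps(MSS, RTT, loss_pct / 100)
    print(f"{loss_pct}% loss: about {bps / 1e6:.1f} Mb/sec per TCP stream")
```

Under these assumptions, the model's ceiling falls from the order of 90Mb/sec at 0.001% loss to about 9Mb/sec at 0.1% and a little over 1Mb/sec at 6%, no matter how much bandwidth the pipe offers.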
TCP was created with very short LAN distances in mind. When packets are lost or dropped, classic TCP (without selective acknowledgement) can't make use of the packets that arrive after the gap, so it must retransmit every packet behind the lost one. This isn't such a big deal on a LAN because the distances are short, acknowledgement of packet loss is quick and not much data has to be retransmitted when packets are lost. That isn't the case when traversing a WAN; as distance increases, roundtrip time increases and more packets fill the pipe. This means more packets must be retransmitted when a packet is lost.
DRO significantly reduces the effects of packet loss by reordering the packets, so it only has to retransmit the lost packets and nothing else. DRO data throughput is often an order of magnitude greater than native throughput. Sometimes it's actually greater than the rated bandwidth (depending on the amount of bandwidth). These quantifiable throughput improvements can be traced directly to the cumulative effects of deduplication, compression, optimal window sizing, stacking, TCP termination and packet-loss mitigation.
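The difference between retransmitting everything behind a lost packet and retransmitting only the lost packet can be put into rough numbers. The sketch below assumes a propagation speed of about 200,000 km/sec in fiber and 1,500-byte packets, and it ignores queuing and processing delay; the function names are hypothetical.

```python
FIBER_KM_PER_S = 200_000  # assumed propagation speed in fiber, about two-thirds of c


def round_trip_s(distance_km):
    """Propagation-only roundtrip time for a link of the given length."""
    return 2 * distance_km / FIBER_KM_PER_S


def packets_in_flight(bandwidth_bps, distance_km, packet_bytes=1500):
    """Packets needed to fill the pipe for one roundtrip (bandwidth-delay product)."""
    bits_in_flight = bandwidth_bps * round_trip_s(distance_km)
    return max(1, round(bits_in_flight / (packet_bytes * 8)))


def retransmitted(in_flight, lost_index, selective):
    """Packets resent after losing the packet at lost_index (0-based) in the window."""
    return 1 if selective else in_flight - lost_index


lan = packets_in_flight(100_000_000, 0.1)    # 100Mb/sec over 100 meters
wan = packets_in_flight(100_000_000, 3000)   # 100Mb/sec over 3,000 km
print("LAN packets in flight:", lan)
print("WAN packets in flight:", wan)
print("WAN resend, everything behind the loss:", retransmitted(wan, 0, selective=False))
print("WAN resend, lost packet only:", retransmitted(wan, 0, selective=True))
```

With these assumed figures, losing the first packet in a full WAN window costs 250 retransmissions in the first case and one in the second, while the LAN case barely notices.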
Sometimes DRO throughput can be too efficient. It can seize the entire fat pipe, making it available only for those applications running through the DRO and blocking other apps from accessing the WAN. One way around this is to ensure the DRO product has rate-limiting capabilities or to assign only a fraction of the WAN bandwidth to the DRO (see "Data Replication Optimization [DRO] product comparison").
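Rate limiting of this kind is often implemented with a token bucket. The sketch below is a generic illustration, not the mechanism of any listed product; the `TokenBucket` class, the 40% share and the burst size are assumptions made for the example. Time is passed in explicitly so the behavior is easy to follow.

```python
class TokenBucket:
    """Caps a sender at rate_bps on average while allowing short bursts."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate_bytes_per_s = rate_bps / 8
        self.capacity = burst_bytes
        self.tokens = burst_bytes  # start with a full bucket
        self.last = 0.0

    def allow(self, nbytes, now):
        """Return True if nbytes may be sent at time now (in seconds)."""
        elapsed = now - self.last
        self.last = now
        # Refill in proportion to elapsed time, never beyond the burst capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_bytes_per_s)
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False


LINK_BPS = 10_000_000  # assumed 10Mb/sec WAN link
bucket = TokenBucket(rate_bps=0.4 * LINK_BPS, burst_bytes=15_000)  # 40% for DRO traffic

print(bucket.allow(15_000, now=0.0))   # the initial burst fits
print(bucket.allow(1_500, now=0.0))    # bucket is empty, so this packet must wait
print(bucket.allow(1_500, now=0.01))   # 10 ms later enough tokens have accumulated
```

Whatever bandwidth the bucket withholds from the DRO traffic remains available to the other applications sharing the pipe.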
DRO is an excellent choice when bandwidth is high, packet loss exceeds 0.1%, distances are greater than 300 kilometers and data migration applications have to move massive amounts of data across the WAN. DRO isn't normally a good choice when the bandwidth to ROBOs is relatively "skinny" (typically 1.5Mb/sec or less), there's no quantifiable packet loss and the distances are very short. DRO also isn't a good choice if the amount of data moved over the WAN is light or insignificant, as is typical of Web applications.
Skinny pipe TCP WAN optimization plus WAFS
Skinny pipe TCP WAN optimization and WAFS are often features and functions of the same WAN optimization controller. Skinny pipes are defined as links with less than 5Mb/sec of bandwidth; typically, they have a bandwidth of 1.5Mb/sec (T1) or less. Skinny pipe TCP WAN optimization is functionally similar to DRO; the key difference--and it's not a trivial one--is the size of the bandwidth pipe. Algorithms and techniques that work with fat pipes don't work nearly as well with skinny pipes and vice versa. Skinny pipe TCP WAN optimization produces effective data throughput results similar to those of DRO.
Vendor methods vary, with widely differing results depending on the data. In general, vendors use some combination of deduplication, compression, sequence caching, TCP and UDP acceleration, bandwidth management, multithreading, quality of service (QoS) and path optimization.
WAFS goes a step further by providing acceleration for specific applications such as CAD/CAM, print, Web caching, e-mail, DBMS or enterprise resource planning. The combination of skinny pipe WAN optimization plus WAFS is an outstanding way to centralize ROBO data. One way WAFS radically improves the performance of some of these apps is by reducing the application protocol's "chattiness."
The Common Internet File System (CIFS) is an excellent example of a very chatty application protocol that requires numerous commands for every transaction. Each command creates a roundtrip from the initiator to the target and back, with each roundtrip adding latency (delay) to the transaction. Because every roundtrip pays the full distance delay, total transaction latency grows in proportion to both the distance and the number of roundtrips. WAFS turns around the CIFS commands locally and stacks them so that most or all of the commands required in the transaction traverse the pipe at the same time. ROBO performance increases by "orders of magnitude," allowing many ROBO applications and their associated storage to be centralized and consolidated in the primary data center.
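A little arithmetic shows why stacking commands pays off. In the sketch below, the command count, batch size and roundtrip times are hypothetical figures chosen for illustration, and server processing time is ignored.

```python
import math


def round_trips_needed(commands, batch_size=1):
    """Roundtrips required when up to batch_size commands share one trip."""
    return math.ceil(commands / batch_size)


def transaction_time_s(round_trips, rtt_s):
    """Network time for a transaction: each roundtrip pays the full RTT."""
    return round_trips * rtt_s


COMMANDS = 100    # hypothetical number of commands in one file operation
LAN_RTT = 0.001   # assumed 1 ms roundtrip on the local network
WAN_RTT = 0.080   # assumed 80 ms roundtrip to the data center

chatty_lan = transaction_time_s(round_trips_needed(COMMANDS), LAN_RTT)
chatty_wan = transaction_time_s(round_trips_needed(COMMANDS), WAN_RTT)
stacked_wan = transaction_time_s(round_trips_needed(COMMANDS, batch_size=50), WAN_RTT)

print(f"chatty over LAN:  {chatty_lan:.2f} sec")
print(f"chatty over WAN:  {chatty_wan:.2f} sec")
print(f"stacked over WAN: {stacked_wan:.2f} sec")
```

With these assumed numbers, the same operation takes a tenth of a second locally, eight seconds across the WAN, and a small fraction of a second once the commands are stacked into two roundtrips.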
Skinny pipe TCP WAN optimization plus WAFS is an excellent choice when there are quite a few ROBOs, the bandwidth to the ROBOs is less than 5Mb/sec and performance comparable to that of locally hosted applications for ROBO users is important. Skinny pipe TCP WAN optimization plus WAFS isn't a good choice when the amount of data moved between locations is very large or for laptop mobile users.
Distributed ROBO backup to disk
Distributed ROBO backup to disk is the first data protection technology designed specifically for ROBOs. It has been around for nearly 20 years, but the technology has only recently been available for license by end users. Distributed ROBO backup to disk protects data for a wide variety of operating systems in servers, desktops and laptops that range in size from a single user to hundreds of users. To do that effectively, these backup products or data services need to be more efficient than traditional backup or replication technologies, and easier to use. Most of the products provide multiple point-in-time data versions similar to continuous data protection (CDP) products.
The ROBO WAN efficiency comes from deduplication (locally and globally), transmission of delta changes and compression of the remaining data. Distributed ROBO backup-to-disk data is deduped across the WAN in flight and at rest. DRO, skinny pipe and WAFS data is only deduped in flight. In addition, local deduplication takes place at the ROBO location, while global deduplication takes place at the central data center site. Global deduplication removes the duplicates among all of the ROBO sites.
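The split between local and global deduplication can be illustrated with a toy model in which each site first removes its own duplicates and then asks the central site which blocks it still lacks. The `CentralStore` class and `backup_site` function are hypothetical names, and the SHA-256 fingerprints are an assumption; the listed products may work quite differently.

```python
import hashlib


def fingerprint(block):
    """Content fingerprint used to recognize identical blocks."""
    return hashlib.sha256(block).hexdigest()


class CentralStore:
    """Data center side: one block index shared by every ROBO (global dedup)."""

    def __init__(self):
        self.blocks = {}  # fingerprint -> block contents, stored once at rest

    def missing(self, digests):
        return {d for d in digests if d not in self.blocks}

    def put(self, blocks_by_digest):
        self.blocks.update(blocks_by_digest)


def backup_site(blocks, store):
    """Back up one site; return how many blocks actually crossed the WAN."""
    unique = {fingerprint(b): b for b in blocks}  # local dedup at the ROBO
    needed = store.missing(unique.keys())         # global dedup against all sites
    store.put({d: unique[d] for d in needed})
    return len(needed)


store = CentralStore()
site_a = [b"os-image", b"os-image", b"report-a"]
site_b = [b"os-image", b"report-b", b"report-b"]

sent_a = backup_site(site_a, store)
sent_b = backup_site(site_b, store)
print("blocks sent from site A:", sent_a)
print("blocks sent from site B:", sent_b)
print("blocks stored centrally:", len(store.blocks))
```

Six blocks are protected across the two sites, but only three cross the WAN and only three are kept on disk, because the shared block is sent and stored once.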
The significantly reduced volume of protected data will usually make disk storage less expensive than backing up to tape. In fact, there's no need for tape, tape libraries or virtual tape for day-to-day backup and recovery. Recoveries to the ROBO are also a great deal faster since the data comes directly from disk. For archiving, backups can still be moved to tape or optical media as the protected data ages.
Distributed ROBO backup to disk is available from the following vendors: Asigra Inc., EMC Corp./Avamar Technologies, EVault Inc., Iron Mountain Inc., Signiant Inc. and Symantec Corp. (see "Distributed ROBO backup-to-disk product comparison"). Distributed ROBO backup to disk is an excellent choice when data protection is the primary issue or there are numerous mobile users. It's not a good choice when the primary ROBO issue is performance when working with centralized data center applications.
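The rules of thumb given for the three technologies can be collected into a simple checklist. The sketch below is a deliberately crude encoding of those rules; the `Site` fields and the `suggest` function are hypothetical, and treating each criterion as a hard yes-or-no test is a simplification made for the example.

```python
from dataclasses import dataclass


@dataclass
class Site:
    bandwidth_mbps: float
    packet_loss_pct: float
    distance_km: float
    bulk_data_movement: bool
    needs_local_like_performance: bool
    data_protection_is_primary: bool
    many_mobile_users: bool


def suggest(site):
    """Return the technologies whose stated criteria this site meets."""
    picks = []
    if (site.bandwidth_mbps >= 5 and site.packet_loss_pct > 0.1
            and site.distance_km > 300 and site.bulk_data_movement):
        picks.append("DRO")
    if (site.bandwidth_mbps < 5 and site.needs_local_like_performance
            and not site.bulk_data_movement and not site.many_mobile_users):
        picks.append("skinny pipe optimization plus WAFS")
    if site.data_protection_is_primary or site.many_mobile_users:
        picks.append("distributed backup to disk")
    return picks


branch = Site(1.5, 0.0, 50, False, True, False, False)
replica = Site(155, 0.5, 1200, True, False, True, False)
print(suggest(branch))
print(suggest(replica))
```

A site can match more than one technology, and a site that matches none simply needs a closer look than a checklist can give.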
Centralizing ROBO applications and their data at the primary data center offers enormous economies of scale. Doing so effectively also requires implementing technologies that eliminate TCP WAN transmission issues. When an organization contemplates how it will centralize distributed ROBO data, it must first determine which applications will be centralized and each application's required performance. It's possible, and in many cases highly likely, that multiple technologies should be deployed to meet the organization's needs.