CAS and data deduplication: Partners in archiving


Rick Cook
05.08.2007

What you will learn: This tip explores CAS and data deduplication, discusses the uses for each and outlines the technologies' strengths and weaknesses.

Archiving is on everyone's mind right now in the storage world. Compliance demands are pushing users to implement some kind of archiving product, and there's been a lot of interest in finding new ways to deal with the increased amounts of data that companies need to save. Data reduction looks like a promising approach, and content-addressed storage (CAS) and data deduplication have emerged to cope with sprawling data growth. Although they are sometimes confused, CAS and data deduplication are not at all the same thing. While they are both used in archiving data, CAS may or may not include data deduplication, as it is commonly understood, to reduce the amount of data stored.

While CAS is a distinct class of products, deduplication isn't a product at all. It's a feature found in many kinds of products other than CAS. Many document management applications use it, especially for email; one example is Mimosa Systems Inc.'s NearPoint archival software for Microsoft Exchange. So do many non-CAS applications and hardware, including some virtual tape libraries (VTLs), such as those from FalconStor, and remote backup software from companies such as Asigra Inc.

Data deduplication examines the data to be saved at the block level looking for duplicate blocks. When it finds a duplicate it replaces it with a reference that points to the original copy of the block. How much space this saves depends on the nature of the data being stored. In some cases, such as email, the savings can run to 20 to 1 or more.

One of the major sources of skepticism about data deduplication is overhead. Obviously, it takes both time and computing power to examine every block of data to be stored and compare it with every block of data currently in storage. Makers of products incorporating data deduplication have spent a lot of time and effort speeding up the process. At the most basic level, most of them use hashing to identify each unique block, and many of them use much more sophisticated schemes. As a result, the throughput of backup and archiving systems using data deduplication has been climbing. Diligent Technologies Corp. recently claimed one of its customers achieved 400 MBps throughput using the latest version of the company's ProtecTier disk-based backup product.
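To make the mechanics concrete, here is a minimal sketch of block-level deduplication using a hash index. It is not any vendor's implementation: the fixed 4 KB block size, the choice of SHA-256 and the names `deduplicate`, `rebuild` and `store` are all assumptions made for illustration. The point it shows is that a hash lookup stands in for comparing each new block against every block already in storage.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real products vary


def deduplicate(data: bytes, store: dict) -> list:
    """Split data into fixed-size blocks and keep each unique block once.

    Returns the list of block hashes (references) needed to rebuild data.
    """
    refs = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        # One dictionary lookup replaces a comparison against every stored block.
        if digest not in store:
            store[digest] = block
        refs.append(digest)
    return refs


def rebuild(refs: list, store: dict) -> bytes:
    """Reassemble the original data by following the references."""
    return b"".join(store[digest] for digest in refs)


store = {}
data = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE  # four blocks, two unique
refs = deduplicate(data, store)

logical_bytes = len(data)
physical_bytes = sum(len(block) for block in store.values())
print(len(refs), "references,", len(store), "stored blocks")
print("reduction ratio:", logical_bytes / physical_bytes)
```

In this toy input three of the four blocks are identical, so only two blocks are physically stored. Real data rarely lines up this neatly, which is part of why shipping products use more sophisticated schemes than fixed-size blocks.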

CAS is a much broader concept than data deduplication. As the term is used today, it refers to systems that locate items by unique identifiers based on the content itself rather than its location in storage.

When an object, such as a document, is stored in a CAS system, its content is scanned and an identifier, such as a hash value, is generated. This identifier is then used to retrieve the object as needed. Since two identical objects, such as two copies of the same document, generate the same identifier, only one copy is stored. This behavior, known as single instancing, is one of the sources of confusion between the terms. Single instancing is not nearly as efficient at saving storage space as block-level data deduplication, and when most people talk about data deduplication they mean block-level data deduplication.
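The store-and-retrieve cycle described above can be sketched with a toy in-memory store. The `ContentStore` class and its `put` and `get` methods are hypothetical names, and SHA-256 is an assumed choice of hash; commercial CAS products have their own interfaces and identifier schemes.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressed store, for illustration only."""

    def __init__(self):
        self._objects = {}

    def put(self, content: bytes) -> str:
        # The address is derived from the content, not from a location.
        address = hashlib.sha256(content).hexdigest()
        # An identical object maps to the same address, so it is kept once.
        self._objects.setdefault(address, content)
        return address

    def get(self, address: str) -> bytes:
        return self._objects[address]

    def __len__(self):
        return len(self._objects)


cas = ContentStore()
first = cas.put(b"Quarterly report, final version")
second = cas.put(b"Quarterly report, final version")    # exact duplicate
third = cas.put(b"Quarterly report, final version 2")   # near duplicate

print(first == second)  # same content, same address
print(len(cas))         # the near duplicate is stored in full as a second object
```

The near duplicate shares almost all of its bytes with the first object but is still stored whole, which is why object-level single instancing saves less space than block-level deduplication.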

One of the major attractions of CAS is that because each object's identifier is based on its content, it is easy to verify that the retrieved object hasn't been changed since it was stored. This makes CAS very attractive for compliance-related storage.
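A minimal sketch of that verification step, assuming the identifier is a SHA-256 hash of the content (the helper names `address_of` and `verify` are hypothetical): recompute the hash of whatever was retrieved and compare it with the address it was stored under.

```python
import hashlib


def address_of(content: bytes) -> str:
    """Derive the content address (assumed here to be a SHA-256 hex digest)."""
    return hashlib.sha256(content).hexdigest()


def verify(address: str, retrieved: bytes) -> bool:
    """Return True if the retrieved bytes still match their address."""
    return address_of(retrieved) == address


original = b"Signed contract, 8 May 2007"
address = address_of(original)

intact = verify(address, original)
tampered = verify(address, original.replace(b"2007", b"2008"))
print(intact, tampered)
```

Any alteration to the stored bytes, however small, produces a different hash, so the check fails for the tampered copy.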

Of course, that also means that any change to an object stored in a CAS system creates a new object that is stored separately. This is one of the reasons CAS is best suited to data that will not change once it is saved. The other reason is overhead. Storing an object in a CAS system requires significantly more time and computing power than storing it in a conventional file system. Retrieval is much less affected.

Even more than data deduplication, CAS is currently a hot backup and archiving technology. CAS systems are available from more than a dozen vendors ranging from very large, such as EMC (Centera) and Hewlett-Packard Co. (HP) (StorageWorks), to small, such as PermaBit Inc. with its Dynamic Information Services appliance.

Again, even more than data deduplication, CAS systems vary enormously in approach, architecture, capacity, throughput and price. Storage administrators who are considering CAS need to perform a thorough review of their needs and carefully research the available products to find the best match for their enterprise.

About the author: Rick Cook specializes in writing about issues related to storage and storage management.
