
EMC Centera scalability hampers large e-mail archives

Centera's limited object count forces e-mail archiving applications to "containerize" groups of e-mails so they can be stored as a single object, but this approach has its downsides.

Rumors of poor performance, particularly when it comes to archiving e-mail, have dogged EMC's Centera object-based storage system since it was introduced almost three years ago. However, it now seems that a more serious problem plaguing Centera is the relatively small number of objects it can store.

Centera can house approximately 400 million objects per 32-node rack. At first glance, that may seem like a big number, but not when one considers the volume of e-mail generated by companies today.


According to statistics from the Radicati Group, corporate users send and receive an average of 73 non-spam e-mails per day. With a retention policy applied, a single e-mail consists of at least two objects, more if it has attachments. At two objects per e-mail, a firm of just 7,500 employees would consume 400 million objects in a year.
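The arithmetic behind that claim can be reproduced in a few lines. The sketch below is a back-of-the-envelope model, not an EMC sizing tool: it uses only the article's figures (400 million objects per rack, 73 e-mails per user per day, two objects per e-mail) and ignores deletions. The three-object case is a hypothetical that assumes a single attachment adds exactly one object.

```python
# Back-of-the-envelope model of Centera object consumption, using the
# article's figures. Not an EMC sizing tool; deletions are ignored.
RACK_OBJECT_LIMIT = 400_000_000  # approx. objects per 32-node rack
EMAILS_PER_USER_PER_DAY = 73     # Radicati Group average, non-spam
DAYS_PER_YEAR = 365


def objects_per_day(employees: int, objects_per_email: int = 2) -> int:
    """Objects written per day; two per e-mail is the article's minimum."""
    return employees * EMAILS_PER_USER_PER_DAY * objects_per_email


def objects_per_year(employees: int, objects_per_email: int = 2) -> int:
    """Objects written over a 365-day year."""
    return objects_per_day(employees, objects_per_email) * DAYS_PER_YEAR


def days_to_fill_rack(employees: int, objects_per_email: int = 2) -> float:
    """Days until a single rack reaches its approximate object limit."""
    return RACK_OBJECT_LIMIT / objects_per_day(employees, objects_per_email)


if __name__ == "__main__":
    print(objects_per_year(7_500))             # 399675000, roughly 400 million
    print(round(days_to_fill_rack(7_500)))     # 365
    # Hypothetical: one attachment per e-mail, counted as a third object.
    print(round(days_to_fill_rack(7_500, 3)))  # 244
```

At three objects per e-mail, the same 7,500-person firm would exhaust a rack in roughly eight months instead of a year.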

EMC argues that multiple Centeras can be clustered together to increase the number of objects the system supports. According to Roy Sanford, EMC's vice president of content addressed storage, the company has tested Centera configurations of up to eight 32-node racks.
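If, as a simplifying assumption, the object limit scaled linearly with the number of clustered racks (EMC is not quoted to that effect here), the tested maximum of eight racks would hold about 3.2 billion objects. The hypothetical helper below turns that into the largest firm whose e-mail would fit for one year at two objects per e-mail.

```python
# Hypothetical sizing helper. Assumes, without confirmation from EMC,
# that object capacity grows linearly as 32-node racks are clustered.
RACK_OBJECT_LIMIT = 400_000_000           # approx. objects per rack
OBJECTS_PER_USER_PER_YEAR = 73 * 2 * 365  # 73 e-mails/day, 2 objects each


def cluster_object_limit(racks: int) -> int:
    """Total objects for a cluster, under the linear-scaling assumption."""
    return racks * RACK_OBJECT_LIMIT


def max_employees_for_one_year(racks: int) -> int:
    """Largest head count whose yearly e-mail fits within the cluster."""
    return cluster_object_limit(racks) // OBJECTS_PER_USER_PER_YEAR


if __name__ == "__main__":
    print(cluster_object_limit(8))        # 3200000000
    print(max_employees_for_one_year(1))  # 7506
    print(max_employees_for_one_year(8))  # 60048
```

Even under that optimistic assumption, the largest tested configuration would cover roughly 60,000 employees for a single year of retention.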

Nevertheless, sources close to EMC say that when using Centera for e-mail archiving, users tend to run out of objects much sooner than they run out of capacity. EMC is rumored to be developing a four-node Centera designed specifically for e-mail. While it would offer much less usable capacity, it would support the same number of objects.

Today, to get around Centera's low object count, some e-mail archiving vendors use a technique called "containerization," in which e-mails are bundled into a single large file. With containerization turned on, users can achieve faster write speeds than they can by archiving individual e-mails. A single container also consumes far fewer objects than the thousands of individual e-mails it holds.
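The trade-off can be illustrated with a simple sketch. The hypothetical `containerize` function below bundles e-mails into fixed-size blobs and keeps a byte-offset index for each one, so a message can still be located inside its container. The batch size of 1,000 mirrors the Enterprise Vault grouping this article mentions; charging each container the same two objects as an individual e-mail is an assumption, not a documented Centera rule.

```python
import math
from typing import List, Optional, Tuple


def containerize(emails: List[bytes],
                 batch_size: int = 1_000) -> List[Tuple[bytes, List[int]]]:
    """Bundle e-mails into large blobs, one blob per batch.

    Each result is (blob, offsets), where offsets[i] is the byte position
    of the i-th message of that batch inside the blob.
    """
    containers = []
    for start in range(0, len(emails), batch_size):
        batch = emails[start:start + batch_size]
        offsets, pos = [], 0
        for msg in batch:
            offsets.append(pos)
            pos += len(msg)
        containers.append((b"".join(batch), offsets))
    return containers


def objects_needed(num_emails: int, objects_per_item: int = 2,
                   batch_size: Optional[int] = None) -> int:
    """Objects consumed storing e-mails individually (batch_size=None) or
    in containers, each container assumed to cost objects_per_item."""
    if batch_size is None:
        items = num_emails
    else:
        items = math.ceil(num_emails / batch_size)
    return items * objects_per_item


if __name__ == "__main__":
    daily = 7_500 * 73  # 547,500 e-mails per day for a 7,500-person firm
    print(objects_needed(daily))                    # 1095000
    print(objects_needed(daily, batch_size=1_000))  # 1096
```

The object savings are roughly a factor of the batch size, but every read, hash, or retention decision now operates on the whole blob rather than on one message.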

But containerization has its downsides. For example, it can make it very difficult to apply data retention policies to individual e-mails. It also slows down searches because "you're doing a hash for a whole bunch of things instead of just one little one," said Mary Kay Roberto, senior vice president for KVS Enterprise Vault, now Veritas Enterprise Vault.
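One way to see the retention problem: if retention is enforced on each stored object (an assumption here about how an archive built on Centera behaves), a container cannot be deleted until its longest-retained message expires, so shorter-lived messages are kept past their own deadlines. The sketch below, with hypothetical dates and policies, computes that over-retention.

```python
from datetime import date, timedelta
from typing import List


def container_release_date(received: List[date],
                           retention_days: List[int]) -> date:
    """Earliest date the whole container may be deleted: the latest expiry
    among its members, assuming it cannot be partially deleted."""
    return max(r + timedelta(days=d) for r, d in zip(received, retention_days))


def overretention_days(received: List[date],
                       retention_days: List[int]) -> List[int]:
    """Extra days each message is kept beyond its own retention policy."""
    release = container_release_date(received, retention_days)
    return [(release - (r + timedelta(days=d))).days
            for r, d in zip(received, retention_days)]


if __name__ == "__main__":
    received = [date(2005, 1, 3)] * 3
    policies = [365, 3 * 365, 30]  # hypothetical: 1 year, 3 years, 30 days
    print(container_release_date(received, policies))  # 2008-01-03
    print(overretention_days(received, policies))      # [730, 0, 1065]
```

A message with a 30-day policy ends up retained for about three years simply because it shares a container with a long-retained one.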

EMC addresses performance

Meanwhile, EMC has been working hard to dispel rumors that Centera's read and write speeds are insufficient for e-mail archiving. According to a recent performance test certified by Evaluator Group, a 16-node Centera system archived upwards of 240,000 e-mails per hour with Veritas Enterprise Vault.

"What we found was that the performance of KVS is almost always dictated by how big of a server you are running it on," said Randy Kerns, Evaluator Group senior partner. To put the test result in perspective, 240,000 e-mails per hour is equivalent to 5.76 million e-mails per day, or about 67 e-mails per second. Using Radicati Group's figure of 73 non-spam e-mails per user per day, a single 16-node Centera could archive the daily e-mail generated by a firm of just under 80,000 employees.

That would seem to satisfy the needs of most large organizations. Veritas' larger Enterprise Vault customers, for example, archive between 500,000 and one million e-mails per day, according to Roberto.
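The throughput conversions above can be checked as follows. The sketch assumes the tested rate of 240,000 e-mails per hour could be sustained around the clock, which the test result alone does not establish, and compares it with the one million e-mails per day Roberto cites for Veritas' larger customers.

```python
# Converts the Evaluator Group-certified rate into daily and per-second
# figures. Assumes the hourly rate is sustained 24 hours a day.
EMAILS_PER_HOUR = 240_000
EMAILS_PER_USER_PER_DAY = 73  # Radicati Group average


def per_day(per_hour: int) -> int:
    return per_hour * 24


def per_second(per_hour: int) -> float:
    return per_hour / 3600


def supported_employees(per_hour: int) -> int:
    """Head count whose daily non-spam e-mail fits in one day of archiving."""
    return per_day(per_hour) // EMAILS_PER_USER_PER_DAY


if __name__ == "__main__":
    print(per_day(EMAILS_PER_HOUR))              # 5760000
    print(round(per_second(EMAILS_PER_HOUR)))    # 67
    print(supported_employees(EMAILS_PER_HOUR))  # 78904
    # Headroom over the largest customers' one million e-mails per day.
    print(per_day(EMAILS_PER_HOUR) / 1_000_000)  # 5.76
```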

Those numbers are a huge improvement over earlier Centera generations. According to internal EMC documents obtained by, the Centera that was shipping in 2002 could write 10 KB objects at a rate of 18,450 per hour, or about 5 objects per second. At that rate, assuming one object per e-mail and round-the-clock operation, Centera would have been able to meet the daily needs of a 6,000-person firm.
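The 2002 figures work out the same way. The sketch assumes continuous writes; the roughly 6,000-person figure corresponds to one object per e-mail, and at the two objects per e-mail used earlier in this article the same rate would cover about half as many employees.

```python
# 2002-era Centera write rate from the internal EMC documents, in objects.
# Assumes continuous writes; objects_per_email is the key variable.
OBJECTS_PER_HOUR_2002 = 18_450
EMAILS_PER_USER_PER_DAY = 73


def objects_per_second(per_hour: int) -> float:
    return per_hour / 3600


def supported_employees(per_hour: int, objects_per_email: int) -> int:
    """Head count whose daily e-mail can be written in one day."""
    objects_per_day = per_hour * 24
    return objects_per_day // (EMAILS_PER_USER_PER_DAY * objects_per_email)


if __name__ == "__main__":
    print(objects_per_second(OBJECTS_PER_HOUR_2002))      # 5.125
    print(supported_employees(OBJECTS_PER_HOUR_2002, 1))  # 6065
    print(supported_employees(OBJECTS_PER_HOUR_2002, 2))  # 3032
```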

EMC is currently shipping a third-generation Centera. Since Centera first shipped, its performance has improved five- to tenfold, according to Sanford, thanks to factors such as improved store and management procedures and faster hardware.

It's worth noting that these performance results were achieved without Enterprise Vault's containerization feature, which bundles e-mails in groups of 1,000 into a single large file.
