Cohesity Inc. emerged from stealth mode this week with an early-access version of an appliance that aims to converge data protection, archiving and analytics onto a single scale-out platform and eliminate the need for multiple secondary storage products.
Convergence is a familiar theme for the Santa Clara, Calif.-based startup's CEO and founder, Mohit Aron. He was a founder and the CTO of Nutanix, which caught fire with hyper-convergence products that integrate compute, storage, network and virtualization resources in a commodity hardware box.
Aron, who left Nutanix in 2013, said he and his Cohesity team "are now working on bringing the benefits of convergence to the secondary storage platform, but rather than hardware convergence, this is more about convergence of software workflows and, in particular, secondary storage workflows."
Aron said organizations often have data sitting in backups and archives with no way to gain insight into it without moving it. When they do move it, they wind up with multiple copies floating around the data center.
He said for data protection alone, a user might have backup software, backup hardware, archival or tape products and cloud storage. Because the backup storage does not scale out, organizations need to buy a second or larger system when they hit the capacity limit, he added.
"What differentiates us is we know how to build really, really large systems," Aron said.
Aron honed his large-scale distributed system chops when he helped to build the file system used by Google. He said more than 25% of Cohesity's technical team consists of Google alumni and others hailing from companies such as VMware, Riverbed and Netflix.
Cohesity started in 2013 and that year raised $15 million in Series A funding led by Sequoia Capital and Wing Venture Capital. Last month, the company closed a $55 million Series B round led by Artis Ventures and Qualcomm Inc., with additional investments from Sequoia, Wing, Accel Partners, Battery Ventures, Google Ventures and Trinity Ventures.
The startup made the Cohesity Data Platform available to pilot testers on a limited basis two months ago, according to Aron. He expects it to become generally available by the start of the fourth quarter.
The early-access product does not have the full list of capabilities that Aron eventually envisions. For instance, the current version supports distributed NFS. RESTful APIs for object-based storage are due in the GA timeframe, and support for the Hadoop Distributed File System (HDFS), iSCSI and SMB will follow as part of the extended roadmap, according to Aron.
George Crump, founder and president of Storage Switzerland LLC, pointed out that users will initially be able to back up VMware environments natively, and they'll gain similar capabilities for Oracle databases later in the year, with more to follow.
Crump said no "rank-and-file" vendors are doing what Cohesity is attempting with a product combining backup, archive, basic NAS-like home directories, replication and big data analytics. He said Cohesity reminds him of another startup, Rubrik, with its cloud-like scale-out architecture. Rubrik is in early access with a backup/data management appliance.
Perhaps the closest established vendor to Cohesity is Actifio, which sells a copy data management platform handling data protection, mobility and insights, and business continuity.
"[Cohesity's] advantage is this is a unique way to handle a decades-old problem -- what to do with secondary data," said Crump. "The challenge is going to be: This is a very entrenched marketplace, and will customers switch?"
Cohesity Data Platform architecture
The Cohesity Data Platform features the SnapFS distributed file system, which is object-based and provides "Google-like scalability," according to Aron. He said the file system dynamically adds nodes, provides fault tolerance and allows for non-disruptive operation. Cohesity built SnapFS from scratch with features such as global deduplication across all nodes and efficient cloning and snapshot technology.
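Aron did not detail how SnapFS implements global deduplication, so the following is only a minimal sketch of the general technique, not Cohesity's design. It assumes fixed-size 4 KiB chunks, SHA-256 content fingerprints and placement of each chunk on a node chosen from its fingerprint, so identical data written from any node is stored once cluster-wide. The names `DedupCluster`, `Node` and `CHUNK_SIZE` are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field

CHUNK_SIZE = 4096  # assumption: fixed-size chunking; real systems often use variable-size chunks


@dataclass
class Node:
    name: str
    chunks: dict = field(default_factory=dict)  # fingerprint -> chunk bytes


class DedupCluster:
    """Toy cluster in which each unique chunk is stored exactly once, on one node."""

    def __init__(self, node_names):
        self.nodes = [Node(n) for n in node_names]

    def _owner(self, fingerprint: str) -> Node:
        # Identical chunks always hash to the same fingerprint, so they map to
        # the same owning node no matter which client wrote them.
        return self.nodes[int(fingerprint, 16) % len(self.nodes)]

    def write(self, data: bytes) -> list:
        """Store data; return its recipe (ordered list of chunk fingerprints)."""
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            fp = hashlib.sha256(chunk).hexdigest()
            self._owner(fp).chunks.setdefault(fp, chunk)  # no-op if already stored
            recipe.append(fp)
        return recipe

    def read(self, recipe: list) -> bytes:
        """Reassemble data from its recipe."""
        return b"".join(self._owner(fp).chunks[fp] for fp in recipe)

    def stored_bytes(self) -> int:
        """Physical bytes held across all nodes."""
        return sum(len(c) for n in self.nodes for c in n.chunks.values())


# Two backups of the same data; the second differs only in its last chunk.
backup1 = b"".join(bytes([i]) * CHUNK_SIZE for i in range(4))
backup2 = backup1[: 3 * CHUNK_SIZE] + bytes([9]) * CHUNK_SIZE

cluster = DedupCluster(["node-a", "node-b", "node-c", "node-d"])
r1 = cluster.write(backup1)
r2 = cluster.write(backup2)
print(cluster.stored_bytes())  # 5 unique chunks * 4096 = 20480, versus 32768 logical bytes
```

Because placement depends only on the fingerprint, no node needs a private copy of a chunk another node already holds; that property is what distinguishes global deduplication from per-node deduplication.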
Above the SnapFS file system is the data management layer supplying data protection, remote replication, disaster recovery, archival storage, cloud integration and analytics. The pilot features built-in analytics with indexed backups, search capabilities and reports. In the GA timeframe, users will gain the ability to inject code and run custom analytics, and later have access to native HDFS, according to Aron.
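The article does not say how the pilot indexes backups for search. As a rough illustration of the idea, this sketch builds a generic inverted index that maps terms in file paths to the backup snapshots containing them, so a search never has to restore or move the data. `BackupIndex` and the snapshot IDs are hypothetical.

```python
import re
from collections import defaultdict


class BackupIndex:
    """Toy inverted index over file paths captured in backup snapshots."""

    def __init__(self):
        self._postings = defaultdict(set)  # term -> {(snapshot_id, path)}

    @staticmethod
    def _terms(path: str) -> set:
        # Split a path into lowercase terms on separators such as / \ . _ - and whitespace.
        return {t for t in re.split(r"[/\\._\-\s]+", path.lower()) if t}

    def add_snapshot(self, snapshot_id: str, paths) -> None:
        """Index every path recorded in one backup snapshot."""
        for path in paths:
            for term in self._terms(path):
                self._postings[term].add((snapshot_id, path))

    def search(self, query: str) -> list:
        """Return sorted (snapshot_id, path) pairs whose path contains every query term."""
        terms = self._terms(query)
        if not terms:
            return []
        hits = set.intersection(*(self._postings.get(t, set()) for t in terms))
        return sorted(hits)


idx = BackupIndex()
idx.add_snapshot("snap-1", ["/vm01/etc/hosts", "/vm01/home/alice/report.docx"])
idx.add_snapshot("snap-2", ["/vm01/etc/hosts", "/vm01/home/alice/report_v2.docx"])
print(idx.search("alice report"))
```

The query returns both versions of Alice's report, one from each snapshot, which is the kind of "find it across backups" lookup an indexed backup store makes cheap.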
On top of data management, an application integration layer will allow for connections to VMware environments and ultimately expose the SnapFS file system through multiple protocols, such as distributed NFS, distributed SMB, RESTful APIs and HDFS.
The Cohesity Data Platform will be sold on an Intel-based 2U appliance with four independent server nodes, 6.2 TB of solid-state drives (SSDs) and up to 96 TB (24 TB per node) of hard-disk drives (HDDs). Cohesity qualifies the hardware, which is embedded with its software. The minimum configuration is three nodes and users can add more in one-node increments through a pay-as-you-grow model, according to Aron.
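The capacity figures above work out to 24 TB of HDD per node (96 TB across four nodes). A small sketch of the pay-as-you-grow arithmetic, using only the article's numbers, follows. It reports raw HDD capacity before deduplication or fault-tolerance overhead. The chassis count is an assumption: it presumes nodes fill 2U appliances four at a time, which the article does not state.

```python
import math

HDD_TB_PER_NODE = 24   # from the article: up to 96 TB across four nodes
NODES_PER_CHASSIS = 4  # one 2U appliance holds four server nodes
MIN_NODES = 3          # minimum configuration cited by Aron


def raw_hdd_capacity_tb(nodes: int) -> int:
    """Raw HDD capacity in TB for a cluster of `nodes` nodes (no dedup or redundancy)."""
    if nodes < MIN_NODES:
        raise ValueError(f"minimum configuration is {MIN_NODES} nodes")
    return nodes * HDD_TB_PER_NODE


def chassis_needed(nodes: int) -> int:
    """2U appliances needed, ASSUMING nodes fill each chassis four at a time."""
    return math.ceil(nodes / NODES_PER_CHASSIS)


# Growing one node at a time from the three-node minimum.
for n in (3, 4, 5, 8):
    print(n, raw_hdd_capacity_tb(n), chassis_needed(n))
```

Each one-node increment adds 24 TB of raw disk, so a three-node starting point holds 72 TB and a full appliance holds 96 TB.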
Arun Taneja, consulting analyst at Taneja Group, said data protection was "probably one of the most broken things in the IT infrastructure" until a "smattering of technologies" such as data deduplication and improved snapshots started to enter the picture. He said, "The time is right to do convergence."
"Something like this will put huge pressure on existing data protection companies that are doing things piecemeal," he said. "But it will take five years for this whole phenomenon to gather steam, for all the implementations to happen and for customers to genuinely move from existing infrastructures to this type of infrastructure."
Chris Evans, director of London-based IT consultancy Langton Blue, said Cohesity may be sending a "confused message" by talking about backup, archive and NAS for test and development in the same product.
"From a marketing perspective, that's going to make it a bit difficult to pitch it," he said. "They have to be careful to make sure that people don't get confused as to what the platform actually does and what it's for."
Evans said the first use case for the Cohesity product will likely be as a single backup platform that scales out. Those with hundreds of terabytes or even petabytes of data would benefit the most from the scale-out system, he said.
"There are a lot of operational headaches this sort of thing will fix in backup," Evans said. "The main advantage is you've got a highly searchable, indexable system for looking at data."