Faced with 50,000 individual users generating petabytes of data per month and 1.8 million files in storage, the Texas Advanced Computing Center (TACC) needed a way to strictly allocate archive space to users. It also needed a system for which it could find staff with the skills to support it.
A transition from an old Oracle system to a DDN SFA14K DCR block storage system from DataDirect Networks and a Scalar i6000 tape library from Quantum, along with new retention policies, has helped the center control the growth of the data its users upload and remove the roadblocks of a proprietary tape format.
"Quantum has very easy-to-manage and rigorous 'quotaing,'" said Frank Douma, senior systems administrator for Ranch at TACC. "We are inflicting discipline on the user community so they don't just blindly bring everything in."
Data, data everywhere -- a lot of it
TACC, which is funded by the National Science Foundation, provides computing resources for scientific researchers. The vast majority of TACC's clientele is the university community, and major research areas include biomedicine, high-energy physics and long-term climate prediction modeling. A small percentage of its work is government research, but TACC specifically does not do Department of Defense work.
TACC's supercomputers generate the researchers' data, but the center's archive system, called Ranch, is where the data is stored. For 12 years, Ranch had run in an Oracle Hierarchical Storage Manager environment, but there were concerns that the vendor would abandon its proprietary T10000D tape format in favor of LTO, an open format. Oracle was also staying with the Solaris operating system, which was becoming increasingly hard to support from an IT talent standpoint, according to Douma.
"We just had an aging product that seemed to be about to die on the vine with the vendor," Douma said.
Adding to the challenge was the size of the user base. The environment is so large that one of its biggest problems was judiciously divvying it up among its users. Previously, data uploaded to Ranch was retained indefinitely, a policy that Junseong Heo, senior systems administrator and manager at Large Scale Systems, said is unsustainable.
"We are ingesting roughly 2 petabytes a month, so data growth is truly semi-exponential," Heo said. "If someone needed to store 10 petabytes, we can't allow a single organization to get a quarter of the capacity of the entire system when we're serving 50,000 users."
Douma added, "It's the largest environment I've ever worked in."
Quantum tape's quota control helped with this. It allowed Ranch to provide strict allocations to its clients and let them see how much of their allocation they are using. Douma described it as a way to break the user community's "limitless storage" mindset and prevent users from uploading more than they're allowed.
TACC first purchased Quantum tape in spring 2018, but the organization didn't start using it until the end of that year. Heo and Douma said they were extra careful about testing and implementing it because Ranch had been on the Oracle HSM environment for more than a decade. All of its data would have to be moved to the new Quantum environment, and they couldn't afford any missteps in the migration process.
When TACC went live on the Quantum tape library, the onus of migration was put on the user community. Access to the old library was set to read-only until March 2020, while users have read-write access to the new Quantum library. It's up to the users to selectively move the files they want to keep into the new archive. Because of this user-driven approach, Douma and Heo couldn't say how close the migration is to completion.
One of Ranch's biggest challenges had been educating its user community to use it as an archive rather than as near-line storage. Many users don't understand the limits of the tape medium and treat their Ranch allocations as something akin to Google Drive, Douma said.
"If they treat it like near-line, it's not good for all concerned," Douma said. "We want them to use it like an archive and not like a limitless amount of storage that's kept forever."
Implementing the Quantum tape library has served as a wake-up call of sorts to TACC's user community. Douma said the move demonstrates that TACC has changed its policies, and it has forced users to pay attention to how Ranch works now rather than blindly uploading files.
The sheer volume of data Ranch is responsible for archiving makes storing it in a public cloud infeasible. Douma said the costs of recalling the data from the cloud would be astronomical, and because the data is used for scientific research, there's a strong chance that all of it will need to be recalled sooner or later.
Therefore, tape is the only logical medium for archiving this much data: it's air-gapped, doesn't require electricity to store and generates no heat. But as with Solaris, Douma said, it is getting harder and harder to find systems administrators with tape experience. However, he believes that is a marketing problem, and it's simply a matter of making the medium appealing to the next generation of freshly minted IT workers.
"It's not very sexy to tell a 21-year-old, 'Hey, do you wanna work in our tape environment?'" Douma said. "But whenever I give tours of the tape library, everybody wants to watch the robot go back and forth."