JFK library digital archive project goes live, requires data protection


Dave Raffo, Senior News Director

The John F. Kennedy presidential library digital archive that went live online today is the result of a four-year, $10 million project to digitize hundreds of thousands of documents, audio tapes, film, photos and other artifacts collected in the 50 years since JFK's inauguration. And with millions of documents yet to be digitized, the archiving process is likely to continue for decades.

The Digital Archive Project team from the JFK Foundation uses storage from EMC Corp. and an Iron Mountain disaster recovery (DR) center to archive and protect the data. AT&T provides the web hosting, and Raytheon designed and implemented the system.

The foundation's digital archivist Erica Boudreau said the goal was to make documents that had only been available to visitors of the JFK Library in Boston open to anybody with an Internet connection.

"Without the digital archive, you have to be onsite and go into a research room to see the video content and documents on display," she said.


However, archiving and protecting the digital data is a massive undertaking.

The JFK Library is divided into 13 collections. Because of the large amounts of data that needed to be digitized, the library's archivist picked six of the collections to go online for today's launch, including interactive exhibits dealing with the space program, the Cuban Missile Crisis and Civil Rights.

Since 2006, the library has digitized 200,000 documents; 300 reels of audio tape containing more than 1,245 telephone calls, speeches and meetings; 300 museum artifacts; 72 reels of film; and 1,500 photos. All the material is stored at high resolution to preserve fidelity.

Boudreau said the biggest data storage challenge was getting files such as oversized maps and old videos into the system. "These files are very large, and we're digitizing at the highest possible resolution," she said. "We have to capture our metadata, keep it with the file and deliver that content to a website. Oversized documents like maps were difficult to work with because of scaling. One video replication job is working on a 4 TB file share, and could take weeks to replicate."

The JFK Foundation's IT team uses EMC Documentum and Captiva software to ingest and organize the data, and mirrors the data between EMC Celerra NS-120 network-attached storage (NAS) and Centera archiving systems at the library's primary site in Boston and Iron Mountain's DR site in Boyers, Pa.

The archive is expected to grow to about 117 TB by 2016, although the project team hopes to reduce the capacity required by applying Captiva's LZW lossless compression to document files.
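The article doesn't describe how Captiva implements LZW, so the following is only a generic, textbook-style sketch in Python of the Lempel-Ziv-Welch technique, not the product's code. It illustrates why LZW is lossless: the decoder rebuilds exactly the same dictionary the encoder built, so the original bytes come back unchanged. The function names `lzw_compress` and `lzw_decompress` are hypothetical, and the codes are returned as a plain list of integers rather than packed into the variable-width bit fields that real file formats such as TIFF and GIF use.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Hypothetical helper: encode bytes as a list of LZW dictionary codes."""
    # Start with one dictionary entry per possible byte value (codes 0-255).
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    w = b""
    out: list[int] = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in dictionary:
            # Keep extending the current match while it is still known.
            w = wc
        else:
            # Emit the longest known prefix and learn the new sequence.
            out.append(dictionary[w])
            dictionary[wc] = next_code
            next_code += 1
            w = bytes([byte])
    if w:
        out.append(dictionary[w])
    return out


def lzw_decompress(codes: list[int]) -> bytes:
    """Hypothetical helper: rebuild the original bytes from LZW codes."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    w = dictionary[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # Special case: the code refers to the entry being defined right now.
            entry = w + w[:1]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        out.append(entry)
        # Mirror the encoder: add previous match plus first byte of this entry.
        dictionary[next_code] = w + entry[:1]
        next_code += 1
        w = entry
    return b"".join(out)


if __name__ == "__main__":
    sample = b"TOBEORNOTTOBEORTOBEORNOT"
    codes = lzw_compress(sample)
    # Repeated substrings collapse into single codes, and decoding is exact.
    print(len(sample), "bytes ->", len(codes), "codes")
    print(lzw_decompress(codes) == sample)
```

Repetitive content compresses well because repeated runs are replaced by single codes; how much space this saves on scanned document images in practice depends on the files and isn't stated in the article.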

Mirrors replace tape for backup

The library's IT specialist Tim Fitzpatrick said he started mirroring between sites to protect the files because tape backups were taking more than a week to complete. "Tape backups were becoming unmanageable," he said. "We established two sets of mirrors between the production environment and the DR location, and we replicate between the two. That protects us against a disaster where data is lost on a production system."

Boudreau said the digital archive project is funded mostly through private donations, and the technology partners have donated equipment and services. But Fitzpatrick said there are still some 48 million pages left to digitize before all of the JFK collections are online, and the library is still collecting material.

"At our current rate," Fitzpatrick said, "it will take over 100 years to get everything digitized."

 

