November 17, 2007 -- The Library of Congress has released a report on a collaboration with the San Diego Supercomputer Center (SDSC) to test the trustworthiness of information service providers by conducting data-transfer and storage tests with samples of the Library's extensive digital holdings. The work took place between May 2006 and October 2007.
At the heart of the project was the issue of trust: specifically, how the Library could trust SDSC to reliably store several terabytes of the Library’s data. By what means could SDSC prove that the data was intact, preserved and well cared for? What tests could the Library devise, and what metrics could SDSC produce, to guarantee the integrity of the Library's remotely stored data?
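One common way to answer questions like these is fixity checking: the data owner records a cryptographic checksum for every file before transfer, and the storage provider recomputes the checksums later to show that nothing has changed. The announcement does not say which method the Library and SDSC used, so the following Python sketch is only an illustration of the general idea. The function names and the choice of SHA-256 are assumptions, not details taken from the report.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks
    so that large files never have to fit in memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its digest.
    Hypothetical helper: the owner would run this before transfer."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose digest no
    longer matches. Hypothetical helper: run against the stored copy."""
    problems = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems
```

In this sketch the owner would keep the manifest from `build_manifest` and periodically compare it against the remote copy with `verify_manifest`. An empty list means every file is present and matches its recorded digest.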
The Library had two main objectives for the project: for SDSC to host Library content reliably and return it intact at the end of the project, and for the Library to be able to remotely access, process, analyze and manage that content.
The content consisted of two different types of digital data from two divisions within the Library. The Office of Strategic Initiatives supplied approximately 6 terabytes of harvested Web sites. The tens of thousands of individual files that make up a Web site were bundled and compressed into 500-megabyte ARC-formatted files. Meanwhile, the Prints and Photographs division supplied approximately 580 gigabytes of digital image files, both high-density master TIFF files and their less-dense derivatives. These files were part of the Library’s Prokudin-Gorskii exhibit.
Inspired by SDSC’s staggering technological potential, the Library had devised several scenarios for the data tests. But ultimately, as the project progressed, the Library opted to keep its goals simple: data transfer, storage and file manipulation. In the end, both partners were happy with the project’s success. The project also produced lessons and unexpected results, some of which will have deep implications for all cultural institutions regarding transfer and storage of their digital assets.
The complete report is available: SDSC Library of Congress Data Storage Report (PDF, 1.4MB)