October 29, 2009 -- Prominent information technologists from around the world shared their data-preservation experiences at the Sun Preservation and Archiving Special Interest Group conference in San Francisco from October 7 to 9, 2009. Several National Digital Information Infrastructure and Preservation Program partners gave presentations, including representatives from the University of North Carolina, Stanford University, the University of Michigan, the Internet Archive, the California Digital Library and the San Diego Supercomputer Center.
The SunPASIG conference targeted digital preservation issues such as cloud storage, federated repositories and archives, large data sets and cradle-to-grave services. Presentations sparked productive discussions, and by the end of the conference it was clear that most participants shared overlapping practices and themes. There was general agreement about the need for interoperability and modularity within and between institutions, better risk planning, better access and robust audit systems for large data sets.
The topic of cloud storage crept into most conversations. Some participants expressed concern about how quickly and efficiently they could move data into and out of the cloud, particularly since many institutional collections are now very large. Clifford Lynch, Executive Director of the Coalition for Networked Information and an NDIIPP advisor, said transfer and access bottlenecks are inevitable but no one is talking about them yet. "The implied assumption is that there's enough bandwidth," Lynch said. He called for a conversation between bandwidth providers and storage providers regarding timely transfer-rate performance.
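The bandwidth concern can be made concrete with a rough back-of-the-envelope estimate. The following is a minimal sketch only: the function name, the collection sizes, the 1 Gbps link and the 80 percent efficiency factor are illustrative assumptions, not figures reported at the conference.

```python
def transfer_days(collection_terabytes: float,
                  link_megabits_per_second: float,
                  efficiency: float = 0.8) -> float:
    """Estimate the days needed to move a collection over a network link.

    All inputs are hypothetical. `efficiency` is an assumed fraction of the
    nominal link speed that is actually achieved in sustained transfer.
    """
    total_bits = collection_terabytes * 1e12 * 8  # decimal terabytes to bits
    effective_bits_per_second = link_megabits_per_second * 1e6 * efficiency
    seconds = total_bits / effective_bits_per_second
    return seconds / 86_400  # seconds in a day


if __name__ == "__main__":
    # Assumed collection sizes, each moved over an assumed 1 Gbps link.
    for size_tb in (10, 100, 1000):
        days = transfer_days(size_tb, 1000)
        print(f"{size_tb} TB over 1 Gbps: about {days:.1f} days")
```

Under these assumed figures, a 100 TB collection would take roughly 11.6 days to move, and a 1,000 TB collection roughly ten times as long, which illustrates why transfer rates matter for very large collections.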
Large-scale data management continues to be a challenge. Questions were raised about data loss and whether bits will inevitably get lost during constant migration. Participants agreed that digital libraries lack good models for bit disappearance and data loss and need to reach out to the scientific community, which has done extensive research in this area. Failure statistics are difficult to come by, and not many libraries are willing to report incidents of lost data. Libraries could nonetheless benefit from hearing what others have learned about data loss and what they have done to address the problem.
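One common way to detect bit loss after a migration is to compare checksums of the files before and after the move. The sketch below assumes a simple case in which source and target are two directory trees with the same layout; the function names are hypothetical and it does not describe any participant's actual audit system.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_migration(source_dir: Path, target_dir: Path) -> list[str]:
    """List relative paths that are missing or altered in target_dir.

    Assumes target_dir is meant to mirror source_dir file for file.
    """
    problems = []
    for source_file in sorted(source_dir.rglob("*")):
        if not source_file.is_file():
            continue
        relative = source_file.relative_to(source_dir)
        target_file = target_dir / relative
        # A file counts as damaged if it is absent or its digest differs.
        if (not target_file.is_file()
                or sha256_of(source_file) != sha256_of(target_file)):
            problems.append(relative.as_posix())
    return problems
```

An empty result means every source file has a byte-identical copy in the target. In practice, an institution would more likely store the digests at ingest time and recheck them periodically rather than rereading the source on every audit.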
The SunPASIG conference leaders suggested developing applications and storage-resource broker services for cloud technology, and they asked conference members for help in drafting specifications. They also said there was a need for a discovery and navigation tool that offers multiple search parameters in all major languages.
Lynch also noted that the digital preservation community has matured and gained enough experience that it is entering a collaborative phase. He encouraged participants to collaborate on tool design and to make good use of the reliable tools that already exist. "The time is right to reuse tools in disparate projects and for joint design of tools," he said.