Library of Congress

Digital Preservation

Web Preservationists Meet

June 11, 2009 -- On May 4-7, 2009, 60 participants from 20 countries gathered at Library and Archives Canada in Ottawa for the annual International Internet Preservation Consortium (IIPC) General Assembly. The meeting focused on the activities of the three IIPC working groups.

IIPC General Assembly at the Library and Archives Canada. Credit: David Knox

The Access Working Group featured guest speaker Brian Davison, from Lehigh University, presenting on "Searching Archival Webs Efficiently & Effectively." Davison outlined early findings from a National Science Foundation project to implement improved search and navigation services for web archives. Breakout groups focused on the topics of "Scalability of Indexing Software," "Resource Discovery of Web Archives: Cataloguing, Full-text Indexing, What Else?" and "Wayback/NutchWAX Integration (issues, tips, and best practice)."

Members of the Harvesting Working Group learned about the latest updates to the open-source crawler Heritrix and then discussed new ideas for projects. Topics included a best practice report for domain crawls, better crawling of rich media content and social media sites, a tool for automatically determining the borders of a national domain or finding thematic material, a shared crawl database, and development of Firefox extensions to assist with web archive quality assurance tasks.

The Preservation Working Group had a practical discussion about WARC tools. Participants shared updates on existing tools and described potential use cases for them. Updates were also given on the group's current work packages, including Discussion of Preservation Objectives, Preservation Skills Development, Preservation Strategies, Environmental Scan of Technical Environment, WARC Issues, Metadata and Workflows.

Attendees also discussed a proposed project to preserve Olympic 2012 websites. 

The Canadian Association of Research Libraries helped sponsor an open session with panelists from the Library of Congress, Internet Archive and Netarchive talking about preserving Internet content for future generations. Several members presented status reports on web archiving activities, including the British Library, California Digital Library, National Library of the Czech Republic, National Library Board (Singapore), National Library of Norway, Bibliothèque nationale de France and Library and Archives Canada.

For more information about the IIPC, visit http://www.netpreserve.org.