Library of Congress

Digital Preservation

Library of Congress Digital Preservation Resources

The mission of the Library of Congress is "to make its resources available and useful to the Congress and the American people and to sustain and preserve a universal collection of knowledge and creativity for future generations." The National Digital Information Infrastructure and Preservation Program (NDIIPP), a significant Library program dedicated to preserving the nation's cultural heritage found in digital form, is working to sustain that mission with regard to digital information.

These pages offer examples of the many tools, publications and best-practices documents that the Library has incubated and developed under the auspices of NDIIPP.

Digital Formats Sustainability

To help its staff plan for the future, the Library of Congress created the Sustainability of Digital Formats Web site. This ever-expanding resource provides internal guidance on strategic planning issues regarding digital formats and assists the Library in managing and preserving some of its most valuable digital materials.

The Formats Web site lists information on about 200 current and emerging file formats and their variants, including detailed documentation that will help the Library manage content created or received in these formats. The site identifies and describes formats that are promising for long-term sustainability and helps develop strategies for sustaining these formats.

Library of Congress Digital Preservation Tools and Services Inventory

This is a list of software tools and utilities designed, developed or used by the Library of Congress in its digital preservation program. By making this list available, the Library encourages others in the preservation community to share in, and take advantage of, the work and resources of the Library.

Tool Listing

Tools are listed alphabetically by name. A suite of tools developed by the Library and its NDIIPP partners to validate and transfer data that conforms to the BagIt specification is now hosted at SourceForge (external link).

BagIt

A format for transferring digital content. Content is packaged (the bag) along with a small amount of machine-readable text (the tag) to help automate the content's receipt, storage and retrieval. There is no software to install. A bag consists of a base directory containing the tag and a subdirectory that holds the content files. The tag is a simple text-file manifest, like a packing slip, that consists of two elements:

1. An inventory of the content files in the bag
2. A checksum for each file.
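
As a rough illustration of the two tag elements listed above, the following Python sketch builds such a manifest for an existing bag directory. It assumes the content subdirectory is named `data` and the manifest file is named `manifest-md5.txt`; those names come from the BagIt specification rather than from this page, and the function names `md5_of` and `write_manifest` are hypothetical. It is not the Library's own code.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(bag_dir: Path,
                   content_subdir: str = "data",          # assumed name
                   manifest_name: str = "manifest-md5.txt"  # assumed name
                   ) -> Path:
    """Write one "<checksum>  <relative path>" line per content file."""
    content_root = bag_dir / content_subdir
    lines = []
    for path in sorted(p for p in content_root.rglob("*") if p.is_file()):
        # Paths are recorded relative to the bag's base directory.
        relative = path.relative_to(bag_dir).as_posix()
        lines.append(f"{md5_of(path)}  {relative}")
    manifest = bag_dir / manifest_name
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest
```

Calling `write_manifest(Path("my_bag"))` on a directory that already holds a `data` subdirectory would leave the manifest beside it, which matches the "packing slip" role described above.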

A slightly more sophisticated bag lists URLs instead of simple directory paths. A script then consults the tag, detects the URLs and retrieves the files over the Internet, ten or more at a time. This type of simultaneous multiple transfer reduces the overall data-transfer time. In another optional file, users can add content metadata.

  • Developer: Library of Congress, California Digital Library
  • Written in: n/a
  • OS and run-time environment: n/a
  • Application: n/a
  • Documentation: BagIt Specification (PDF, 83KB)
  • License: n/a
  • Last tool update: 05/31/08

Bag Validator

The Bag Validator tool is a small Python script that validates a bag. It checks for files in the manifest that are missing from the disk, files on the disk that are not listed in the manifest, and duplicate entries in the manifest.
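
The three checks described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the Library's script: the function names are hypothetical, and it assumes a `manifest-md5.txt` file of "<checksum> <path>" lines and a `data` content subdirectory, as in the BagIt specification.

```python
from collections import Counter
from pathlib import Path


def read_manifest_paths(manifest: Path) -> list[str]:
    """Return the file path from each non-empty "<checksum> <path>" line."""
    paths = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if line.strip():
            # A line without a path raises ValueError; a fuller tool
            # would report the malformed line instead.
            _checksum, path = line.split(None, 1)
            paths.append(path.strip())
    return paths


def validate_bag(bag_dir: Path,
                 manifest_name: str = "manifest-md5.txt",  # assumed name
                 content_subdir: str = "data"              # assumed name
                 ) -> dict[str, list[str]]:
    """Compare the manifest inventory with the files actually on disk."""
    listed = read_manifest_paths(bag_dir / manifest_name)
    on_disk = {
        p.relative_to(bag_dir).as_posix()
        for p in (bag_dir / content_subdir).rglob("*")
        if p.is_file()
    }
    counts = Counter(listed)
    return {
        "missing_from_disk": sorted(set(listed) - on_disk),
        "not_in_manifest": sorted(on_disk - set(listed)),
        "duplicates": sorted(p for p, n in counts.items() if n > 1),
    }
```

A bag passes this sketch when all three lists in the returned dictionary are empty. Checksums are not compared here; that is a separate verification step.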

  • Developer: Library of Congress
  • Written in: Python
  • OS and run-time environment: Unix
  • Application: n/a
  • Documentation: Contact Leslie Johnston at lesliej [at] loc.gov for information
  • License: n/a
  • Last tool update: 06/20/08

Parallel Retriever

The Parallel Retriever is a simple Python-based wrapper around wget and rsync that produces a package conforming to the BagIt spec when given a "file manifest" and a "fetch.txt" file. It has been used to transfer content from several transfer partners hosting rsync and HTTP servers, at rates exceeding 200 Mbps over Internet2. It was initially built specifically for Internet Archive rsync transfers, but was later extended to support the BagIt spec and HTTP as well as rsync.
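
The general idea of reading a "fetch.txt" file and retrieving its entries in parallel might look like the sketch below. It is not the Parallel Retriever itself: it swaps in Python's standard `urllib` and a thread pool where the real tool wraps wget and rsync, so it handles only URLs that `urllib` can open. The "<url> <length> <path>" line layout is taken from the BagIt specification, and all function names are hypothetical.

```python
import shutil
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable


def parse_fetch_file(text: str) -> list[tuple[str, str]]:
    """Parse "<url> <length> <path>" lines into (url, path) pairs."""
    entries = []
    for line in text.splitlines():
        if line.strip():
            url, _length, path = line.split(None, 2)
            entries.append((url, path.strip()))
    return entries


def download(url: str, destination: Path) -> None:
    """Fetch one URL to disk; stands in for the wget/rsync calls."""
    destination.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, \
            destination.open("wb") as handle:
        shutil.copyfileobj(response, handle)


def retrieve_all(entries: list[tuple[str, str]],
                 bag_dir: Path,
                 workers: int = 10,
                 fetch_one: Callable[[str, Path], None] = download
                 ) -> list[str]:
    """Fetch every entry using a pool of worker threads."""
    def job(entry: tuple[str, str]) -> str:
        url, path = entry
        fetch_one(url, bag_dir / path)
        return path

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order and re-raises the first
        # exception raised by any worker.
        return list(pool.map(job, entries))
```

The `fetch_one` parameter is a design convenience of this sketch: it lets a caller substitute another transfer method, or a stub for testing, without changing the pooling logic.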

  • Developer: Library of Congress
  • Written in: Python
  • OS and run-time environment: Unix
  • Application: n/a
  • Documentation: Contact Leslie Johnston at lesliej [at] loc.gov for information
  • License: n/a
  • Last tool update: 08/05/08

VerifyIt

The VerifyIt tool is a script that verifies an MD5 bag manifest using 11 parallel md5sum processes.
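
The same kind of parallel checksum verification can be sketched in Python. VerifyIt itself is a shell script that runs md5sum processes; the version below is only an analogue that uses `hashlib` and a pool of 11 worker threads instead. The manifest name `manifest-md5.txt` and the function names are assumptions made for illustration.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(bag_dir: Path,
                    manifest_name: str = "manifest-md5.txt",  # assumed name
                    workers: int = 11) -> list[str]:
    """Return the paths that are missing or whose MD5 does not match."""
    expected = []
    manifest_text = (bag_dir / manifest_name).read_text(encoding="utf-8")
    for line in manifest_text.splitlines():
        if line.strip():
            checksum, path = line.split(None, 1)
            expected.append((checksum.lower(), path.strip()))

    def check(entry: tuple[str, str]):
        checksum, path = entry
        target = bag_dir / path
        ok = target.is_file() and md5_of(target) == checksum
        return None if ok else path

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Results come back in manifest order.
        return [p for p in pool.map(check, expected) if p is not None]
```

An empty list means every file listed in the manifest is present and matches its recorded checksum.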

  • Developer: Library of Congress
  • Written in: Shell script
  • OS and run-time environment: Unix
  • Application: n/a
  • Documentation: Contact Leslie Johnston at lesliej [at] loc.gov for information
  • License: n/a
  • Last tool update: 07/22/08

Library of Congress Digital Preservation Reports & Publications

This is a list of publications generated by the Library of Congress in its digital preservation program. By making this list available, the Library encourages others in the preservation community to share in, and take advantage of, the work and resources of the Library.

Recent Publications

Ordered by publication date, with the most recent first:

Preserving News in the Digital Environment: Mapping the Newspaper Industry in Transition (2011)

A report produced for the National Digital Information Infrastructure and Preservation Program by a team from the Center for Research Libraries. It provides a glimpse inside the workplaces that produce what, in the analog age, we would have called newspapers.
Preserving News in the Digital Environment: Mapping the Newspaper Industry in Transition

Preserving Our Digital Heritage: The National Digital Information Infrastructure and Preservation Program 2010 Report (2011)

A report that documents the achievements of the Library of Congress and its NDIIPP partners working together to create sustainable long-term access to digital materials since 2000.
Preserving Our Digital Heritage: The National Digital Information Infrastructure and Preservation Program 2010 Report

21st Century Shipping: Network Data Transfer to the Library of Congress (2009)

A paper written by Library of Congress Digital Media Project Coordinator Mike Ashenfelder published in the July/August 2009 issue of D-Lib.
21st Century Shipping: Network Data Transfer to the Library of Congress (external link)

Identifying and Implementing Modular Repository Services: Transfer and Inventory (2009)

A paper presented by Library of Congress Digital Media Project Coordinator Leslie Johnston at the DigCCurr2009 conference.
Identifying and Implementing Modular Repository Services: Transfer and Inventory (PDF, 66.5 Kb)

The Library of Congress and GeoMAPP: A Geospatial Multistate Archive and Preservation Partnership (2009)

Slides presented by NDIIPP staffer Butch Lazorchak and Cindy Clark from the Utah Automated Geographic Reference Center in March 2009 at the American Society for Photogrammetry and Remote Sensing Annual Meeting.
The Library of Congress and GeoMAPP: A Geospatial Multistate Archive and Preservation Partnership (PDF, 1.04 Mb)

NDIIPP Update (2009)

Slides presented by NDIIPP staffer Abbey Potter on January 24, 2009 at the Digital Preservation Interest Group at the American Library Association Midwinter Conference.
NDIIPP Update (PDF, 1.65 Mb)

Digitizing History (and Preserving History in the Making) (2008)

Slides presented by NDIIPP staffer Carl Fleischhauer on December 3, 2008 at the Government Video Technology Expo in Washington, D.C.
Digitizing History (and Preserving History in the Making) (PDF, 2.94 Mb)

Planning for the "Long Term"…..in Library Time (2008)

A paper by NDIIPP Director of Program Management Martha Anderson and Library of Congress Information Technology Specialist Jane Mandelbaum, presented at the Digital Archive Preservation and Sustainability (DAPS) Workshop, held in conjunction with MSST2008, the 25th IEEE Symposium on Massive Storage Systems and Technologies.
Planning for the "Long Term"…..in Library Time (PDF, 75 Kb)

Evolving a Network of Networks: The Experience of Partnerships in the National Digital Information Infrastructure and Preservation Program (2008)

An article by NDIIPP Director of Program Management Martha Anderson from The International Journal of Digital Curation, Issue 1, Volume 3, 2008.
Evolving a Network of Networks (external link) (PDF, 394 Kb)

International Study on the Impact of Copyright Law on Digital Preservation (2008)

This study focuses on the copyright and related laws of Australia, the Netherlands, the United Kingdom and the United States and the impact of those laws on digital preservation of copyrighted works. It also addresses proposals for legislative reform and efforts to develop non-legislative solutions to the challenges that copyright law presents for digital preservation.
International Study on the Impact of Copyright Law on Digital Preservation (PDF, 1.58 Mb)

Digital Curation at the Library of Congress: Lessons Learned from American Memory and the Archive Ingest and Handling Test (2007)

A conference paper that discusses the findings and conclusions of an analysis of American Memory at the Library of Congress and of the donated digital archive as they pertain to digital curation. The paper suggests opportunities for digital curation curriculum development.
Digital Curation at the Library of Congress: Lessons Learned from American Memory and the Archive Ingest and Handling Test

Data Center for Library of Congress Digital Holdings: A Pilot Project (2007)

Between May 2006 and October 2007, the Library of Congress and the San Diego Supercomputer Center conducted data-transfer and storage tests to assess the trustworthiness of off-site storage of the Library’s digital materials.
Data Center for Library of Congress Digital Holdings: A Pilot Project (PDF, 1.4 MB)

Video Formatting and Preservation (2007)

A presentation by Carl Fleischhauer from NDIIPP at the November 6, 2007 DLF Forum.
Video Formatting and Preservation (PDF, 2.38 Mb)

Preservation of State Government Digital Information: Issues and Opportunities Report (2005)

A report presenting findings gathered from three state workshops held in 2005. This report is an important part of the exploration regarding potential involvement of the states within the scope of NDIIPP.
Preservation of State Government Digital Information: Issues and Opportunities Report (PDF, 8.7 Mb)

The Archive Ingest and Handling Test (AIHT) Overall Final Report (2005)

A report documenting the development, administration and conclusions from the Archive Ingest and Handling Test (AIHT), a multiparty test of various digital preservation regimes. It describes the genesis of NDIIPP and of the AIHT; details the phases of the AIHT; documents lessons learned during the test; and suggests possible fruitful areas of future work.
The Archive Ingest and Handling Test (AIHT) Overall Final Report (PDF, 552 Kb)

Building Preservation Partnerships: The Library of Congress National Digital Information Infrastructure and Preservation Program (2005)

An article by William Lefurgy, Digital Initiatives Project Manager, Library of Congress Office of Strategic Initiatives. The article was written for an issue of Library Trends, titled "Digital Preservation," edited by Deborah Woodyard-Robinson, volume 54 number 1, Summer 2005.
Building Preservation Partnerships: The Library of Congress National Digital Information Infrastructure and Preservation Program (PDF, 130 Kb)

Version 0.2 of the Technical Architecture for the National Digital Information Infrastructure and Preservation Program (2003)

A document that outlines the state of thinking (circa 2003) on the Technical Architecture for the National Digital Information Infrastructure and Preservation Program (NDIIPP), following a period of review from April to July 2003.
Version 0.2 of the Technical Architecture for the National Digital Information Infrastructure and Preservation Program (PDF, 1.5 Mb)

NDIIPP Background and Planning Documents

Plan for the National Digital Information Infrastructure and Preservation Program (2002)

"Preserving Our Digital Heritage: Plan for the National Digital Information Infrastructure and Preservation Program" is in two parts. Part 1 provides an Executive Summary and details of the national plan for preserving digital materials. First printing October 2002.
Preserving Our Digital Heritage, Part 1 (PDF, 3.1 MB)

Part 2, the Appendices, offers important background and supplementary materials. These diverse Appendices illustrate the planning process and provide a rationale for the Plan's recommendations. First printing October 2002.
Preserving Our Digital Heritage, Appendices (PDF, 16.9 MB)

It's About Time: Research Challenges in Digital Archiving and Long-term Preservation (2003)

The Library of Congress and the National Science Foundation hosted a workshop April 12-13, 2002 to identify specific research challenges associated with the long-term preservation of digital content. As detailed in the published report, the workshop identified a number of priority areas for research into new models, methodologies and tools for digital preservation. Sponsored by the National Science Foundation and the Library of Congress.
It's About Time: Research Challenges in Digital Archiving and Long-term Preservation (PDF, 15.9 MB)

Background Summary of Results from Interviews and Essays

In late August and September of 2001, the Council on Library and Information Resources (CLIR) undertook a series of interviews and commissioned a set of six environmental scans on behalf of the Library of Congress. This introductory paper summarizes the interviews and scans.
Summary of interviews and environmental scans (PDF, 25 Kb)

Convening Sessions — November 5-6, 7-8, 15-16, 2001
Summary Report

One hundred forty individuals representing a range of stakeholder communities, primarily content creators, distributors and users, were invited to participate in one of three 1-1/2 day sessions in Washington, D.C., in November 2001. The 70 who attended represented media and entertainment (film, television, music); scholarly, textbook, commercial and newspaper publishing; research libraries; heritage preservation organizations; universities; private foundations and independent authors and artists as well as representatives of other interested federal agencies.
Convening Session Summary Report (PDF, 31 Kb)

Library of Congress Office of Strategic Initiatives Reports

Office of Strategic Initiatives Strategic Plan 2008-2013

A plan intended as a living document to guide OSI as it develops programs, plans and strategies for the Library of Congress's digital future.
FY 2008 - 2013 Strategic Plan (PDF, 32.9 MB)

2005 OSI Annual Review (PDF, 6.4 MB)

2004 OSI Annual Review (PDF, 8.4 MB)
