Library of Congress

Digital Preservation

Eileen Fenton

Following is an interview with Eileen Fenton, executive director of Portico, an electronic archiving service. Portico is a partner in the National Digital Information Infrastructure and Preservation Program (NDIIPP). NDIIPP is supporting Portico's development of the archive's technical infrastructure and of an economically sustainable business model for an ongoing archiving service for scholarly resources published in electronic form, beginning with electronic scholarly journals.

Insuring precious digital assets

As the old proverb says, “You don’t miss your water until your well runs dry.” The same warning applies to digital preservation: the availability of digital content should not be taken for granted. For decades, libraries have collected printed scholarly journals on their shelves, but today's complex e-journals cannot be housed on shelves. Instead, they are stored within technical systems maintained by publishers, which raises concerns for libraries about how journals will be preserved and kept accessible decades from now as publishers come and go. Portico was created, with support from the Library of Congress' NDIIPP initiative, to address this concern. It offers a permanent archive of electronic scholarly literature, beginning with e-journals, that ensures scholarly e-publications remain preserved and accessible over the long term.

Portico, which grew out of the academic journal archive JSTOR, archives digital publications and, under special circumstances known as “trigger events,” makes them available from its Web site to the libraries that financially support the archive. A typical trigger event is a publisher ceasing operations when the content is not available from any other source. Academic publishers and libraries both cooperate with Portico to preserve important published scholarship, and Portico's community-based approach depends on active collaboration.

“When publishers and libraries become Portico participants, they are effectively banding together – much like an insurance cooperative – to protect against the loss of the scholarly record,” says Portico Executive Director Eileen Fenton. “They're securing protection against the risk that, at some point in the future – it’s hard to know when – our digital scholarly heritage will be lost. The archive assures that access to the e-scholarship of today will be sustained for generations to come.”

Sorting through a tangle of media

By supporting a central, cooperative archive, libraries can avoid a number of daunting challenges, along with much duplicated effort. Receiving and preserving e-journals directly from publishers requires understanding the diverse, complex formats that competing publishers use; Portico does this work once on behalf of a very broad community of libraries. “Because we work with a very wide range of publishers, Portico spends a lot of time and effort to build an infrastructure, understand data formats, load data, and create an interface from which many libraries (and publishers) can benefit,” Eileen explains. “The advantage of a cooperative effort like Portico is that doing this work once on behalf of many institutions means any single institution pays only a small fraction of the overall cost.”

Eileen adds that by supporting Portico, librarians avoid the unhappy prospect of explaining to their university administration that “after paying substantial sums to license e-journal content for many years, even more resources will be needed to load the content locally in the event of loss of access from the publisher. Not only that, but no one wants then to have to say, ‘And, oh by the way, in the future when formats change and technology shifts, you know that’s all going to have to get refreshed.’” Portico relieves its supporters of the technical burden of storing, sorting, and making sense of the files.

Normalization

After ingesting and sorting the jumble of files from various publishers, Portico “normalizes” the content, transforming the many different publisher-specific elements into a consistent, regular structure. As part of this process, Portico uses the Archiving and Interchange DTD developed by the National Library of Medicine. (A DTD, or Document Type Definition, is the formal specification of the structural elements and markup to be used in encoding specific types of documents in XML.)

“We get content coming in the door in all sorts of formats,” says Eileen. “Publishers have a variety of DTDs. In some cases a publisher has been publishing the e-version for close to a dozen years now. They may have three or four different flavors of DTD. So we will get those files in; we run a variety of checks over them; if they get us a checksum, we check that; we validate formats (with JHOVE). We understand the packaging the publisher has used and reverse engineer it so we understand the naming conventions and can identify the component parts of a given article.”
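Portico's own ingest software is not described here, so the following is only a minimal Python sketch of two of the checks Eileen mentions: verifying a publisher-supplied checksum and using a package's naming convention to group files into articles. It assumes MD5 checksums and a hypothetical naming convention of `<article-id>.<part>.<ext>`; the function names are illustrative. Format validation with JHOVE, a separate tool, is not reproduced.

```python
import hashlib
import re
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, algorithm: str = "md5", chunk_size: int = 65536) -> str:
    """Compute a hex digest of a file, reading it in chunks to bound memory use."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_checksum(path: Path, expected: str, algorithm: str = "md5") -> bool:
    """Return True if the file matches the publisher-supplied checksum (case-insensitive)."""
    return file_digest(path, algorithm) == expected.strip().lower()


# Hypothetical naming convention: <article-id>.<part>.<ext>, e.g. "a42.fig1.tif"
NAME_PATTERN = re.compile(r"^(?P<article>[^.]+)\.(?P<part>[^.]+)\.(?P<ext>[^.]+)$")


def group_by_article(filenames):
    """Group package file names into articles; names that don't fit are set aside."""
    articles = defaultdict(list)
    unmatched = []
    for name in filenames:
        m = NAME_PATTERN.match(name)
        if m:
            articles[m.group("article")].append(m.group("part"))
        else:
            unmatched.append(name)
    return dict(articles), unmatched


if __name__ == "__main__":
    names = ["a42.text.xml", "a42.fig1.tif", "a43.text.xml", "README"]
    articles, unmatched = group_by_article(names)
    print(articles)   # {'a42': ['text', 'fig1'], 'a43': ['text']}
    print(unmatched)  # ['README']
```

Setting unmatched files aside, rather than discarding them, reflects the point that a package's conventions must be understood before its component parts can be identified reliably.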

Portico doesn’t change the content; it changes only the tagging and structure of the content. Eileen half-jokingly points out that Portico's work is very much in the spirit of the Hippocratic Oath. “First do no harm,” she says. Portico works carefully to preserve the intellectual content, including text, images, tables, and limited functionality, though it chooses not to get sidetracked by retaining the original look and feel. Finally, Portico repackages the content and prepares an archived rendition for the Portico Web site. Over time, Portico migrates the source files into current formats as formats and technology change.
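To illustrate the idea of changing only tagging while leaving content intact, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The publisher tag names (`art`, `ttl`, `bdy`, `para`) are hypothetical, and the mapping is deliberately simplified: a real conversion to the NLM Archiving and Interchange DTD also restructures the tree (for example, nesting the article title inside front matter), which this sketch does not attempt.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from one publisher's tag names to NLM-style element names.
# Only element names are changed; the tree structure is left as-is.
TAG_MAP = {
    "art": "article",
    "ttl": "article-title",
    "bdy": "body",
    "para": "p",
}


def retag(root: ET.Element, tag_map: dict) -> ET.Element:
    """Rename elements in place according to tag_map; text and tails are untouched."""
    for elem in root.iter():
        elem.tag = tag_map.get(elem.tag, elem.tag)  # unknown tags are kept unchanged
    return root


def text_content(root: ET.Element) -> str:
    """Concatenate all text in document order (the 'intellectual content')."""
    return "".join(root.itertext())


if __name__ == "__main__":
    source = "<art><ttl>On Preservation</ttl><bdy><para>First do no harm.</para></bdy></art>"
    root = ET.fromstring(source)
    before = text_content(root)
    retag(root, TAG_MAP)
    print(ET.tostring(root, encoding="unicode"))
    # <article><article-title>On Preservation</article-title><body><p>First do no harm.</p></body></article>
    assert text_content(root) == before  # the text itself is unchanged
```

Comparing the concatenated text before and after retagging is one simple way to check, in the "first do no harm" spirit, that a transformation touched only markup.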

Portico faces many of the same crucial technical infrastructure issues as other NDIIPP partners, including data storage, replication, and transfer, as well as less-discussed considerations such as energy consumption. “We’re trying to follow the developments in making chips for storage mechanisms that consume a lot less energy,” says Eileen. “Maybe they operate a bit more slowly but they can hold more data and are more energy efficient. It’s fascinating just to talk to the San Diego folks (the San Diego Supercomputer Center) about utility costs, and obviously energy consumption is a global concern.”

When asked if Portico is thinking about using server farms, Eileen responds, “Not now. But we are keenly mindful of the rate at which the Portico archive is growing. We are literally in the thick of…very carefully building a strategy that deals with the growth of the content and the need to manage it in very cost-effective ways.”

Portico’s growth

In the fourteen months since Eileen lectured at the Library of Congress, support for Portico has grown to 38 publishers (who’ve committed over 6,100 journals to be archived) and 362 libraries.

Eileen is pleased at the growth but not complacent. “We regard that as a great start, but it is only a start,” she says. “But there are so many libraries around the world – 362 libraries is just a very small fraction. As librarians we are all trained to preserve potentially valuable information for the future. One of the great things about network technologies and digital data is that they make it possible for content to be hosted in a limited number of locations and still be accessible by all. With very small and affordable contributions, all libraries can do their part to address digital preservation. So there’s still quite a lot of room for even broader engagement of the library community.”

Portico does indeed have significant international support. “In the first year of seeking library support, about 25% of the participants came from eight countries outside of the US,” Eileen says. Participating institutions also span the full range of sizes. If you scan a list of Portico participants, she points out, “You see well-known and large research institutions, but you will also find many small liberal arts colleges also participating.”

Eileen points out that, in a sense, digital preservation is a de facto responsibility for libraries. “Some librarians speak of (digital preservation) as their ‘moral obligation,’” she says. “It’s not only part of their institutional mission; it’s also the right thing to do.” She adds that libraries share a fairly robust tradition of community-based efforts. And even though efficient preservation of digital resources is a group endeavor, it begins with each library’s contribution. “There’s something to be said about the power of a single library making a decision to support an effort like Portico or some other preservation effort,” Eileen says.

Her message is one of shared responsibility: whatever their individual roles, Portico participants together form a sturdy foundation. “The Portico model is built on the notion that you can, in fact, form a network – of publishers and libraries – that have a shared interest in the longevity of this content and working cooperatively with the archive they create a sort of three-legged stool that supports the ongoing life of this content.”
