Patricia (Trisha) Cruse, Director of the University of California Curation Center, is passionate about promoting public information. Perhaps that passion developed because of the egalitarian culture she was steeped in during her student days at the University of California, Berkeley, or perhaps it was the populist support for open information that she witnessed during her travels in Eastern Europe.
Whatever the reason, she got the chance to combine her library training with public service through a job at the Louisiana State University library, where she worked as a government information librarian, serving up important information to the university community and to the public. Cruse helped distribute details from the Environmental Protection Agency Toxics Release Inventory, which listed toxic releases from local chemical plants and businesses. She felt a special sense of mission in getting this information about the environment to underrepresented members of the local community. This was also her first professional experience working with public data and making it available to a range of communities.
While at the UC San Diego Libraries, she used this experience to develop a web interface for an early version of federal publications in digital form. The Government Printing Office provided the data, but it was difficult for non-experts to use. A tremendous response to the project led her to think more broadly about how to reach out to the public.
Cruse was excited when she went to work for the California Digital Library in 2000. She saw it as a step up to the next level of public service, with greater technological resources and a more expansive reach. "I take the philosophy that information technology has limitless potential for delivering services to the community," Cruse said. "I think about that every day at CDL."
One of her early CDL projects was Counting California, which incorporated data from different federal and state agencies, such as housing and census material, crime statistics and educational attainment. CDL combined and marked up the data, using tools from the Data Documentation Initiative, so that people could easily find and use the rich assortment of data provided by state and federal agencies.
While working with web-based data from the federal government, she noticed that the information could disappear or change without notice. Cruse took steps to preserve data and make it persistently available. In doing so, she was one of the first practitioners of what is now known as web archiving. "We needed to provide libraries with the capacity to keep building their collections in a web-based world," Cruse said. "And we wanted it to be a usable tool and not something somebody would need a computer science degree to use."
That imperative is at the heart of the CDL Web-at-Risk Project, which is one of the eight original National Digital Information Infrastructure and Preservation Program investments. The project is developing tools to capture, store and provide long-term access to significant web-based information, including news and other details associated with thematic events. Web-at-Risk established CDL as a preeminent player in the new world of web archiving. Cruse has led the project from its start in 2004.
The project took on a deeper and more personal meaning for Cruse when Hurricane Katrina struck in 2005. "Having spent so much time in Louisiana, it was emotional," said Cruse. "Friends that I loved could potentially be destroyed by this hurricane." She also knew that details on the web describing the storm were both historically valuable and quickly changing. "It was right at this time that we had some lightweight components of our web archiving service in place, so we decided, by the seat of our pants, 'Let's crawl this.'"
They used the Heritrix web archiving tool and archived thousands of web pages. The event became a lesson in web-archiving triage for CDL. Cruse said, "Crawling Katrina we learned so much. At first it was a really bad storm, and then a disaster, then it became a social event when race entered into it. We had to constantly refine our process to get the good stuff."
Building on experience from the project, CDL launched the Web Archiving Service in 2009 to enable users to collect, preserve and provide ongoing access to web-based information.
Cruse is now spearheading a new aspect of CDL: the University of California Curation Center, or UC3. As the UC3 director, she promotes collaboration and partnership. "We understand that no one has the capacity to go it alone and nobody should go it alone in this economic environment," she said.
UC3 is working jointly with the UC campuses and a variety of national and international projects, such as the HathiTrust shared digital repository, DataCite, and DataONE, which brings together worldwide data for environmental research.
"It's an exciting time," Cruse said. "Instead of everybody reinventing the wheel we're putting our heads together and coming up with common solutions we can all use."