June 23, 2010 -- The Data Conservancy project seeks to develop the means to curate huge amounts of scientific data. A key part of the vision is to make data available to researchers in their quest to meet "grand research challenges that face society." Sayeed Choudhury, project principal investigator and associate dean for Library Digital Programs at Johns Hopkins University, talked about the effort at the Library of Congress on June 7, 2010.
Choudhury outlined the multi-faceted approach of the Data Conservancy, which is in its first year of funding under the National Science Foundation's Office of Cyberinfrastructure DataNet program. He noted that efforts are split into four distinct teams, dedicated to infrastructure research and development, information science research, broader impacts (such as educational requirements) and sustainability (which includes business requirements).
"There may be cases where something is scientifically compelling, but doesn't make sense from these other perspectives," explained Choudhury, as he described the group's strategic approach.
Choudhury gave a few examples from project work with astronomical data, but emphasized that the group planned to work across several disciplines. Speaking on the diversity of partners in the group, Choudhury said that "we felt it was really critical to engage with the scientists directly," in organizations such as the National Center for Atmospheric Research and the National Snow and Ice Data Center. "Different scientific domains may have different ways of doing research," he continued, which he described as a strength of the program that he hoped would lead to greater efficacy in its work.
Choudhury also credited the idea of emergence, as outlined in Steven Johnson's book of the same name, as an inspiration for the type of interconnected data networks the Data Conservancy hopes to promote across disciplines. The idea of emergence is that a framework that consists of simple rules, contains several feedback loops, and is adaptable to a number of situations can be scaled to accommodate a highly sophisticated operation.
One of the main project goals is outreach to research communities. Choudhury explained that the idea is to encourage the "serendipitous discovery" of information among researchers by making their data sets available across seemingly unrelated fields; he gave an example of the wide range of disciplines, from ornithologists to climatologists, who have an interest in geospatial data.
Choudhury previously worked in partnership with the National Digital Information Infrastructure and Preservation Program as a project manager for the Archive Ingest and Handling Test, which tested the feasibility of transferring an archive from one institution to another.