Geospatial data—information about location—is woven into our lives more each day. Cell phones, cars, cameras and other appliances use the global positioning system to track and fix our positions in time and space.
"Not only are people much more comfortable using the technology, they now have this expectation that more things are going to be geospatially enabled," said Julie Sweetkind-Singer, assistant director of Geospatial, Cartographic and Scientific Data & Services at Stanford University. "The explosion of geospatial technology is only beginning. We're going to see massive integration of it in everything that we do."
Sweetkind-Singer is at the forefront of digital geospatial information preservation. She served as co-principal investigator for the National Geospatial Digital Archive, one of the original National Digital Information Infrastructure and Preservation Program partnerships.
Maps are her passion. When she began working as a librarian in the late 1990s, she dealt mainly with paper-based geographic content and maps, but now most of that information is digital and she is equally passionate about the possible applications for geospatial data.
Though online maps have been around for a while, in 2005 the technology behind Google Maps and Google Earth contributed to a major cultural shift. "It transformed how people access space," said Sweetkind-Singer. In a short period of time, neogeography became a fun consumer tool, enabling users to swoop and zoom all over the earth from within their web browsers. Users began cobbling together "mashups," plotting real estate, restaurants and a wide range of place-based items on online maps.
Sweetkind-Singer is especially interested in geographic information systems, which are basically relational databases with layers of information. She described how the information might be used. "The bottom layer may be a street grid," she said. "Next there could be a building layer that would show you schools, hospitals, houses, banks. Then you can add in earthquake data and start querying the system. Does an earthquake fault run under the hospitals? Have the hospitals retrofitted to withstand an earthquake of a specific size? One may then run a query to view every building that has not been retrofitted. The query can be displayed on a map which highlights buildings that would be most likely to fail in a strong earthquake."
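The layered query Sweetkind-Singer describes can be sketched in a few lines of Python. This is only an illustration, not how any particular GIS product works: the building records, the `on_fault` and `retrofitted` fields and the `at_risk` function are all hypothetical, and a real system would work out the fault overlap from the geometry of each layer rather than store it as a flag.

```python
# Hypothetical building layer: one record per feature, with made-up attributes.
buildings = [
    {"name": "General Hospital", "kind": "hospital", "on_fault": True, "retrofitted": False},
    {"name": "Valley Clinic", "kind": "hospital", "on_fault": True, "retrofitted": True},
    {"name": "Main Street Bank", "kind": "bank", "on_fault": False, "retrofitted": False},
    {"name": "Lincoln School", "kind": "school", "on_fault": True, "retrofitted": False},
]


def at_risk(layer, kind=None):
    """Return names of buildings that sit on a fault and were never retrofitted.

    If kind is given (for example "hospital"), only that building type is checked.
    """
    return [
        b["name"]
        for b in layer
        if b["on_fault"]
        and not b["retrofitted"]
        and (kind is None or b["kind"] == kind)
    ]


print(at_risk(buildings))                   # every unretrofitted building on a fault
print(at_risk(buildings, kind="hospital"))  # hospitals only
```

In a GIS the list returned by such a query would be handed to the map display, which highlights the matching buildings.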
When Sweetkind-Singer heard about NDIIPP, she knew it was a good fit for Stanford's interest in geospatial data stewardship. She contacted the University of California at Santa Barbara's Map and Imagery Lab and suggested a partnership. UCSB was interested in imagery and raster-based data and Stanford was more interested in vector-based data and scanned maps, so UCSB and Stanford seemed like natural partners. Together they developed the NGDA with the goal of rescuing and archiving potentially at-risk digital geospatial data.
At first they pondered how to save their content for 100 years, but as their work progressed they narrowed their time frame realistically and settled for handing the archives off, intact, every five to ten years. Cost and practicality were also factors. Sweetkind-Singer said, "We asked ourselves how we will do these handoffs efficiently, safely and at a low-enough cost to make preservation and access worth it in the long run."
The challenges of archiving geospatial data are unique. A JPEG file, for example, is "flat." If you open it up, it contains only one unit of digital information to represent the JPEG image. That simplicity of structure makes it easier to analyze and validate.
GIS data is more complex. A shapefile, for instance, is a digital vector format for storing geometric locations such as points, lines and polygons. When you crack open a shapefile with something other than its rendering software you can see that it contains a series of folders, each containing different types of information.
Applying the earlier hospital/earthquake example to a shapefile, one folder might contain information on how old the hospital is, what materials it was made with, whether it was ever retrofitted, how many doctors work there and so on. All of that information is held in a database inside one of the files. Another file could tell you about the geographic extent of that file, such as what the latitude and longitude corners and cutoff points are.
A shapefile could have seven or eight different files traveling along with it and a number of those files must be there in order for it to render correctly. The files may or may not work independently but they must all travel together.
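An archive can check that the companion files really did travel together before it accepts a shapefile. The sketch below assumes the common on-disk layout in which the parts share one base name and differ only by extension, with `.shp`, `.shx` and `.dbf` as the three mandatory parts. The function name and the example path are hypothetical, and this is not a description of NGDA's own tooling.

```python
from pathlib import Path

# The three parts the shapefile format requires. Other companions,
# such as a .prj projection file, are optional and not checked here.
REQUIRED = (".shp", ".shx", ".dbf")


def missing_components(shp_path):
    """Return the required extensions that are absent beside shp_path."""
    base = Path(shp_path)
    return [ext for ext in REQUIRED if not base.with_suffix(ext).exists()]
```

Calling `missing_components("hospitals.shp")` returns an empty list when all three parts sit side by side, and otherwise lists the extensions that went missing in transit.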
Another type of geospatial data, Landsat imagery, comes in bands, each consisting of different types of information. Landsat bands work independently or you can combine them for different types of information. For example, you can view land-cover imagery bands or combine some bands into false-color images.
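Combining bands amounts to deciding which band drives which color on the screen. The sketch below builds one common false-color arrangement, in which the near-infrared band is shown as red, the red band as green and the green band as blue. The tiny 2x2 bands are made-up values for illustration, not real Landsat data, and the function name is hypothetical.

```python
def false_color(nir, red, green):
    """Stack three same-sized bands into RGB pixels: NIR->R, red->G, green->B."""
    if not (len(nir) == len(red) == len(green)):
        raise ValueError("bands must have the same number of rows")
    return [
        [(n, r, g) for n, r, g in zip(nir_row, red_row, green_row)]
        for nir_row, red_row, green_row in zip(nir, red, green)
    ]


# Made-up 8-bit sample values; a real scene is thousands of pixels on a side.
nir = [[200, 180], [40, 30]]
red = [[50, 60], [45, 35]]
green = [[70, 80], [50, 40]]

image = false_color(nir, red, green)
print(image[0][0])  # (200, 50, 70)
```

Because healthy vegetation reflects strongly in the near infrared, pixels like the first one come out bright red in this arrangement, which is why such composites are useful for land-cover work.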
NGDA created geospatial format information to help others preserve geospatial data. Sweetkind-Singer said, "When you look at a shapefile, you should expect to find certain types of information and a series of folders, and within each folder expect to find certain kinds of files." Data from a Landsat satellite can come down in a format such as GeoTIFF. This format information is being given to the Library of Congress to be added to their Sustainability of Digital Formats website, which currently has no geospatial information defined. It will also be donated, when appropriate, to the Unified Digital Format Registry.
NGDA also explored ways to provide access to their archives. Sweetkind-Singer admits that it is complicated, given the different file formats and differences in repositories. Still, she prefers diversity among the repositories to help reduce the risk of failure. "You want to have repositories around the U.S. that are built in different ways and that are not all the same," Sweetkind-Singer said. "The more you implement exactly the same technology the more potential points of failure you have."
This means that they must develop tools that can search across different archives, platforms and data types. She understands that it will take time to develop those tools. She said, "We're far away from that now but I think it's where people are moving."
Sweetkind-Singer makes a case for not sealing data off in dark archives. She said, "If we want our material to remain relevant and used, we need to make it available. Few people are willing to pay for putting something in a black box and preserving it. What they are interested in is access. What I want to help make happen is for the Stanford student 100 years from now to access the digital materials that I'm collecting as easily as they can walk to the library stacks and pull a book off the shelf today."
She sees geospatial librarianship as a flourishing field. Given the wide acceptance of geospatial-enabled technology, she thinks that consumers and researchers will increase their spatial awareness and expectations, and geospatial librarians should be prepared to help. Sweetkind-Singer said, "Map and geospatial librarians should be able to help people with software and data, help people create maps and analyze spatial data and consider to what uses people might want to put these resources."
There are important long-term considerations as well. "Preservation and long-term access to information is really about threat mitigation," she said. "What you're trying to do is lower the amount of threat to the loss of information in as many ways as you possibly can."
Sweetkind-Singer sees an immediate need for expanding geospatial data stewardship. She said, "It isn't that we're going into the digital world. That's simply where we are."