|Full name||ESRI Shapefile|
The ESRI Shapefile (known here as the ESRI Shapefile format) stores nontopological geometry and attribute information for the spatial features in a data set. A shapefile consists minimally of a main file, an index file, and a dBASE table.
In the main file, the geometry for a feature is stored as a shape comprising a set of vector coordinates. This main file is a direct access, variable-record-length file in which each record describes a shape with a list of its vertices. In the index file, each record contains the offset of the corresponding main file record from the beginning of the main file. Attributes are held in a dBASE format file. The dBASE table contains feature attributes with one record per feature. Attribute records in the dBASE file must be in the same order as records in the main file. Each attribute record has a one-to-one relationship with the associated shape record.
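To show how the index file points into the main file, the following Python sketch reads the records of an .shx file. The function name `read_shx_offsets` is hypothetical. The byte layout it assumes follows the ESRI Shapefile Technical Description:

- a 100-byte header;
- then 8-byte records, each holding two big-endian 32-bit integers (offset and content length);
- both values counted in 16-bit words rather than bytes.

The sketch does not validate the header.

```python
import struct

def read_shx_offsets(data: bytes) -> list[tuple[int, int]]:
    """Return (byte_offset, content_length_in_bytes) for each .shx record.

    Assumes a 100-byte header followed by 8-byte records, each made of two
    big-endian 32-bit integers measured in 16-bit words.
    """
    header_size, record_size = 100, 8
    if len(data) < header_size or (len(data) - header_size) % record_size:
        raise ValueError("not a well-formed .shx file")
    records = []
    for pos in range(header_size, len(data), record_size):
        offset_words, length_words = struct.unpack(">ii", data[pos:pos + record_size])
        # Convert 16-bit word counts into byte counts.
        records.append((offset_words * 2, length_words * 2))
    return records
```

In a main file holding Point records, the first record starts right after the 100-byte header (offset 50 words). Each Point record has an 8-byte record header and 20 bytes of content (shape type plus X and Y), so the second record starts at byte 128 (64 words).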
The shapefile format can support point, line, and area features. Area features are represented as closed loop, double-digitized polygons.
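The following Python sketch lists the shape type codes given in the ESRI Shapefile Technical Description. It also checks whether a ring is a closed loop in the sense the specification uses: at least four vertices, with the first vertex equal to the last. The specification orders outer-ring vertices clockwise (holes run counterclockwise), so the sketch tests orientation with the shoelace formula. The helper names are hypothetical, and the check does not test for self-intersection.

```python
# Shape type codes as listed in the ESRI Shapefile Technical Description.
SHAPE_TYPES = {
    0: "Null Shape", 1: "Point", 3: "PolyLine", 5: "Polygon", 8: "MultiPoint",
    11: "PointZ", 13: "PolyLineZ", 15: "PolygonZ", 18: "MultiPointZ",
    21: "PointM", 23: "PolyLineM", 25: "PolygonM", 28: "MultiPointM",
    31: "MultiPatch",
}

def signed_area(ring):
    """Shoelace formula over a closed ring (first vertex repeated at the end).

    The result is negative for clockwise rings in a y-up coordinate system.
    """
    return sum(x0 * y1 - x1 * y0 for (x0, y0), (x1, y1) in zip(ring, ring[1:])) / 2.0

def is_valid_outer_ring(ring):
    """True if the ring is closed, has at least four vertices, and runs clockwise."""
    return len(ring) >= 4 and ring[0] == ring[-1] and signed_area(ring) < 0
```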
Instances of the Shapefile format have often been used to exchange data from ESRI formats to non-ESRI applications. The format is best suited to writing simple features and attributes quickly, because it has inherent limitations related to both geometry and attributes. As outlined elsewhere in this description, these limitations may cause loss of data when shapefiles are used to contain or exchange complex geometry or attributes. The Shapefile format may be used as an intermediary between data creation applications and more functionally capable GIS formats and applications, albeit with the limitations noted in the Dataset/Normal Dataset section.
The cluster of files is typically stored in the same file directory or project workspace. All component files have the same filename (prefix) and are identified by individual file extensions (suffixes). Three components are mandatory: a main file that contains the feature geometry (.shp), an index file that stores the index of the feature geometry (.shx), and a dBASE table (.dbf) that stores the attribute information of features. A comprehensive list of component files follows:
See Notes for more information about filenames and contents.
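Because a cluster is identified only by a shared base filename, a tool that receives a directory listing must group the files itself. The Python sketch below groups file names by prefix and reports which of the three mandatory components (.shp, .shx, .dbf) are missing. The function name is hypothetical. Extensions are compared case-insensitively, and anything after the first dot (for example `shp.xml`) is treated as the extension.

```python
from collections import defaultdict

MANDATORY = {".shp", ".shx", ".dbf"}

def group_clusters(filenames):
    """Map each base filename to a sorted list of missing mandatory extensions."""
    clusters = defaultdict(set)
    for name in filenames:
        base, _, rest = name.partition(".")
        if rest:  # ignore names without an extension
            clusters[base].add("." + rest.lower())
    return {base: sorted(MANDATORY - exts) for base, exts in clusters.items()}
```

For example, `group_clusters(["roads.shp", "roads.shx", "roads.dbf", "rivers.shp", "rivers.dbf"])` reports that the `rivers` cluster lacks its `.shx` index file.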
|Production phase||The Shapefile format is open and popular for data transfer. It is used as an initial-state format for the output of map and shape digitization, as a middle-state format by many programs and publishers, and for data transfer between GIS applications. Shapefiles can be created in several ways: by exporting any data source to a shapefile, by digitizing shapes directly, by using programming software, or by writing a program that follows the shapefile specification. See Adoption under Sustainability Factors and Notes.|
|Relationship to other formats|
|Must have component||Main file (.shp) and index file (.shx), not described separately on this website.|
|Must have component||Shape_DBF, dBASE Table for ESRI Shapefile (DBF)|
|May have component||Optional component files include files with the following extensions: .sbn; .sbx; .atx; .fbn; .fbx; .ain; .aih; .ixs; .mxs; .prj; .xml; .cpg. None of the formats for these files are described separately on this website.|
|LC experience or existing holdings|
|Disclosure||Fully documented. Developed and regulated by the Environmental Systems Research Institute, Inc. (ESRI), as an open specification for data interoperability among ESRI and other software products.|
|Documentation||ESRI Shapefile Technical Description: An ESRI White Paper—July 1998|
During the 1990s, ESRI introduced the Shapefile format and it soon became a de facto standard. The format is still widely deployed today, although the limitations outlined in the Quality and functionality factors and Notes within this description have led many contemporary users to move to geodatabase formats.
Many applications, software libraries, and programming languages can view, use, or manipulate data in the Shapefile format. These include nearly all GIS applications, as well as web-based applications such as Google Earth. Data streams, such as those from global positioning system (GPS) receivers, can also be stored in the Shapefile format or in X,Y event tables.
A number of U.S. government agencies have distributed data in Shapefile format, including the U.S. Geological Survey (USGS), the U.S. Census Bureau, the National Oceanic and Atmospheric Administration, the Environmental Protection Agency, and the interagency National Atlas of the United States project led by USGS. See Notes for more detail on the Shapefile format data available from these agencies. In addition, the Open Source Geospatial Foundation's Geospatial Data Abstraction Library (GDAL/OGR) supports the Shapefile format. The Safe Software FME Spatial Data Transformation Platform supports reading and writing the Shapefile format on Windows, 32- and 64-bit Linux, and Solaris operating systems.
|Transparency||Computer programs can be created to read or write the Shapefile format using the technical specification in ESRI Shapefile Technical Description: An ESRI White Paper—July 1998.|
|Self-documentation||GIS metadata documenting important characteristics of the resource in the Shapefile format, such as bounding coordinates and datum, may be included as an .xml file within the file group.|
|Technical protection considerations||TBD|
The ESRI Shapefile format is a special-purpose dataset for storing nontopological geometry and attribute information for the spatial features in a data set. Its component Shape_DBF file uses a constrained form of the dBASE File Format (DBF) to store feature attributes using a limited set of data types.
Some relevant considerations are outlined in the ESRI Help explanation "Geoprocessing Considerations for Shapefile Output." It notes that, because the Shapefile format's structure is relatively simple, data may be lost if the format is used to transfer complex geometry and attributes. It also notes the following attribute limitations:

- Attributes cannot contain null values.
- Numeric values are stored as characters rather than in binary, which leads to rounding errors for numbers with decimal places (i.e., real numbers).
- Support for Unicode character strings is poor, which limits use of the format for languages other than English.
- Field names cannot be longer than ten characters.
- A date field cannot store both a date and a time.
- Spatial domains and subtypes are not supported.

In terms of geometry limitations, each component file of a Shapefile format instance is limited to 2 GB. A given instance may also take up three to five times as much space as the same data in a file geodatabase. The Shapefile format does not contain an XY tolerance (the minimum distance between coordinates before they are considered equal), which affects the precision with which features can be compared. Because circular arc curves are not supported, existing circular arcs are converted to line features with closely spaced vertices rather than stored as true arcs.
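Two of the attribute limitations above follow from dBASE storing values as text. The Python sketch below mimics a numeric field with a fixed width and decimal count, and a date field. It assumes that numbers are written as right-justified, fixed-point text and that dates are written as `YYYYMMDD`, which is how dBASE numeric and date fields are commonly encoded. The helper names are hypothetical, and this is not a complete DBF writer.

```python
from datetime import datetime

def format_numeric_field(value: float, width: int, decimals: int) -> str:
    """Render a number as right-justified text with a fixed number of decimals."""
    text = f"{value:>{width}.{decimals}f}"
    if len(text) > width:
        raise ValueError(f"{value!r} does not fit in a field of width {width}")
    return text

def format_date_field(moment: datetime) -> str:
    """Keep only the calendar date (YYYYMMDD); the time of day is discarded."""
    return moment.strftime("%Y%m%d")
```

With three decimals, `3.14159` is stored as the text `"     3.142"`, so reading it back yields `3.142`: the rounding is permanent. A timestamp of 1 May 2024 at 13:45 is stored as `"20240501"`.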
|Support for software interfaces (APIs, etc.)||Many non-ESRI applications can view, use, and output instances of the Shapefile format, although the instances these applications output are not always properly formatted and can easily be corrupted. Information about how to create data in the Shapefile format can be found in the ESRI Shapefile Technical Description: An ESRI White Paper, July 1998. Links to a free C library for reading and writing the Shapefile format, and to an open source (MIT License) Python library for reading and writing the format, can be found in the Useful References section.|
|Data documentation (quality, provenance, etc.)||The minimal requirements for an instance of the Shapefile format do not specify a place for the documentation of data quality or provenance.|
|GIS images and datasets|
The minimal structure for the Shapefile format (i.e., the required .shp, .shx, .dbf files in the cluster) facilitates georeferencing to the extent that "auxiliary" files are also clustered in the same directory structure, including a .prj file for projection information, and a .txt or .xml file for metadata. If the metadata record for a given Shapefile format instance includes coordinates, datum, and scale, the location for the features represented by the instance of the format can be accurately and precisely determined.
The Shapefile format handles single features that overlap or that are noncontiguous. The format can support point, line, and area features. Area features are represented as closed loop, double-digitized polygons.
Because the format does not have the processing overhead of a topological data structure, it typically requires less disk space and is easier to read and write. Compared with some more complex geospatial data formats, it offers advantages such as faster drawing speed and easier editing. However, shapefiles do not have a spatial domain, which defines the geographic extent within which all coordinates must fall. Such an extent is useful when editing geometry, since it prevents coordinates outside the extent from being entered. See Normal Dataset and Notes for further limitations.
|Support for GIS metadata||When .txt or .xml files are included within the Shapefile cluster's directory, they are usually intended as metadata for the data contained within the other files that comprise the shapefile. No assumptions are made about the completeness or accuracy of the metadata, nor is any particular content standard presumed.|
|Support for grids||Instances of the Shapefile format are ready for grid analysis by virtue of the component Shape_DBF file, which is essentially a relational database table. This table contains the characteristics describing geographic features that are available for viewing and/or for simple grid analysis. The extent to which mathematical and statistical calculations can be performed against the data in the table depends on the data structure built into the dataset layers comprising the shapefile. Often, the tabular data in an instance of the Shapefile format are joined or related to other tabular data to support more complex analysis.|
|.shp|Known as the main file. One of the three mandatory files in a Shapefile format cluster, stored in the same project workspace, typically a file folder. The .shp file stores the feature geometry and shares a base filename (prefix) with the index file and the Shape_DBF file. See Notes for more information on file naming.|
|Magic numbers|| Hex: 00 00 27 0A 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
|From the FileExtension Source|
|.shx|The index file stores the index of the feature geometry. One of the three mandatory files included in a Shapefile format cluster. In the index file, each record contains the offset of the corresponding main file record from the beginning of the main file (with extension .shp). The index file (.shx) contains a 100-byte header followed by 8-byte, fixed-length records. The .shx file must have the same base filename as all other files included in the Shapefile format cluster.|
|.dbf|The dBASE table file. One of the three mandatory files included in a Shapefile format cluster. The dBASE table contains feature attributes with one record per feature. The one-to-one relationship between geometry and attributes is based on record number. Attribute records in the dBASE file must be in the same order as records in the main file. The dBASE file must have the same base filename (prefix) as all other files included in the Shapefile format cluster.|
|.sbn|One of two files that store the spatial index of the features, i.e., Part 1 of the spatial index for read-write instances of the Shapefile format.|
|.sbx|One of two files that store the spatial index of the features, i.e., Part 2 of the spatial index for read-write instances of the Shapefile format.|
|.atx|Created for each shapefile or dBASE attribute index built in ArcCatalog (ArcGIS 8 and later). Attribute indexes created by ArcView 3.x are not used by ArcGIS.|
|.fbn|One of two files, along with the .fbx file, that store the spatial index of the features for read-only shapefiles.|
|.fbx|One of two files, along with the .fbn file, that store the spatial index of the features for read-only instances of the Shapefile format.|
|.ain|One of two files, along with the .aih file, that store the attribute index of the active fields in a table or a theme's attribute table.|
|.aih|One of two files, along with the .ain file, that store the attribute index of the active fields in a table or a theme's attribute table.|
|.ixs|Geocoding index for read-write shapefiles.|
|.mxs|Geocoding index for read-write shapefiles (ODB format).|
|.prj|Projection definition file, which stores the coordinate system and projection information.|
|.xml|Stores information (metadata) about the shapefile. In ArcGIS, the metadata file is often called metadata.xml. It must be stored in the same file directory or project workspace as the rest of the component files in the Shapefile format cluster in order to be used by ArcGIS applications.|
|.cpg|Specifies the code page that identifies the character set to be used.|
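The one-to-one link between the main file and the dBASE table rests only on record order. The Python sketch below pairs already-decoded shape records with attribute rows by record number, numbering from 1 as the main file does. It assumes the two sequences have been read into plain Python lists; reading them from disk is out of scope, and the function name is hypothetical.

```python
def pair_records(shapes, attributes):
    """Join shape records to attribute rows by record number (1-based)."""
    if len(shapes) != len(attributes):
        raise ValueError("the .shp and .dbf files must have the same number of records")
    return {number: (shape, row)
            for number, (shape, row) in enumerate(zip(shapes, attributes), start=1)}
```

Rejecting mismatched counts reflects the requirement that attribute records appear in the same order, and in the same number, as the main file records. Pairing anyway would silently attach attributes to the wrong features.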
All file names in a Shapefile format cluster adhere to the 8.3 naming convention. The main file, the index file, and the dBASE file have the same base filename (prefix). The prefix must start with an alphanumeric character (a–z, A–Z, 0–9), followed by up to seven more characters (a–z, A–Z, 0–9, _, -). On operating systems with case-sensitive file names, all letters in a file name are lowercase.
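The base-name rule stated above can be expressed as a regular expression. The Python sketch below encodes that rule exactly as written here. The pattern and function names are hypothetical, and many current tools accept longer names, so treat this as a check for strict 8.3 compatibility rather than a requirement of every reader.

```python
import re

# One alphanumeric character, then up to seven alphanumeric, underscore,
# or hyphen characters (the 8-character base of an 8.3 file name).
BASE_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]{0,7}")

def is_valid_base_name(base: str) -> bool:
    """True if the prefix satisfies the 8.3 base-name rule described above."""
    return BASE_NAME.fullmatch(base) is not None
```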
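The Shapefile format stores integer and double-precision numbers. The ESRI Shapefile Technical Description refers to the following types:
Integer: signed 32-bit integer (4 bytes)
Double: signed 64-bit IEEE double-precision floating point number (8 bytes)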
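Both types appear in the 100-byte header of the main file, with mixed byte order. The Python sketch below reads the fields it assumes are present, following the layout in the ESRI Shapefile Technical Description:

- file code 9994 as a big-endian Integer (the `00 00 27 0A` magic number);
- file length at byte 24 as a big-endian Integer counted in 16-bit words;
- version 1000 and shape type as little-endian Integers at bytes 28 and 32;
- the bounding box as four little-endian Doubles starting at byte 36.

The function name is hypothetical.

```python
import struct

def parse_shp_header(header: bytes) -> dict:
    """Parse selected fields of the 100-byte main-file header."""
    if len(header) < 100:
        raise ValueError("header must be at least 100 bytes")
    (file_code,) = struct.unpack(">i", header[0:4])                # big-endian Integer
    (file_length_words,) = struct.unpack(">i", header[24:28])      # big-endian, 16-bit words
    version, shape_type = struct.unpack("<ii", header[28:36])      # little-endian Integers
    xmin, ymin, xmax, ymax = struct.unpack("<4d", header[36:68])   # little-endian Doubles
    if file_code != 9994 or version != 1000:
        raise ValueError("not a shapefile main-file header")
    return {
        "file_length_bytes": file_length_words * 2,
        "shape_type": shape_type,
        "bbox": (xmin, ymin, xmax, ymax),
    }
```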
Positive infinity, negative infinity, and Not-a-Number (NaN) values are not allowed in the format. Nevertheless, the format supports the concept of "no data" values, but they are currently used only for measures. A Shapefile reader treats any floating point number smaller than –10^38 as a "no data" value.
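A reader applying the rule above to a measure value might look like the following Python sketch. The function and constant names are hypothetical. Returning `None` for "no data" is a design choice of the sketch, not part of the format.

```python
import math

NO_DATA_THRESHOLD = -1e38  # measures smaller than -10^38 mean "no data"

def read_measure(value: float):
    """Return None for a "no data" measure; reject values the format forbids."""
    if not math.isfinite(value):
        raise ValueError("infinity and NaN are not allowed in shapefiles")
    return None if value < NO_DATA_THRESHOLD else value
```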
The functionality of the Shapefile format is constrained by the rules for building and displaying points, polylines, and polygons. Further limitations come from the dBASE component file: its field types, its character-width restrictions, and its support for only ANSI characters in field names and values. The number of fields in an attribute table is limited to 255, and there is little support for SQL functions beyond WHERE clauses. Shapefiles do not support feature class subtypes, attribute domains, geometric networks, topologies, or annotations, which largely limits the format to basic GIS functionality.
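Before exporting to a shapefile, it can help to check a proposed attribute schema against the limits described here and in the attribute limitations above. The Python sketch below reports fields beyond 255, names longer than ten characters, and names outside ASCII. The function name is hypothetical, and using ASCII as a conservative stand-in for the ANSI restriction is an assumption.

```python
MAX_FIELDS = 255
MAX_NAME_LENGTH = 10

def schema_problems(field_names):
    """List reasons a proposed attribute schema would not fit a Shape_DBF table."""
    problems = []
    if len(field_names) > MAX_FIELDS:
        problems.append(f"{len(field_names)} fields exceeds the limit of {MAX_FIELDS}")
    for name in field_names:
        if len(name) > MAX_NAME_LENGTH:
            problems.append(f"{name!r} is longer than {MAX_NAME_LENGTH} characters")
        if not name.isascii():  # conservative stand-in for the ANSI restriction
            problems.append(f"{name!r} contains non-ASCII characters")
    return problems
```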
The Shapefile format can be useful as a middle state when exporting data for use in a non-ESRI software application, or for exporting data to use in ArcView 3 or ArcInfo Workstation. It can be used to write simple features and attributes quickly, for example in ArcGIS Server geoprocessing services. However, as outlined in the ESRI Help explanation "Geoprocessing Considerations for Shapefile Output," the format does not handle the full life cycle of data creation, editing, versioning, and archiving, which limits its use in active, life-cycle database management.
Here is some detail on the adoption of the Shapefile format by U.S. government agencies:
ESRI introduced the Shapefile format as a part of ArcView GIS version 2 during the 1990s. The format was welcome because interest in simple geometric structures had grown during the 1990s as disk storage and hardware costs decreased and computational speed increased. At the same time, existing geographic information system (GIS) datasets were more readily available, and the work of GIS users was evolving from primarily data compilation activities to include data use, analysis, and data sharing. Shapefiles could be easily created from many GIS systems and, over time, shapefiles were widely adopted as a de facto standard.