Sustainability of Digital Formats: Planning for Library of Congress Collections


Microsoft Outlook PST 2003 (Unicode)


Identification and description

Full name Microsoft Outlook 2003 Personal Folders File (Unicode)
Description

The Personal Folders File or PST is an open proprietary data file format used to store local copies of messages, calendar events, and other items within Microsoft software including Microsoft Office Outlook. PST files are used to store archived items and to maintain off-line availability of the items.

See PST_ANSI for a description of general PST structure and characteristics.

The two versions of PST, PST_ANSI and PST_Unicode, are differentiated primarily by the software versions that implement them, their character sets, their maximum file sizes, and the bit width of the values used for internal identifiers and offsets.

PST_Unicode is the default format in Office Outlook starting with Outlook 2003, including Outlook 2007, Outlook 2010, and Outlook 2013. It employs the Unicode character set.

The file size limits for PST_Unicode are significantly larger than the PST_ANSI overall size limit of 2 gigabytes (GB). PST_Unicode supports file sizes up to 20 GB in Outlook 2003 and Outlook 2007 and up to 50 GB in Outlook 2010 and Outlook 2013. According to Microsoft, these limits can be extended, but doing so would negatively affect performance.
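As a rough illustration, the default limits quoted above can be expressed as a lookup table for checking a file's size. This is a minimal sketch: the names `DEFAULT_LIMIT_GB` and `exceeds_default_limit` are hypothetical, and reading "GB" as 2**30 bytes is an assumption, not something this description states.

```python
# Default size ceilings in GB, copied from the figures quoted above.
DEFAULT_LIMIT_GB = {
    "PST_ANSI": 2,
    "Outlook 2003": 20,
    "Outlook 2007": 20,
    "Outlook 2010": 50,
    "Outlook 2013": 50,
}

GIB = 1024 ** 3  # assumption: "GB" is interpreted as 2**30 bytes


def exceeds_default_limit(size_bytes: int, producer: str) -> bool:
    """Return True if size_bytes is above the default ceiling for producer."""
    return size_bytes > DEFAULT_LIMIT_GB[producer] * GIB
```

A 21 GB file would exceed the default ceiling for Outlook 2007 but not for Outlook 2010.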

PST_Unicode uses 64-bit values to represent block IDs (BIDs) and byte indexes (IBs).
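The practical effect of the wider values can be sketched by unpacking a block reference, that is, a BID followed by an IB. The layout used here (two little-endian integers, 64-bit each in PST_Unicode and 32-bit each in PST_ANSI) is an assumption drawn from the [MS-PST] specification rather than from this description, and the names `BREF_UNICODE`, `BREF_ANSI`, and `parse_bref` are hypothetical.

```python
import struct
from typing import Tuple

# Assumed layouts: BID then IB, little-endian.
BREF_ANSI = struct.Struct("<II")     # two 32-bit values, 8 bytes in total
BREF_UNICODE = struct.Struct("<QQ")  # two 64-bit values, 16 bytes in total


def parse_bref(data: bytes, unicode_pst: bool = True) -> Tuple[int, int]:
    """Return (bid, ib) unpacked from the start of data."""
    layout = BREF_UNICODE if unicode_pst else BREF_ANSI
    return layout.unpack_from(data, 0)
```

With 64-bit IBs, an offset such as 0x500000000 (20 GiB into the file) can be represented, which a 32-bit value cannot hold.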

Production phase PST files provide a mechanism for the centralized storage of email folders, email messages, their attachments, contacts, calendar items, etc.
Relationship to other formats
    Has earlier version PST_ANSI, Microsoft Outlook PST 97-2002 (ANSI)
    Affinity to TNEF, Transport Neutral Encapsulation Format

Local use

LC experience or existing holdings The Library of Congress includes PST_Unicode and PST_ANSI files in its collections, especially in the Manuscripts and Music Divisions as well as other personal papers repositories.
LC preference The Library of Congress Recommended Formats Statement (RFS) lists PST as an acceptable format for Email: For aggregated groups of messages. The RFS does not specify a version of PST.

Sustainability factors

Disclosure Fully documented. Proprietary file format developed by Microsoft.
    Documentation Microsoft [MS-PST]: Outlook Personal Folders (.pst) File Format specification available from Microsoft. See Format Specifications below.
Adoption

The Outlook .pst files are used for POP3, IMAP, and HTTP accounts and are supported by several Microsoft client applications, including Microsoft Exchange Client, Windows Messaging, and Microsoft Office Outlook.

Outlook 2003, Outlook 2007, Outlook 2010 and Outlook 2013 can read, write, and create both ANSI and Unicode PST files. By 2010 (when the specification was made public by Microsoft), PST_ANSI was considered a legacy format with a recommendation that it not be used to create new PST files. The default format was declared to be PST_Unicode.

PST_Unicode files are not compatible with Microsoft Outlook 97-2002, which reads PST_ANSI files only.

At least two open-source software libraries have been developed to examine and manipulate PST files: libpff, a library (in C, with Python bindings partially implemented as of late 2013) to access PST and related formats; and PST File Format SDK, a cross-platform C++ library for reading PST files, developed under Microsoft auspices through a 2009-2010 project.

According to Microsoft, Outlook .PST files are supported in OneDrive but "they are synced less frequently compared to other file types to reduce network traffic." If users "enable PC folder backup (Known Folder Move) manually without the group policy, they will see an error if they have a .PST file in one of their known folders (e.g. Documents). If Known Folder Move is enabled and configured via group policy, .PST files will be migrated."

    Licensing and patents See PST_ANSI
Transparency See PST_ANSI
Self-documentation

The PST format version is declared in the file header. According to the specification, the wVer field for a PST_Unicode file must have a value of 23. Folder objects, message objects, and attachment objects all have properties which include the header fields users typically see in an email application as well as many properties relating to the status, management, and history of the object in an Outlook application. A message object also has a recipients table that identifies each recipient and may have an attachments table that lists and identifies attachments.
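A minimal sketch of reading the declared version follows. It assumes that wVer is a 2-byte little-endian integer at offset 10 of the header, which is inferred from the file signature listed later in this description (53 4D at offset 8, followed by 17 00) and from the [MS-PST] header layout. The names `WVER_OFFSET`, `read_wver`, and `is_pst_unicode` are hypothetical.

```python
import struct

WVER_OFFSET = 10           # assumption: wVer follows the 2 bytes "53 4D" at offset 8
PST_UNICODE_WVER = 23      # value required by the specification for PST_Unicode


def read_wver(header: bytes) -> int:
    """Return the wVer field as an unsigned 16-bit little-endian integer."""
    (wver,) = struct.unpack_from("<H", header, WVER_OFFSET)
    return wver


def is_pst_unicode(header: bytes) -> bool:
    """True when the header declares the PST_Unicode version."""
    return read_wver(header) == PST_UNICODE_WVER
```

The check is deliberately strict: it accepts only the value 23 named in the specification and does not cover the rarer 0x15 variant noted under file signatures.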

External dependencies None
Technical protection considerations See PST_ANSI

Quality and functionality factors

Text
Normal rendering PST_Unicode can only represent UTF-16 strings (Unicode character encoding).
Integrity of document structure

At the physical level, the file starts with a header, followed by an optional density list, and then a series of mapping structures interspersed at set intervals between blocks of data. The mapping structures are of fixed size, and repeat as often as needed to encapsulate areas of data as the file grows.

At the logical level, a .pst file has three layers: the Node Database (NDB) layer, the Lists, Tables, and Properties (LTP) layer, and the Messaging layer.

An important structural improvement of PST_Unicode over PST_ANSI is that PST_Unicode files contain further FPMap pages beyond the initial FPMap in the HEADER, which extends their size limit beyond the 2 GB limit that applies to PST_ANSI files.

The semantic structure of messages (with their headers) in folders and attachments linked to messages is represented in the Messaging layer.

Since this format is designed for active use in an email system as a stand-alone message store, the full semantics required and/or observed in the system that generated the file are represented.


File type signifiers and format identifiers

Tag | Value | Note
Filename extension | See related format. | See PST_ANSI
Internet Media Type | See related format. | See PST_ANSI
Magic numbers | See related format. | See PST_ANSI
File signature | Hex: 53 4D 17 00; Hex: 53 4D 15 00 | Offset 8 bytes from start of file. In conjunction with the magic number at the beginning of the file, this identifies the file as a PST file in the PST_Unicode version. The 0x17 value is found much more frequently. According to Metz in Personal Folder File (PFF) file format specification: Analysis of the PFF format, the 0x15 value is believed to indicate the same format as the 0x17 value (i.e. PST_Unicode); it was found in a 64-bit PST file created by the software Visual Recovery for Exchange Server, but it is not common.
Pronom PUID | x-fmt/249 | PRONOM entry for Microsoft Outlook Personal Folders (Unicode). Identification based on internal signifier.
Wikidata Title ID | Q1480633 | See https://www.wikidata.org/wiki/Q1480633. Wikidata does not distinguish between versions of PST.
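The signature test described in the table can be sketched as a byte comparison on the first 12 bytes of a file. The two 4-byte signatures and their offset come from the table above. The leading magic number `!BDN` (hex 21 42 44 4E) is an assumption taken from the [MS-PST] specification, since this description defers to PST_ANSI for it. The function names are hypothetical.

```python
PST_MAGIC = b"!BDN"  # assumption: magic number per [MS-PST], not stated here
SIGNATURE_OFFSET = 8
UNICODE_SIGNATURES = (
    bytes.fromhex("534D1700"),  # common value
    bytes.fromhex("534D1500"),  # rare value reported by Metz
)


def matches_pst_unicode(head: bytes) -> bool:
    """Check the magic number and the PST_Unicode signature in a header prefix."""
    if len(head) < SIGNATURE_OFFSET + 4:
        return False
    signature = head[SIGNATURE_OFFSET:SIGNATURE_OFFSET + 4]
    return head[:4] == PST_MAGIC and signature in UNICODE_SIGNATURES


def file_is_pst_unicode(path: str) -> bool:
    """Read the first 12 bytes of the file at path and apply the signature test."""
    with open(path, "rb") as fh:
        return matches_pst_unicode(fh.read(SIGNATURE_OFFSET + 4))
```

A match indicates only that the header declares PST_Unicode; it does not validate the rest of the file.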

Notes

General See PST_ANSI
History  

Format specifications


Useful references

URLs


Last Updated: 02/28/2024