Sustainability of Digital Formats: Planning for Library of Congress Collections


Internet Message Format


Identification and description

Full name Internet Message Format
Description

Internet Message Format (IMF) is the standardized ASCII-based syntax required by SMTP for all email message bitstreams handled by a message transfer agent (MTA, sometimes called a mail transfer agent) when moving messages between computers. IMF is standardized by RFC 5322. IMF syntax itself does not cover non-text data in email messages, such as images, audio or other sorts of structured data, which are described in the MIME document series (RFC 2045, RFC 2046, RFC 2049).

IMF requires that messages use only US-ASCII characters and that the characters are divided into lines. A line is a series of characters that is delimited by carriage-return (CR) immediately followed by line-feed (LF). Taken together, these are commonly abbreviated as CRLF. Each line is limited to no more than 998 characters and, for the sake of interoperability, should be no more than 78 characters, excluding the CRLF.
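The line rules above can be illustrated with a short Python sketch. The function name `classify_lines` and the verdict labels are hypothetical, chosen only for this example; the limits are the 998 and 78 character figures given in the text.

```python
CRLF = b"\r\n"
HARD_LIMIT = 998  # maximum characters per line, excluding the CRLF
SOFT_LIMIT = 78   # recommended maximum, excluding the CRLF


def classify_lines(raw: bytes) -> list[tuple[int, str]]:
    """Return (line_number, verdict) for each CRLF-delimited line in raw.

    Hypothetical helper for illustration; the verdict strings are not
    defined by any specification.
    """
    lines = raw.split(CRLF)
    # A message ending in CRLF produces one empty trailing element; drop it.
    if lines and lines[-1] == b"":
        lines.pop()
    report = []
    for number, line in enumerate(lines, start=1):
        if any(byte > 127 for byte in line):
            verdict = "non-ascii"          # outside the US-ASCII range
        elif len(line) > HARD_LIMIT:
            verdict = "too-long"           # exceeds the 998 character limit
        elif len(line) > SOFT_LIMIT:
            verdict = "over-recommended"   # legal, but longer than 78
        else:
            verdict = "ok"
        report.append((number, verdict))
    return report
```

A checker like this only looks at line length and character range; it does not validate header syntax.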

An IMF-compliant email message consists of a header section composed of defined fields followed, optionally, by a body. The header section is a sequence of lines of characters with special syntax as defined in RFC 5322. The body is simply a sequence of characters that follows the header section and is separated from the header section by an empty line (i.e., a line with nothing preceding the CRLF).
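As a minimal sketch of that layout, the following Python function splits a raw message at the first empty line. The name `split_message` is hypothetical, and the sketch assumes the input already uses CRLF line endings throughout.

```python
def split_message(raw: bytes) -> tuple[bytes, bytes]:
    """Split raw into (header_section, body) at the first empty line.

    Hypothetical helper: assumes CRLF line endings. The empty line shows
    up as two consecutive CRLF pairs, one ending the last header line and
    one forming the empty line itself.
    """
    header, separator, body = raw.partition(b"\r\n\r\n")
    if not separator:
        # No empty line found: the message has a header section only.
        return raw, b""
    # Restore the CRLF that terminated the last header line.
    return header + b"\r\n", body
```

Only the first empty line is a separator; any later empty lines belong to the body.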

Header fields are well-defined lines beginning with a field name, followed by a colon (":"), followed by a field body, and terminated by CRLF. Header field bodies may have a structured or unstructured syntax. Header fields may appear in any order, and they have been known to be reordered occasionally when transported over the Internet. Selected fields may repeat within the header.
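The field syntax can be sketched in Python as follows. The function name `parse_header_fields` is hypothetical. The sketch returns a list of pairs rather than a dictionary so that field order and repeated fields are preserved. It also joins continuation lines that begin with a space or tab (header folding, which RFC 5322 permits but the description above does not cover). It does not interpret structured field bodies; Python's standard `email` package offers a fuller parser.

```python
def parse_header_fields(header_section: bytes) -> list[tuple[str, str]]:
    """Parse a header section into ordered (field_name, field_body) pairs.

    Hypothetical helper for illustration. Assumes CRLF line endings and
    US-ASCII content; raises ValueError on a line without a colon.
    """
    fields: list[tuple[str, str]] = []
    for line in header_section.decode("ascii").split("\r\n"):
        if not line:
            continue  # skip the empty element after the final CRLF
        if line[0] in " \t" and fields:
            # Folded continuation line: append it to the previous field body.
            name, body = fields[-1]
            fields[-1] = (name, body + line)
            continue
        name, colon, body = line.partition(":")
        if not colon:
            raise ValueError(f"not a header field: {line!r}")
        fields.append((name, body.strip()))
    return fields
```

Because the result is a list, a caller can collect every occurrence of a repeated field, for example `[body for name, body in fields if name == "Received"]`.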

Required header fields include:

  • origination date or orig-date, formatted as field name "Date" followed by a date-time specification (Date: date-time CRLF) which specifies the date and time at which the creator of the message indicated that the message was complete and ready to enter the mail delivery system. This typically represents when the message author presses the Send button.
  • the originator fields, a group that comprises the originator address set: from, formatted as From: mailbox-list CRLF; sender, when applicable, formatted as Sender: mailbox CRLF; and, optionally, reply-to, formatted as Reply-To: address-list CRLF.

All other header fields are optional and include: reply-to, to, cc, bcc, message-id, in-reply-to, references, subject, comments and keywords.
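A check for the required fields might look like the sketch below, which uses Python's standard `email` package. The function names are hypothetical. The "exactly once" rule for Date and From is taken from the field-count table in RFC 5322 rather than from the description above.

```python
from datetime import datetime
from email import message_from_bytes
from email.utils import parsedate_to_datetime

REQUIRED_ONCE = ("Date", "From")  # fields RFC 5322 requires exactly once


def fields_not_exactly_once(raw: bytes) -> list[str]:
    """Return the required field names that are missing or repeated.

    Hypothetical helper; field-name matching is case-insensitive.
    """
    msg = message_from_bytes(raw)
    return [name for name in REQUIRED_ONCE if len(msg.get_all(name, [])) != 1]


def origination_date(raw: bytes) -> datetime:
    """Parse the Date field of a message into a datetime (hypothetical helper)."""
    msg = message_from_bytes(raw)
    return parsedate_to_datetime(msg["Date"])
```

For example, a message whose header contains only a Subject field and two From fields would be reported as `["Date", "From"]`.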

Message bodies are simply lines of US-ASCII characters but with two essential requirements:

  • CR and LF MUST only occur together as CRLF; they MUST NOT appear independently in the body.
  • Lines of characters in the body MUST be limited to 998 characters, and SHOULD be limited to 78 characters, excluding the CRLF.
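The first requirement can be checked with a regular expression, sketched here in Python. The names `BARE_CR_OR_LF` and `bare_line_breaks` are hypothetical.

```python
import re

# Matches a CR not followed by LF, or an LF not preceded by CR.
BARE_CR_OR_LF = re.compile(rb"\r(?!\n)|(?<!\r)\n")


def bare_line_breaks(body: bytes) -> list[int]:
    """Return byte offsets of CR or LF characters outside a CRLF pair.

    Hypothetical helper: an empty result means CR and LF only ever
    occur together as CRLF.
    """
    return [match.start() for match in BARE_CR_OR_LF.finditer(body)]
```

A body converted from a Unix text file without line-ending translation would typically report one offset per line, since it would contain bare LF characters.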
Relationship to other formats
    Used by MBOX, MBOX Email Format
    Used by EML, Email (Electronic Mail Format)
    Used by PST_ANSI, Microsoft Outlook 97-2002 Personal Folders File (ANSI)
    Used by PST_Unicode, Microsoft Outlook 2003 Personal Folders File (Unicode)
    Affinity to CCA, cc:Mail Archive Email Format
    Affinity to CPIM, CPIM Instant Message Format. Similar header syntax

Local use

LC experience or existing holdings Not directly applicable because IMF is a syntax rather than a separate format. However, the Library's collections do contain email formats defined by IMF. See EML, MBOX Family and MSG for examples.
LC preference See the Recommended Formats Statement for the Library of Congress format preferences for Email content.

Sustainability factors

Disclosure Fully documented
    Documentation IMF is fully documented in RFC 5322 and its antecedents, RFC 2822 and RFC 822.
Adoption IMF is the standard syntax defined by IETF for the message bitstream when moving email messages from one computer to another. As such, it is widely adopted and interoperable with many tool sets and applications.
    Licensing and patents None
Transparency IMF files are US-ASCII text and are therefore accessible through plain text processing tools.

Self-documentation Metadata is available through the well-structured header fields.
External dependencies None
Technical protection considerations None

Quality and functionality factors

File type signifiers and format identifiers

Tag | Value | Note
Filename extension | Not applicable | See related email formats.
Internet Media Type | message/rfc822 | This is the common MIME type for all formats based on RFC 822.
Magic numbers | Not applicable | See related email formats.
Pronom PUID | fmt/278 | See http://www.nationalarchives.gov.uk/PRONOM/fmt/278.
Wikidata Title ID | Q82721505 | See https://www.wikidata.org/wiki/Q82721505.

Notes

General IMF has been developed in step with Simple Mail Transfer Protocol (SMTP). SMTP is the widely used protocol for sending email messages from the author's mail program or email client to the mail server, and from one server to another. Where SMTP is equivalent to the message envelope, IMF is equivalent to the letter within the envelope. Receiving mail from a server is accomplished using POP or IMAP.
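The envelope and letter distinction can be illustrated with Python's standard `email` package. In this sketch the addresses, the `ENVELOPE` dictionary and the `build_letter` function are all hypothetical. The letter is the IMF bitstream; the envelope is separate data that an SMTP client would pass in its MAIL FROM and RCPT TO commands, so it may name recipients that never appear in the letter. No network connection is made.

```python
from email import policy
from email.message import EmailMessage

# Hypothetical envelope: what an SMTP client would send as MAIL FROM / RCPT TO.
ENVELOPE = {
    "mail_from": "ann@example.com",
    "rcpt_to": ["bob@example.com", "carol@example.com"],
}


def build_letter() -> bytes:
    """Build the IMF letter as bytes with CRLF line endings (illustrative)."""
    msg = EmailMessage()
    msg["Date"] = "Fri, 21 Nov 1997 09:55:06 -0600"
    msg["From"] = "ann@example.com"
    msg["To"] = "bob@example.com"
    msg["Subject"] = "Hello"
    msg.set_content("Hi Bob.\n")
    # policy.SMTP serializes with CRLF line endings, as IMF requires.
    return msg.as_bytes(policy=policy.SMTP)
```

Here carol@example.com is an envelope recipient only: the server would deliver the letter to her, but her address does not appear in any header field of the letter itself.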
History

RFC 822, published in 1982, established the framework for the header structure and was widely used. Revisions and refinements to this structure include RFC 1123 (1989), RFC 2822 (2001) and most recently RFC 5322 (2008). RFC 5322 includes this summary of the changes between RFCs: “One important difference between the obsolete (interpreting) and the current (generating) syntax is that in structured header field bodies (i.e., between the colon and the CRLF of any structured header field), white space characters, including folding white space, and comments could be freely inserted between any syntactic tokens. This allowed many complex forms that have proven difficult for some implementations to parse. Another key difference between the obsolete and the current syntax is that the rule … regarding lines composed entirely of white space in comments and folding white space does not apply. The NUL character (ASCII value 0) was once allowed, but is no longer for compatibility reasons. Similarly, USASCII control characters other than CR, LF, SP, and HTAB (ASCII values 1 through 8, 11, 12, 14 through 31, and 127) were allowed to appear in header field bodies. CR and LF were allowed to appear in messages other than as CRLF.”


Format specifications


Useful references

URLs


Last Updated: 03/02/2023