Sustainability of Digital Formats: Planning for Library of Congress Collections

Format Descriptions: Explanation of Terms

Table of Contents
Identification and description
Local use
Sustainability factors
Quality and functionality factors
File type signifiers
Notes
Format specifications
Useful references
About Format Description Properties
• ID: Identifier for the format and its description.
• Short name: Mnemonic abbreviation used to refer to the format.
• Content categories: Content embraced by the format. More discussion: categories.
• Format category: General type of format, e.g., file format, bitstream encoding.
• Other facets: Facets (beyond format category) as defined by the GDFR (Global Digital Format Registry).
• Last significant update: Date description last updated.
• Draft status: Full [fully realized draft], Partial [draft in progress], Partial (low priority for LC) [format not expected to impact LC, no full draft contemplated], Preliminary [Lists format documentation and some description, with little or no analysis as to sustainability or functionality.]

About Identification and description

Full name: Name of the format. The formal name is given if the format has been established by a Standard Setting Organization (SSO), a term that covers both Standards Developing Organizations (SDO), like ANSI and the various bodies it accredits in the US, and trade organizations, consortia, or other groups. For formats established by corporations, the common or colloquial full name is provided. Common names are also provided for standardized formats, when the formal name excludes colloquial elements.
Description: Brief characterization of the format.
  Production phase: Indication of how the format is generally used during the content life cycle, e.g., by creators or authors (initial state), by distributors, publishers, or archives (middle state), or as delivered to end users (final state).
Relationship to other formats: Indications of the relationships between this and other formats, using relationship types listed in the following rows. Intended to mesh with the analytic work of the GDFR (Global Digital Format Registry) project. A handful of format descriptions currently include bracketed relationships that are not intended to be parseable, e.g., [Metadata specification from], found in EXIF_2_2.
  Has subtype: A has subtype B (A is an extension of B) if all instances of B are also instances of A, but not all instances of A are instances of B.
  Subtype of: B is a subtype (restriction) of A if all instances of B are also instances of A, but not all instances of A are instances of B.
  Contains: This format must contain another "format" either as a part of the format being described or as an encoding used. The GDFR containment relationship indicates an encapsulation association.
  May contain: Encapsulation as above, but optionally. A sequence of May contain rows in an FDD often indicates alternatives.
  Used by: Inverse of containment (encapsulation). Used only when it adds value for the human reader, for example because the format or encoding being described has an important primary use within another format.
  Must have component: Used for containment that does not involve encapsulation, but where a format is based on a group of related files that are stored together, typically within the same directory. Indicates a mandatory component.
  May have component: Used for containment that does not involve encapsulation, but where a format is based on a group of related files that are stored together, typically within the same directory. Indicates an optional component.
  Component of: Inverse of containment in a format represented by a group of related files, typically within the same directory. Used only when it adds value for the human reader, for example because the file format plays a role in only a small number of formats based on a container directory.
  Defined via: Used, for example, to indicate that an XML-based format is defined via W3C XML Schema or Relax NG.
  Requires [provisional listing]: Used to indicate that an understanding of the otherwise unrelated target format is significant to the understanding of the format being described. [May not be used.]
  Modification of: For relationships between formats that are neither strictly extensions nor restrictions of each other. For example, Broadcast Wave Format (BWF) is a modification of WAVE, combining extension (a required BEXT chunk) with restriction (LPCM audio encoding only).
  Has modified version: Inverse of Modification of, used only when it adds value for the reader, e.g., because the related format is important and might be expected to be described as a subtype.
  Extension of: Use in preference to Has earlier version if the format being described satisfies the GDFR definition for Extension in relationship to the earlier version.
  Has extension: Use in preference to Has later version if the later format satisfies the GDFR definition for Extension.
  Has earlier version: Preferred to Version of.
  Has later version: Preferred to Version of.
  Version of: Used when a more precise relationship (e.g., subtype) does not apply and the version relationship is non-temporal (e.g., geographical) or when the sequence is unknown.
  Equivalent to: For semantic or syntactic equivalence, e.g., TIFF (little-endian) is syntactically equivalent to TIFF (big-endian). If a distinction between semantic and syntactic equivalence is needed, an explanatory comment should be provided.
  Affinity to: Significant technical resemblance not meeting the formal requirements of more specific relationship types.
  Additional: Candidate for a new type of FDD relationship. A proposed label for the new relationship type should be provided in an explanatory comment.
  Other: Significant non-technical relationship, e.g., published in the same standards document. Should be accompanied by an explanatory comment.
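
As an illustration only, the subtype definition and the inverse pairings described in the rows above can be sketched in Python. The names INVERSE, inverse_relationship, and is_subtype are hypothetical and are not part of the GDFR or of any FDD tooling; formats are modeled as toy sets of instances purely to show the "all B are A, but not all A are B" condition.

```python
# Hypothetical sketch: inverse pairings as stated in the relationship rows.
# Relationship types with no stated inverse (e.g., "Defined via") are omitted.
INVERSE = {
    "Has subtype": "Subtype of",
    "Subtype of": "Has subtype",
    "Contains": "Used by",
    "May contain": "Used by",
    "Must have component": "Component of",
    "May have component": "Component of",
    "Modification of": "Has modified version",
    "Has modified version": "Modification of",
    "Extension of": "Has extension",
    "Has extension": "Extension of",
    "Has earlier version": "Has later version",
    "Has later version": "Has earlier version",
}


def inverse_relationship(relationship):
    """Return the inverse relationship label, or None if none is stated."""
    return INVERSE.get(relationship)


def is_subtype(instances_b, instances_a):
    """B is a subtype of A if every instance of B is an instance of A,
    but not every instance of A is an instance of B (a proper subset)."""
    return set(instances_b) < set(instances_a)


# Toy example: the file names are invented for illustration.
wave_files = {"a.wav", "b.wav", "c.wav"}
restricted_files = {"a.wav"}
print(is_subtype(restricted_files, wave_files))   # proper subset
print(inverse_relationship("May contain"))
```

Several relationship types share one inverse label (Contains and May contain both map to Used by), so the inverse of Used by is left undefined in this sketch rather than guessed.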

About Local use

LC experience or existing holdings: Report of actual practice at the Library of Congress.
LC preference: Provisional Library of Congress format preference. In some cases, the statement will indicate that the format at hand is preferred; in others, the statement will identify a different format as preferred; in still others, no preference will be known. The statements are provisional because the process of establishing preferences is ongoing.

About Sustainability factors

Disclosure: Level of available technical information about the format, including documentation that requires purchase. Typical statements of level include open standard, fully documented, partially documented, and little documentation. The statement of level is followed by the identification of source or developer. More discussion: disclosure.
  Documentation: Citation of specifications or other documentation. For standardized formats, identification of relevant standard, generally by assigned number. A few individual documents are cited in this location; if the list is long (as it is for many multipart ISO/IEC standards), the complete list is provided in the Format specifications or Useful references sections at the bottom of the page. All documents cited in the table portion of the Format Description are cited again in Format specifications or Useful references below.
Adoption: Assessment of the degree to which this format is implemented and employed. More discussion: adoption.
  Licensing and patent claims: Statement regarding patents and/or licensing.
Transparency: Statement regarding the nature of the encoding and/or bitstream, suggestive of the ease with which rendering tools may be obtained or built. More discussion: transparency.
Self-documentation: Statement regarding the degree to which the format supports the inclusion of metadata (descriptive, administrative, and structural). More discussion: self-documentation.
External dependencies: Statement regarding the need for external software or hardware. More discussion: external dependencies.
Technical protection considerations: Support by this format for elements that protect intellectual (or other) property. More discussion: technical protection.

About Quality and functionality factors

Quality and functionality factors assess a format in relation to functional support for features and aspects that can affect the quality of content. Although a given format may be categorized within a single content category on this Web site, its description may employ quality and functionality factors from multiple content categories. For example, a format in the GIS Image and Dataset category may be described in terms of support for multispectral bands (Still Image), support for software interfaces (Dataset), and support for GIS metadata (GIS Image and Dataset).

Still Image; more discussion: still image Q&F
Normal rendering: Baseline for the behavior of content when presented to a user, e.g., images that permit zooming.
Clarity (support for high image resolution): The degree to which this format supports the representation of pictures that would be deemed high resolution when viewed by experts or repurposed for a very high quality application. More discussion: still image clarity.
Color maintenance: The degree to which the color gamut represented in a given image can be managed and maintained through various outputs or migrations; for example, via support for color encoding in different colorspaces and the metadata needed by color management systems, such as the inclusion of a color map for indexed-color files or an ICC profile for the capture device. More discussion: still image color maintenance.
Support for graphic effects and typography: For still image formats that support vector graphics. Refers to the support within the format for scalable shapes, labels, legends, and other vector-graphic features. Also refers to the degree to which the format supports the use of shadows, filters or other effects as applied to fill areas and text, offers levels of transparency, and manages the specification of fonts and patterns. Discussion in context: still image graphic effects and typography.
Support for multispectral bands: Support for the inclusion and documentation of multiple spectral bands in an image, generally employed to support scientific analysis, in contrast to the widely adopted color models oriented toward human perception, e.g., RGB or CMYK. More discussion: support for multi-spectral bands.
Functionality beyond normal rendering: Support for features that serve users with special interests. For example, some users will prefer that vector-based images like those used for architectural drawings remain malleable (editable) so that they can be modified after being copied from a library collection, while other users may require rich-data content, e.g., a still image with "extra bits per pixel" for use as a source for high quality repurposing, even though the full extent of the rich-data master cannot be seen on normal viewing devices.
Sound; more discussion: sound Q&F
Normal rendering: Baseline for the behavior of content when presented to a user, e.g., sounds that can be played, stopped, and restarted.
Fidelity (high audio resolution): For sound and moving image formats. The degree to which this format supports the representation of sound that would be deemed high resolution when heard by experts or repurposed for a very high quality application. More discussion: sound fidelity and moving image fidelity.
Multiple sound channels: For sound and moving image formats. The degree to which this format supports the representation of multi-channel audio, which is presented to the end user in at least two ways: (1) in terms of aural space or sound field, e.g., as stereo or surround sound, and (2) as two or more signal streams that provide alternate or supplemental content, e.g., narration in French and German, sound effects separate from music, or the like. More discussion: sound multiple channels and moving image multiple channels.
Support for downloadable or user-defined sounds, samples, and patches: For sound and moving image formats. The degree to which this format permits references to, or the inclusion of, digital sound data and the articulation parameters needed to create one or more voices or instruments in a musical presentation. More discussion: downloadable samples and patches.
Functionality beyond normal rendering: Support for features that serve users with special interests. For example, some users may require that music notation formats, e.g., MIDI, permit the use of a variety of sounds or tone sets to mimic actual instruments or create new tones and timbres. Rich-data content, e.g., a waveform with "extra bits per sample," may be created to serve as a master, i.e., for use as a source for high quality repurposing, even though the full extent of the rich-data master cannot be reproduced on some playback devices.
Textual; more discussion: text Q&F
Normal rendering: Baseline for the behavior of content when presented to a user. Normal rendering for textual items includes convenient linear reading on screen, the ability to print sections of the document to paper, to excerpt quotations as text strings, to search for words within an item, and to index for searching as part of a corpus of documents. Rendering of any text item must reflect the intent of the author in representing the individual characters, paragraph structure, lists, headings, and indicators of emphasis.
Integrity of document structure: Support for representations of the logical structure of textual works when usability and future usefulness for scholarship depend upon the navigation options and/or automated analysis that explicit tagging of structural elements enables. More discussion: integrity of structure.
Integrity of layout and display: Support for representations of the look and feel of a textual work when exact choice of features such as font and column layout of a text document is essential to its meaning. More discussion: integrity of layout.
Support for mathematics, formulae, etc.: Support for the accurate rendering of non-textual elements when these are essential to the informational content of the document. More discussion: integrity of rendering equations, etc.
Functionality beyond normal rendering: Support for features that serve users with special interests. The behavior of a digital textual work (the functionality experienced by a reader) is supported by the combination of structural text tagging and the capabilities of a particular online environment or dedicated player (or e-book reader). The potential functionality supported by the underlying markup (e.g., representing the underlying hierarchy of the table of contents) must be distinguished from the particular view of the hierarchy. Many aspects of functionality for text (such as bookmarking or searching for words) are properties of a particular viewer rather than of the underlying content. E-books using a particular proprietary "reader" may provide functionality beyond that of a book presented as a linked set of web pages, even if the web pages and the e-book "file" were derived from the same marked up source data.
Moving image; more discussion: moving image Q&F
Normal rendering: Baseline for the behavior of content when presented to a user. For moving image formats associated with end-user implementations, normal rendering consists of playback of a single image stream with accompanying sound in mono or stereo through one or two speakers (or equivalent headphones). Player software provides user control over some picture elements (brightness, hue, contrast), some sound elements (volume, tone, balance), and navigation (fast forward, go-to-segment, etc.). For formats implemented in specialized professional applications, the same type of normal rendering does not obtain. Some professional authoring or editing systems, e.g., those used in non-linear video editing, permit playback in a manner comparable to that described for end-user implementations. But in other contexts, e.g., when working with the DPX format's frame images, normal playback will only occur "downstream," i.e., from a newly made file derived from the DPX source.
Clarity (high image resolution): For bitmapped representations but not vector-based animations like Flash files. The degree to which "high image resolution" content may be reproduced within this format. The term is meant broadly, referring to the factors that will influence a careful (even expert) viewing experience. A real test of clarity occurs when the reproduction is repurposed, e.g., when selected footage is edited into a new video program. More discussion: still image clarity and moving image clarity.
Functionality beyond normal rendering: Support for features that serve users with special interests. For example, some video formats offer functional features like scalability and interactivity. More discussion: moving image beyond normal functionality.
Web archiving; more discussion: Web archiving Q&F
Normal rendering: Baseline for the behavior of content when presented to a user. Normal rendering for archived Web sites is identical to that expected for active Web sites on the Internet: users may read and scroll through text, follow hyperlinks from one page to another, and copy and print. Assuming that the harvesting tool has succeeded in collecting the images, sounds, or other elements that are embedded in a page, these are also presented to or accessible to users.
Documentation of harvesting context: Support for recordation of the context and circumstances of the capture process. Harvesting detail includes information about how the record was requested (typically an HTTP request) and the response (typically a full HTTP response including headers and content body). More discussion: Web archiving context.
Efficiency at scale: Support for efficient processing in the format used to store the harvested Web pages. Pages should not be required to be held in a logical order; the format should not have an inherent limit on file size; it should allow segmentation of large harvested resources; and support for straightforward merging of aggregations of harvested pages is desirable. Since simulation of the original Web experience, in terms of following links found in pages, is a part of normal rendering, the format must permit efficient indexing by original URL and by the date and time of harvesting. If elimination of duplicate content becomes more feasible, then archiving formats should be capable of storing relevant metadata that can point to no-longer-duplicated data in another location, e.g., the dataset from a preceding crawl. More discussion: Web archiving scale.
Support for stewardship: Support for activities to enhance access for researchers and future preservation activities. Examples include the ability to record metadata about harvested resources based on analysis of the harvested content, the enhancement of access for researchers by assigning topical subject terms based on textual analysis, and the ability to extract a subset of a Web archive for a researcher for specialized analysis. More discussion: Web archiving stewardship.
Functionality beyond normal rendering: None identified at this time.
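
The indexing requirement under Efficiency at scale (lookup by original URL and by date and time of harvesting) can be illustrated with a minimal in-memory sketch. It does not describe how any particular Web archiving format or tool is implemented. The class name CaptureIndex, the 14-digit sortable timestamp strings, and the byte offsets are all assumptions made for this example.

```python
import bisect
from collections import defaultdict


class CaptureIndex:
    """Hypothetical index: original URL -> captures sorted by harvest time."""

    def __init__(self):
        # url -> sorted list of (timestamp, offset) tuples.
        # Timestamps are assumed to be sortable "YYYYMMDDhhmmss" strings;
        # offsets are assumed byte positions of records in an archive file.
        self._by_url = defaultdict(list)

    def add(self, url, timestamp, offset):
        """Record one harvested capture, keeping each URL's list sorted."""
        bisect.insort(self._by_url[url], (timestamp, offset))

    def lookup(self, url, timestamp):
        """Return the latest capture at or before `timestamp`, or None."""
        captures = self._by_url.get(url)
        if not captures:
            return None
        # Find the insertion point just after all captures <= timestamp.
        i = bisect.bisect_right(captures, (timestamp, float("inf")))
        return captures[i - 1] if i else None


index = CaptureIndex()
index.add("http://example.org/", "20210101000000", 900)
index.add("http://example.org/", "20200101000000", 100)
print(index.lookup("http://example.org/", "20200615000000"))
```

Because captures need not be stored in any logical order, the index is built separately from the stored pages; following a link at replay time then becomes a lookup by URL and nearest harvest time.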
Datasets; more discussion: Dataset Q&F
Normal functionality: Baseline for the behavior of content when presented to a user. Includes considerations of data typing and data structure. More discussion: Dataset normal functionality.
Support for software interfaces (APIs, etc.): For all datasets, including GIS. Support for standard or widely available APIs (Application Programming Interfaces), software toolkits, or software libraries, particularly for subsetting datasets, data manipulation/transformation, data aggregation, or discipline-specific functionality. More discussion: Dataset support for specialized software interfaces.
Data documentation (quality, provenance, etc.): For all datasets, including GIS. Support for documentation of data quality or provenance in textual or structured machine-processable form, e.g., in a well-known XML schema. More discussion: Dataset support for data documentation.
Functionality beyond normal: Support for features that serve users with special requirements.
GIS images and datasets; more discussion: Geospatial Q&F
Normal functionality: Basic functionality for a GIS format regardless of underlying content type (raster, vector, or attribute) is georeferencing, placing information in relation to the surface of the earth using place names or assigning coordinates. To align different geospatial resources, they must all use or be transformed to the same geographic reference system, including what is known as a datum. Normal functionality within a geographical information system (GIS) involves support for basic spatial analysis functions. More discussion: Geospatial normal functionality.
Support for GIS metadata: Support within the format for GIS-specific metadata, particularly in a form that satisfies standards or community practices in a way that contributes to interoperability. In addition to the metadata needed to support normal functionality, assessment of a resource's fitness for a particular purpose requires information about the quality of the data and its provenance and lineage. Important characteristics to be recorded in metadata include: projection, scale, datum for coordinates, precision. For certain categories of GIS files, other characteristics may be significant, e.g., cloud cover percentage for satellite images. More discussion: Support for geospatial metadata.
Support for grid-based analysis: Support within the format for the performance of grid-based analysis. In this type of analysis, the area of interest is divided into rectangular cells based on geo-location (using a known datum and projection). The cells contain data values from a variety of sources and are stored in a format designed to hold gridded data. The values are then available for various forms of spatial and statistical analysis. Grids contain information that can range from geographic coordinates to reflectance values from solar radiation hitting surface features. More discussion: Support for grid-based analysis.
Functionality beyond normal: Support for features that serve users with special requirements.
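
The grid-based analysis described above, dividing an area into rectangular cells and analyzing the values that fall in each, can be sketched as follows. This is an illustration under simplifying assumptions: coordinates are already in a single projected reference system, the grid origin is its top-left corner, and rows increase southward. The function names and sample values are hypothetical.

```python
import math
from collections import defaultdict


def cell_for_point(x, y, origin_x, origin_y, cell_width, cell_height):
    """Return the (row, col) of the grid cell containing point (x, y).

    Assumes (origin_x, origin_y) is the grid's top-left corner and that
    all coordinates share one datum and projection.
    """
    col = math.floor((x - origin_x) / cell_width)
    row = math.floor((origin_y - y) / cell_height)
    return row, col


def mean_value_per_cell(samples, origin_x, origin_y, cell_width, cell_height):
    """Average the sample values that fall in each cell.

    `samples` is an iterable of (x, y, value) tuples.
    """
    totals = defaultdict(lambda: [0.0, 0])  # cell -> [sum, count]
    for x, y, value in samples:
        cell = cell_for_point(x, y, origin_x, origin_y, cell_width, cell_height)
        totals[cell][0] += value
        totals[cell][1] += 1
    return {cell: total / count for cell, (total, count) in totals.items()}


# Invented sample points on a grid with 10 x 10 unit cells.
samples = [(5, 95, 1.0), (6, 94, 3.0), (25, 75, 10.0)]
print(mean_value_per_cell(samples, 0, 100, 10, 10))
```

Real gridded formats also record the datum, projection, and cell size as metadata so that the row and column indices can be mapped back to locations on the earth.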
3D Model Formats
3D Model Geometry: The geometry of a model describes its shape. There are several approaches to defining the geometry of a 3D model, including point clouds, line sets, meshes (often triangular), constructive solid geometry (built up by combining simple shapes), and free-form surfaces (often using non-uniform rational B-splines, commonly known as NURBS). The use of NURBS is common in computer graphics for generating and representing curves and surfaces. It offers great flexibility and precision for handling both surfaces defined by common mathematical formulae and modeled shapes.
3D Model Appearance: Appearance incorporates colors, textures, material types, shading, etc. A common approach for modeling appearance is texture mapping, in which a 2D image is molded to the surface as defined by the geometry. Another approach is to assign appearance attributes (e.g., color, texture, material type) to each face of a surface defined by a mesh. Other aspects of surface appearance include reflectivity (aka specularity) and transparency.
3D Model Scene: The scene of a model includes the position of light sources, cameras, and the relative positions of objects.
3D Model Animation: Animation defines how a 3D model moves. For a discussion of different ways to represent the animation of the components of a "skeleton," see A Comparison of 3D File Formats by Marcus Lundgren.
Aggregate; more discussion: Aggregate Q&F
Compression: One of the key features of aggregate files is support for compression, and different file formats support a variety of compression algorithms, ratios, and methods (i.e., lossy and lossless compression). RAR, for example, uses proprietary compression algorithms and the compression ratio is stored in the Compression Record tag in the file header; 7z has many options for compression methods, whereas ZIP most commonly uses the DEFLATE algorithm. Tar files, on the other hand, are not natively compressed but can be compressed with external utilities.
Support for error detection: Aggregate files often include parity checks, checksums, and other fixity mechanisms for error detection. ZIP files, for example, use a CRC-32 for checking file integrity. RAR also used optional CRC-32 hash values until RAR5, when the method switched to a 256-bit BLAKE2sp hash. In addition, RAR archives have an optional recovery record structure in the archive header. According to WinRAR Recovery Help, the "presence of recovery record makes an archive [file] larger, but allows to repair it even in case of physical data damage due to disk failure or data loss of any other kind, provided that the damage is not too severe."
Functionality beyond normal: Support for features that serve users with special requirements. ZIP files, for example, can be constructed as a self-extracting executable file, which is often used for software packaging. ZIP can also support "patching" technology to distribute revised document content by delivering only the changed elements of a prior document instead of having to deliver a complete new copy of the revised version.
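
As a small illustration of the compression and error-detection features described for ZIP, the following sketch uses Python's standard zipfile and zlib modules. The helper names build_zip and first_corrupt_member are hypothetical, and the member name and contents are invented for the example.

```python
import io
import zipfile
import zlib


def build_zip(members):
    """Build a DEFLATE-compressed ZIP in memory from {name: bytes}."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return buffer.getvalue()


def first_corrupt_member(zip_bytes):
    """Return the name of the first member failing its CRC-32 check, or None."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        # testzip() reads every member and verifies the stored CRC-32.
        return zf.testzip()


payload = b"sample text " * 100
archive = build_zip({"notes.txt": payload})

with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    info = zf.getinfo("notes.txt")
    # The CRC-32 stored in the archive matches one computed from the data.
    print(info.CRC == zlib.crc32(payload))
    # Repetitive input compresses to fewer bytes than the original.
    print(info.compress_size < info.file_size)

print(first_corrupt_member(archive))
```

A CRC-32 detects accidental damage but cannot repair it, which is the gap that a recovery record such as RAR's is meant to fill.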
Generic
No specific quality and functionality factors are defined for generic formats. For a list of generic formats documented to date, see Format Descriptions for Generic Formats.

About File type signifiers

This table documents some of the signifiers that may be used by automated systems to identify a format or the data it contains. A more comprehensive listing of signifiers for a number of formats may be found at the JHOVE web site (http://jhove.sourceforge.net/).
Columns: Tag type, Value, Note
Filename Extension: Name extensions generally used in Windows, UNIX, and other environments. From various sources.
Internet Media Type: When possible, from the IANA MIME Media Types site; otherwise (and in addition) from other sources, e.g., Filext.com.
Magic numbers: The first bits in a file, used to identify the type of file, often associated with Unix and its derivatives. The most frequent sources for information are Gary Kessler's File Signatures Table and Filext.com.
Microsoft FOURCC: Four-character identifier for video codecs (and other elements) used in older Microsoft formats, such as AVI and ASF. See the list of FOURCC codes. FOURCC codes are no longer managed by Microsoft, which has developed a new mechanism for identifying rich media types in general. Archived versions of the registry from Microsoft can be found at the Internet Archive: for example, one with information as of December 2001 and one as of June 2003.
Microsoft WAVE format registry: Audio codec identifier in Microsoft formats, originally from the same Microsoft registry as the FOURCC codes (see archived links above). A 4-byte code often represented in hex or as a number. Audio tags can be found in various places, none necessarily complete, e.g., Microsoft's successor list of audio subtype identifiers (GUIDs) and WAVE and AVI Codec Registries (Historic Registry), archived by IANA in 2008.
ASF GUID: Globally Unique IDentifier for media types and other elements in Microsoft ASF files, from the Advanced Systems Format (ASF) Specification.
Apple Video Sample Description: Four-character codes for video codec identification in QuickTime files, from The QuickTime File Format.
Apple Sound Codec four-character codes: Identifiers for audio codec identification in QuickTime files, from The QuickTime File Format.
PRONOM PUID: PRONOM Unique Identifiers (PUID) from the PRONOM Technical Registry supported by the National Archives UK. For more information about matching FDDs to PUIDs, see Mapping FDDs to PRONOM and Wikidata Unique Identifiers.
Wikidata Title ID: Wikidata unique identifiers with "Q" prefix. For more information about matching FDDs to these identifiers, see Mapping FDDs to PRONOM and Wikidata Unique Identifiers.
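
To illustrate how an automated system might use two of these signifiers, the sketch below checks a file's leading bytes against a few widely documented magic numbers and looks up an Internet Media Type from a filename extension. It is a sketch only: the table is deliberately tiny, the helper names are hypothetical, and tools such as JHOVE perform far more thorough identification and validation.

```python
import mimetypes

# A few well-known signatures; real signature tables are much larger.
MAGIC_NUMBERS = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"%PDF-", "PDF"),
    (b"PK\x03\x04", "ZIP"),  # local file header; empty archives differ
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
]


def identify_by_magic(header):
    """Return a format label if the leading bytes match a known signature."""
    for signature, label in MAGIC_NUMBERS:
        if header.startswith(signature):
            return label
    return None


def media_type_for(filename):
    """Guess an Internet Media Type from the filename extension alone."""
    media_type, _encoding = mimetypes.guess_type(filename)
    return media_type


print(identify_by_magic(b"%PDF-1.7\n"))
print(identify_by_magic(b"II*\x00\x08\x00\x00\x00"))
print(media_type_for("scan.png"))
```

An extension can be changed freely while the leading bytes usually cannot, so the two signifiers are best treated as complementary evidence, not as interchangeable.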

About Notes

General: Additional information, often an extension of the description in the first section above.
History: Information about the history of the format.

About Format specifications

URLs: Pointers to relevant online resources
Print: Pointers to relevant print resources

Useful references

URLs: Pointers to relevant online resources
Print: Pointers to relevant print resources


Last Updated: 01/28/2022