Standards: Part 21 - The MPEG, AES & Other Containers
Here we discuss how raw essence data needs to be serialized so it can be stored in media container files. We also describe the various media container file formats and their evolution.
This article is part of our growing series on Standards.
There is an overview of all 26 articles in Part 1 - An Introduction To Standards.
Media storage container files are essential for managing and automating workflows. Streaming architectures such as ST 2110 are useful in specific situations but permanent storage is always necessary. Each container format is designed for a specific task:
- RAW essence - Individual essence files simplify the workflow process. The workflow must maintain the synchronization and run length but can apply effects and other modifications more easily. This facilitates automation.
- Production - Production files carry uncompressed data for the content editing and creation process. Many different data types are stored in these files. Audio and video are the most obvious, but additional metadata that describes visual effects, 3D models, textures and associated timed text must all be stored in a structured fashion.
- Distribution - File containers for distribution are optimized for streaming to client player applications and devices.
- Archiving - Archival formats benefit from additional metadata for the supporting materials stored with them. It may be expedient to collect scanned images and textual documents into a single container. A Zip archive might be appropriate. Proprietary and application specific file types are not recommended. Simple, uncompressed sampled audio and individual video frames are good. Plain text files with markup to add styling are more likely to be readable in 500 years’ time than a highly structured word processor file.
- Legacy - Because archive and library collections tend to gather assets over a very long time, vintage content will be stored using older formats. These should be carefully documented. Older formats should be up-converted to a higher quality storage container to avoid losing access. Once the assets have been ingested, this can be automated to run on a scheduled basis.
Serialization Into Files
Essence data needs to be serialized so it can be written to a file. The receiving application reverses this process (de-serializing) when it opens the file and reconstructs the original format.
Structured metadata is often converted into XML or JSON formats for transmission via a web server. Unpacking those formats is well supported in web browsers with JavaScript.
The earliest media file formats only contained raw data. The internal structure is very simple with a header preceding the raw essence data.
The header contains a few metadata items to describe the format and scope of the raw data.
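As a minimal sketch of this header-plus-payload layout, the following Python serializes raw samples behind a small fixed header and then reverses the process. The `RAW0` tag, the choice of fields and the function names are hypothetical and do not correspond to any real format.

```python
import struct

# Hypothetical header layout (not a real format): 4-byte tag, sample rate,
# channel count, bits per sample and payload length, all big-endian.
HEADER = struct.Struct(">4sIHHI")
MAGIC = b"RAW0"


def serialize(samples: bytes, rate: int, channels: int, bits: int) -> bytes:
    """Prefix the raw essence with a header describing its format and scope."""
    return HEADER.pack(MAGIC, rate, channels, bits, len(samples)) + samples


def deserialize(blob: bytes) -> tuple[dict, bytes]:
    """Reverse the process: read the header, then slice out the payload."""
    magic, rate, channels, bits, length = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("not a RAW0 file")
    payload = blob[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return {"rate": rate, "channels": channels, "bits": bits}, payload


blob = serialize(b"\x00\x01\x02\x03", rate=48000, channels=2, bits=16)
meta, payload = deserialize(blob)
```

Storing the payload length in the header lets the reader detect a truncated file before it tries to interpret the samples.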
The next evolution splits the stored media into chunks. An optional trailer might be added to carry ancillary metadata.
After editing, the entire content is written to a new file. This takes a long time, leading to an unsatisfactory user experience.
An index to list the active chunks allows them to be rearranged, omitted or appended. Access to the index is easy when it is placed at the end of the file. The file manager provides the file length for your application code. If the last item in the file is an offset (N) back to the start of the index, then a simple file pointer seek function can locate it.
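The trailing-offset technique can be sketched as follows. This assumes a made-up layout in which the index is a JSON list of `[offset, length]` pairs and the last eight bytes of the file hold N, the distance from the end of the file back to the start of the index. None of these details come from a real container format.

```python
import io
import json
import os
import struct

TAIL = struct.Struct(">Q")  # hypothetical: the last 8 bytes hold N


def write_container(f, chunks: list[bytes]) -> None:
    """Write the chunks, then the index, then N."""
    index = []
    for chunk in chunks:
        index.append([f.tell(), len(chunk)])
        f.write(chunk)
    index_bytes = json.dumps(index).encode("utf-8")
    f.write(index_bytes)
    # N counts back from the end of the file, so it includes the tail itself.
    f.write(TAIL.pack(len(index_bytes) + TAIL.size))


def read_index(f) -> list[list[int]]:
    """Seek to the tail, read N, then seek N bytes back from the end."""
    f.seek(-TAIL.size, os.SEEK_END)
    (n,) = TAIL.unpack(f.read(TAIL.size))
    f.seek(-n, os.SEEK_END)
    return json.loads(f.read(n - TAIL.size))


def read_chunk(f, entry: list[int]) -> bytes:
    """Fetch one chunk using its index entry."""
    offset, length = entry
    f.seek(offset)
    return f.read(length)


buf = io.BytesIO()
write_container(buf, [b"AAAA", b"BB", b"CCCCCC"])
index = read_index(buf)
```

Only two seeks are needed to find the index, regardless of how large the file body is.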
Editing the file removes redundant chunks from the index leaving them embedded in the file and appends new ones to the end of the file. The index is updated when the file is saved.
This is effective and quick, but after many edits redundant chunks accumulate in the file and waste space. In poorly implemented solutions, the old redundant indexes are also left intact but ignored. Flatten the file to its optimal size by rewriting it without the garbage.
Run the flattening process offline in the background as a scheduled workflow job.
A properly designed file editor will add the new chunks and replace the old index with the new one. That will only leave the redundant blocks in the main body to be flattened.
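The edit-then-flatten cycle described above might look like this in outline. The sketch works on an in-memory body and a list of `(offset, length)` index entries. Both the data model and the function names are assumptions made for illustration.

```python
def append_edit(body: bytearray, index: list, position: int, new_chunk: bytes) -> None:
    """Replace one indexed chunk without rewriting the body.

    The new chunk is appended to the end. The old bytes stay where they
    are and become garbage that the index no longer references.
    """
    index[position] = (len(body), len(new_chunk))
    body.extend(new_chunk)


def flatten(body: bytes, index: list):
    """Rewrite the body, keeping only the chunks the index still references."""
    new_body = bytearray()
    new_index = []
    for offset, length in index:
        new_index.append((len(new_body), length))
        new_body.extend(body[offset:offset + length])
    return new_body, new_index


body = bytearray(b"AAAABBCCCCCC")
index = [(0, 4), (4, 2), (6, 6)]
append_edit(body, index, 1, b"XYZ")   # fast: nothing is rewritten
flat, flat_index = flatten(body, index)  # slow: suited to a background job
```

After the edit the body still contains the orphaned `BB` bytes. Flattening drops them and restores the chunks to playback order.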
If the index or the chunks themselves are enhanced to identify a distinct data type, then metadata can be stored in a chunk as well. This facilitates mixed media and multi-channel storage in a single file.
QuickTime File Format (QTFF)
Apple developed the QuickTime file format as an object-oriented storage container. The media is stored in many small chunks called atoms. The data type of those atoms is indicated with a four-character (FourCC) code. The format is extensible because new atom types can be added to the repertoire at any time. Older file readers simply ignore the atoms they don't recognize. New atom descriptions are introduced via a registry process.
The atom payload can carry Unicode characters. This allows URLs and internationally localized text to use the full range of the character set.
ISO Base Media File Format (ISOBMFF)
The Apple QuickTime File Format (QTFF) was coincidentally very close to what ISO/MPEG were looking for as a candidate technology when they were designing the ISO Base Media File Format (ISOBMFF). Apple QTFF was duly adopted by ISO/MPEG. This is published as ISO 14496-12 (MPEG-4) for video and ISO 15444-12 (JPEG 2000) for images.
Additional features that support MPEG-4 coded media are described in MPEG-4 Part 14. Part 15 addresses the storage of AVC video content packaged for transmission on a network.
More recently, MPEG has standardized specific formats for different target applications in ISO 23000.
The ISOBMFF supports all the features and benefits that QTFF provided for timing, structure and metadata. This extensible format carries any kind of time-based media type you might ever need.
QTFF describes the media objects as Atoms and ISO calls them Boxes. Each box has a size parameter, type code specifier and a payload. The entire contents of the file are managed in boxes with no other data allowed.
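A short Python sketch shows how the size, type and payload fields can be read. It assumes the standard ISOBMFF conventions: a 32-bit big-endian size that includes the header, a four-character type, a size of 1 meaning that a 64-bit size follows, and a size of 0 meaning the box runs to the end of the file. The function names are illustrative only.

```python
import io
import struct


def read_box_header(f):
    """Return (box_type, payload_size) or None when no more boxes remain."""
    start = f.tell()
    head = f.read(8)
    if len(head) < 8:
        return None
    size, box_type = struct.unpack(">I4s", head)
    header_size = 8
    if size == 1:
        # A 64-bit size follows the type code.
        (size,) = struct.unpack(">Q", f.read(8))
        header_size = 16
    elif size == 0:
        # The box extends to the end of the file.
        end = f.seek(0, io.SEEK_END)
        f.seek(start + header_size)
        size = end - start
    if size < header_size:
        raise ValueError("malformed box size")
    return box_type.decode("latin-1"), size - header_size


def list_top_level_boxes(f):
    """Walk the file, skipping over each payload to reach the next box."""
    boxes = []
    while (header := read_box_header(f)) is not None:
        box_type, payload_size = header
        boxes.append((box_type, payload_size))
        f.seek(payload_size, io.SEEK_CUR)
    return boxes


sample = (
    struct.pack(">I4s", 16, b"ftyp") + b"isom\x00\x00\x02\x00"
    + struct.pack(">I4s", 8, b"free")
    + struct.pack(">I4s", 0, b"mdat") + b"abc"
)
boxes = list_top_level_boxes(io.BytesIO(sample))
```

Because every box declares its own size, a reader can skip any type it does not recognize without understanding the payload.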
The specification allows external ancillary files of any kind to be called into use by reference. The primary timing and framing information must be carried in the main ISO base media file.
QTFF supported very sophisticated VR techniques for creating panoramas and object movies by rapidly switching frames in an entirely non-linear fashion. Apple called them Navigable Movies. This has been inherited by the ISOBMFF and points towards some future Metaverse and VR applications which MPEG is developing standards for.
Since ISOBMFF and QTFF are based on the same DNA, applications designed to manage either kind of file can share a lot of common code. This is good for application developers because less code means easier maintenance.
These file types are based on ISOBMFF:
Container | Content |
---|---|
MP4 | MP4 file format. |
3GP | 3GPP file format for 3G UMTS multimedia services. Used on 3G mobile phones. |
3G2 | 3GPP2 file format for 3G CDMA2000 multimedia services. It is more efficient than 3GPP. |
MJ2 | Motion JPEG 2000. |
DVB | The DVB specific features of the ISOBMFF file format are described in ETSI TR 102 833 and DVB document A158. |
DCF | Stores a group of digital camera raw images. |
M21 | Contains MPEG-21 data. |
F4V | Adobe Flash Video. |
HEIF | High Efficiency Image File Format. |
Registering Atom & Box Type Codes
The Box and Atom type codes are managed by the MP4 Registration Authority. This is administered by Apple on behalf of ISO and the wider QuickTime community. The complete list of type codes is available for public access at their web site:
https://mp4ra.org/
The registry provides additional information and links to related standards defining the content and structure of each payload type. QTFF supports a few Atom types that are not available in ISOBMFF files. They are described in the registry to avoid name space collisions.
The Atom type codes are collated under these categories:
Category | Notes |
---|---|
ISO family | 270 unique code points. |
User-data | 40 unique code points. |
QuickTime specific | 16 unique code points. This table is allegedly not yet complete. |
Deprecated Atoms | These are described in the Apple QuickTime File Format documentation. |
The Apple QuickTime File Format is described here:
https://developer.apple.com/documentation/quicktime-file-format
FourCC Codes
The FourCC concept was designed by Apple in 1984 when the original Macintosh was released. These identifiers were widely used in the file system to identify resource data types. Apple describes them as OSTypes but the FourCC nomenclature is more popular. In the registry and some standards documents, they are also described as Code Points.
In 1985, Electronic Arts developed the IFF media file format for use on the Amiga personal computer. It used the same FourCC concept to identify the chunks of media data within the IFF file. The IFF technical documents credit Apple with the original innovation. IFF files are the foundation for many other formats (AIFF, RIFF, AVI and WAV). The Apple QuickTime files emerged in 1991 using the same idea but must surely have been in development for some time before that.
FourCC values are 32-bit integers composed of four 8-bit ASCII characters. The bytes are arranged in big-endian order, so the first character occupies the most significant byte and the code reads left to right.
Normally the codes are constructed with ASCII printable characters. Spaces are also allowed. In rare cases, non-printing control characters are used, which makes them hard to read without special formatting software. By convention, most of the codes are spelled with lower-case characters.
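The conversion between the text and integer forms of a FourCC can be sketched in a few lines of Python. The helper names are hypothetical. The escaping of non-printing bytes is one possible display convention, not something the registry mandates.

```python
import struct


def fourcc_to_int(code: str) -> int:
    """Pack four characters into a big-endian 32-bit integer."""
    raw = code.encode("latin-1")
    if len(raw) != 4:
        raise ValueError("a FourCC needs exactly four characters")
    return struct.unpack(">I", raw)[0]


def int_to_fourcc(value: int) -> str:
    """Unpack the integer, showing non-printing bytes as \\xNN escapes."""
    raw = struct.pack(">I", value)
    return "".join(chr(b) if 32 <= b < 127 else f"\\x{b:02x}" for b in raw)


moov = fourcc_to_int("moov")
ftyp = int_to_fourcc(0x66747970)
odd = int_to_fourcc(0x00000001)
```

Trailing spaces are significant, so `"mp4 "` and `"mp4"` are not interchangeable. The second is rejected because it is only three characters long.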
Debugging ISO Media Container Files
Before QuickTime was revised to use the AVFoundation library, developers had a variety of tools for disassembling movie files for inspection. Those tools are long extinct now that QuickTime 7 is deprecated.
Here is an example output from an ancient Apple QuickTime de-compiler tool which illustrates the internal atom structure. ISOBMFF files would look very similar:
This illustrates how atoms are nested in multiple levels. The structure is mirrored into the object cache in memory when the file is loaded by an application.
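A rough equivalent of such a listing can be produced with a short recursive walk. The sketch below is an assumption-laden illustration and not a reconstruction of the Apple tool. It handles only plain 32-bit sizes, it stops at anything malformed, and its set of container types is deliberately incomplete. The `box` helper exists only to build test data.

```python
import struct

# Boxes that hold only other boxes. This set is illustrative, not complete.
CONTAINERS = {"moov", "trak", "mdia", "minf", "stbl", "edts", "dinf"}


def box_tree(data: bytes, start: int = 0, end=None, depth: int = 0):
    """Return indented lines describing the boxes found in data[start:end]."""
    end = len(data) if end is None else end
    lines = []
    pos = start
    while pos + 8 <= end:
        size, raw_type = struct.unpack_from(">I4s", data, pos)
        if size < 8 or pos + size > end:
            break  # 64-bit, open-ended or malformed sizes are not handled
        box_type = raw_type.decode("latin-1")
        lines.append(f"{'  ' * depth}{box_type} ({size} bytes)")
        if box_type in CONTAINERS:
            # Recurse into the payload, one indent level deeper.
            lines += box_tree(data, pos + 8, pos + size, depth + 1)
        pos += size
    return lines


def box(box_type: bytes, payload: bytes = b"") -> bytes:
    """Hypothetical helper that wraps a payload in a box header."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


movie = box(b"ftyp", b"isom" + bytes(4)) + box(
    b"moov", box(b"mvhd", bytes(4)) + box(b"trak", box(b"tkhd", bytes(4)))
)
listing = box_tree(movie)
```

Printing `"\n".join(listing)` shows each child indented beneath its parent, which mirrors the ownership hierarchy inside the file.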
MP4 and ISOBMFF formatting problems can be traced with these more recent inspection tools:
Tool | Details |
---|---|
MediaInfo | Available from the MediaArea company. Lists the contents of the video file. |
ffprobe | This is part of the FFmpeg toolkit and displays some of the metrics relating to the file content. |
Boxdumper | An open-source tool that displays the Box structure. |
IsoViewer | Inspects the internal box structure of an ISO file. |
MP4Box.js | This is a JavaScript library to process MP4 files in a web browser. |
Mp4dump | Built on top of the Bento4 C++ class library. This displays the entire box structure of an MP4 file. |
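These tools are easy to drive from a workflow script. As a hedged example, the Python below wraps ffprobe and asks for its report in JSON form. It assumes ffprobe is installed and on the search path. The function names and the file name `clip.mp4` are placeholders.

```python
import json
import subprocess


def ffprobe_command(path: str) -> list[str]:
    """Build an ffprobe command line that reports container and streams as JSON."""
    return ["ffprobe", "-v", "error", "-show_format", "-show_streams",
            "-of", "json", path]


def probe(path: str) -> dict:
    """Run ffprobe and parse its JSON report. Raises if ffprobe fails."""
    result = subprocess.run(ffprobe_command(path), capture_output=True,
                            text=True, check=True)
    return json.loads(result.stdout)


command = ffprobe_command("clip.mp4")
```

Calling `probe("clip.mp4")` on a real file should return a dictionary whose `format` and `streams` entries describe the container and its tracks.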
MPEG Application Formats
The ISO 23000 MPEG Application Format standards collate media storage requirements under various categories. Each one is addressed by a different part within the ISO 23000 family. Collectively, these are described as MPEG-A.
Part | Application format |
---|---|
1 | Purpose for multimedia application formats. |
2 | MPEG music player. |
3 | MPEG photo player. |
4 | Musical slide show. |
5 | Media streaming. |
6 | Professional archiving. |
7 | Open access. |
8 | Portable video. |
9 | Digital Multimedia Broadcasting. |
10 | Surveillance. |
11 | Stereoscopic video. |
12 | Interactive music. |
13 | Augmented reality. |
15 | Multimedia preservation. |
16 | Publish/Subscribe. |
17 | Multiple sensorial media. |
18 | Media linking. |
19 | Common media (CMAF) for segmented media. |
AES 31 File Containers
The AES31 standard was designed to store portable audio content in files that many different applications can access.
- Part 1 - Describes the basic format. The internal structure is chunked in a similar way to the ISOBMFF/QTFF files.
- Part 2 - Introduces the Broadcast Wave Format. It is based on the 32-bit Microsoft WAV format and carries the minimum necessary metadata to support broadcast applications.
- Part 2 Amd - Describes how to extend the storage capacity to accommodate larger media objects with 64-bit addressing.
- Part 3 - Adds metadata for editing tools, time-code and stereo/surround panning with audio gain control.
- Part 4 - Provides syntax mapping for Edit Decision Markup Language (EDML) based on XML. This automates the processing of audio in a workflow context.
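Because the Broadcast Wave Format builds on WAV, its chunks can be listed with a simple RIFF walker. The sketch below assumes the usual RIFF conventions: little-endian 32-bit sizes, and chunks padded to an even length. It does not handle the 64-bit extension. The broadcast metadata lives in a chunk named `bext`. The `chunk` helper and the sample data are made up for the demonstration and the `bext` payload here is not a valid one.

```python
import struct


def riff_chunks(data: bytes):
    """List (chunk_id, size) pairs for the top-level chunks of a RIFF/WAVE file."""
    riff, _, form = struct.unpack_from("<4sI4s", data, 0)
    if riff != b"RIFF" or form != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    chunks = []
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        chunks.append((chunk_id.decode("latin-1"), size))
        pos += 8 + size + (size & 1)  # skip the pad byte after odd-sized chunks
    return chunks


def chunk(chunk_id: bytes, payload: bytes) -> bytes:
    """Hypothetical helper that builds one padded RIFF chunk."""
    pad = b"\x00" if len(payload) % 2 else b""
    return struct.pack("<4sI", chunk_id, len(payload)) + payload + pad


body = b"WAVE" + chunk(b"bext", bytes(5)) + chunk(b"fmt ", bytes(16)) + chunk(b"data", bytes(4))
wav = struct.pack("<4sI", b"RIFF", len(body)) + body
found = riff_chunks(wav)
```

Note the byte order. RIFF sizes are little-endian, whereas the ISOBMFF and QTFF sizes are big-endian.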
Matroška File Containers (MKV)
This is an open-source project for a container supporting an unlimited number of media tracks in a single file.
Although the Matroška format is not from a Russian heritage, it is so named because the internal nested structure resembles the small hollow wooden dolls from Russia. Opening each doll reveals another smaller one inside. This illustrates the way that objects are owned by other objects in a hierarchical tree structure inside the container.
Internally, the file is object structured like QTFF or ISOBMFF. The objects in a Matroška file are called Elements and are documented at the Matroška web site:
https://www.matroska.org/
The element types are more rigidly defined than ISOBMFF. Matroška element codes are arguably less mnemonic but have a logical pattern when expressed in hexadecimal form. The ISOBMFF and QTFF type codes are more humanly readable. At the application coding and inspection level, a symbolic lookup table can map these values to more easily understood texts.
The Matroška element code registry also maintains additional properties that help diagnose malformed data in the container.
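The hexadecimal pattern and the lookup table idea can be illustrated together. In the underlying EBML encoding, the number of leading zero bits in the first byte of an element ID gives its length, and Matroška IDs are at most four bytes long. The table below lists only a handful of well-known IDs and the function names are illustrative.

```python
# A few well-known Matroska element IDs. The registry holds the full list.
ELEMENT_NAMES = {
    0x1A45DFA3: "EBML",
    0x18538067: "Segment",
    0x1549A966: "Info",
    0x1654AE6B: "Tracks",
    0x1F43B675: "Cluster",
}


def read_element_id(data: bytes, pos: int = 0):
    """Return (element_id, length_in_bytes) for the ID starting at data[pos]."""
    first = data[pos]
    if first == 0:
        raise ValueError("invalid element ID")
    # Length is the count of leading zero bits in the first byte, plus one.
    length = 8 - first.bit_length() + 1
    if length > 4:
        raise ValueError("element IDs are at most four bytes long")
    return int.from_bytes(data[pos:pos + length], "big"), length


def element_name(element_id: int) -> str:
    """Map an ID to a readable name, falling back to its hexadecimal form."""
    return ELEMENT_NAMES.get(element_id, f"unknown (0x{element_id:X})")


header_id, header_len = read_element_id(bytes.fromhex("1A45DFA3"))
```

Every four-byte ID therefore starts with the hexadecimal digit 1, two-byte IDs start with 4 to 7, and one-byte IDs start with 8 or above.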
WebM Container Files
The WebM container format is based on a profile of Matroška. It supports AV1, VP8 and VP9 coded video with Opus and Vorbis audio. Theoretically it can support other video and audio codecs. The choices are merely constrained by the profile.
Because the container and the codecs described in the profile are all open and license-free, WebM is becoming more popular for web-based media delivery. The container format, all three video codecs and both audio codecs are supported by all the major web browsers.
A Menagerie Of Container Files
There are many container formats to choose from. Some are open-source and free to use while others incur patent license fees. The patents column indicates these dispositions:
- None - No relevant patents or royalties are due or the license fees are waived.
- Yes - There are known to be patents encumbering this standard.
- Proprietary - Owned by a corporate entity. There may be patents involved but they are sometimes waived.
- Expired - The standard is old enough that all applicable patents have expired.
Container | Owner | Patents | Description |
---|---|---|---|
QuickTime File Format | Apple | Proprietary | QTFF was originally developed as the container for QuickTime movies and related media. This is the direct ancestor of the ISOBMFF. Apple is gradually deprecating this format in favor of the ISO MP4 container. |
MPEG-1 Part 1 | MPEG | Expired | Carries MPEG-1 Audio layers I, II & III. Also used on DVD disks. Used for MP3 files in mobile audio devices. |
MPEG-2 Part 1 | MPEG | Expired | Almost the same as MPEG-1 but supports additional media data types. |
MPEG-4 Part 1 | MPEG | Yes | Version 1 of the MPEG-4 file format. |
MPEG-4 Part 12 | MPEG | None | The ISO Base Media File Format (ISOBMFF) is derived from the Apple QTFF design. Any patents that might have applied will have expired. |
MPEG-4 Part 14 | MPEG | Yes | Version 2 of the MP4 file format. Derived from ISOBMFF. |
MPEG Program Stream | MPEG | None | MPEG-PS is a combined audio and video stream. |
MPEG Transport Stream | MPEG | None | MPEG-TS is a format designed for network streaming and broadcast transmission. |
BDAV MPEG-2 Transport Stream | BDA | Yes | M2TS is used for Blu-ray disks. |
3GPP | 3GPP | Yes | 3GP files are used for mobile applications. Derived from ISOBMFF and MPEG-4 but modified to make it more efficient for mobile networks. |
3GPP2 | 3GPP2 | Yes | 3G2 is an enhanced version of the 3GP container, used for 3G mobile applications. |
VOB | DVD Forum | Yes | Video Object containers are used for DVD disks. |
EVO | DVD Forum | Yes | Enhanced VOB containers are used for HD DVD disks. |
Matroška | CoreCodec | None | A free to use open-standard. Used by WebM. |
IFF | Electronic Arts & Commodore | Proprietary | The Interchange File Format was originally designed for the Amiga personal computer. This format is still relevant as the ancestor of many later and more popular formats. |
AIFF | Apple | Proprietary | The Audio Interchange File Format was developed by Apple. It was originally based on the IFF format. This can only carry uncompressed media. |
AIFF-C | Apple | Proprietary | Based on AIFF and intended to carry compressed media data. |
RIFF | Microsoft & IBM | None | The Resource Interchange File Format is the foundation for Microsoft WAV files. Similar to the Apple AIFF format, it is a tagged chunk format using FourCC codes like ISOBMFF and QTFF. The container is based on the IFF format. |
ASF | Microsoft | Proprietary | The Advanced Systems Format container is free to use but there may be royalties incurred when using the codecs. |
AVI | Microsoft | Proprietary | The Audio Video Interleave container is popular on Windows platforms. |
WAVE (WAV) | Microsoft | Proprietary | The Waveform Audio File Format was jointly developed by Microsoft and IBM to store audio bitstreams. Refer to RFC 2361 for details of the supported codecs. WAV files can contain compressed and uncompressed media on the Windows platform. |
MXF | SMPTE | None | The Material Exchange Format was designed for interoperability between broadcasters and production companies. Also used for Digital Cinema applications. |
F4V | Adobe | Yes | All Flash Video containers are now deprecated formats. |
Ogg | Xiph.Org | None | Open-source. Not widely used. WebM is a recommended alternative. |
WebM | Google | None | Royalty-free and based on the Matroška container format. The preferred codecs are also all free of licensing restrictions. |
RealMedia | RealNetworks | Proprietary | One of the earliest container formats. |
RealMedia Variable Bitrate | RealNetworks | Proprietary | RMVB is an enhancement to the original RealMedia container to support variable bitrates. |
DMF | DivX Inc | Proprietary | DivX Media Format is only used with the DivX encoders. |
FLAC | Xiph.Org | None | A special type of container for Free Lossless Audio Codec media. |
There are other more obscure formats that are not included here. They tend to be legacy containers and only of interest when importing vintage content.
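When ingesting content of unknown origin, a first guess at the container family can often be made from the opening bytes of the file. The sketch below checks a few widely documented signatures. It is far from exhaustive, it does not cover the MPEG stream formats, and some files will not match (older QuickTime movies may have no `ftyp` atom, for instance). The function name is hypothetical.

```python
def sniff_container(head: bytes) -> str:
    """Guess a container family from the first 12 bytes of a file."""
    if head[4:8] == b"ftyp":
        return "ISOBMFF / QTFF"
    if head[:4] == b"\x1a\x45\xdf\xa3":
        return "Matroska / WebM"
    if head[:4] == b"RIFF":
        # The form type at offset 8 distinguishes the RIFF variants.
        return {b"WAVE": "WAV", b"AVI ": "AVI"}.get(head[8:12], "RIFF")
    if head[:4] == b"FORM":
        return {b"AIFF": "AIFF", b"AIFC": "AIFF-C"}.get(head[8:12], "IFF")
    if head[:4] == b"OggS":
        return "Ogg"
    if head[:4] == b"fLaC":
        return "FLAC"
    if head[:4] == b".RMF":
        return "RealMedia"
    if head[:4] == b"\x06\x0e\x2b\x34":
        return "MXF"
    return "unknown"


guess = sniff_container(b"\x00\x00\x00\x18ftypmp42")
```

A result from this kind of check is only a hint. One of the inspection tools listed earlier should confirm it before the asset is processed.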
Conclusion
Storing media in files must be an entirely lossless process even if the payload might have been compressed with a lossy codec.
All things considered, the Matroška format is probably a good choice if you are building a workflow infrastructure. It has few limitations and because it is open-sourced there are no license fees. The code is easy to customize.
The MPEG containers probably have better support for end-users across all the client-player applications.
MPEG containers can carry a variety of different coded media types but are designed around the MPEG codecs. These have been around for some time and are being overtaken by the better performing newer codecs. Those new codecs are often stored in a Matroška file container.
MPEG containers are not likely to disappear. They may become even more popular when the patents expire and there are no more license fees incurred.