Standards: Audio - ID3 Metadata Tagging
Metadata tagging is easily overlooked but essential for managing audio assets throughout a production workflow. Here we offer practical guidance on embedding and converting metadata across common audio file formats.
ID3 Metadata Support
Workflow processes often need to access metadata about the files they are working on. MP3 files support embedded tagging using the ID3 convention. This is an informal standard but is widely used and can embed metadata into several kinds of files other than MP3:
- AIFF
- WAV
- MP4
- OGG
- FLAC
- APE
- MPC
- RealAudio
ID3 is not part of the MPEG standards and is managed independently. The informal specification is maintained at the ID3 website:
https://id3.org/
Each ID3 tag is stored in one or more frames in the file. Encoded audio is also stored in frames, each of which begins with a synchronization pattern that decoders look for to find playable content. ID3 describes a way to avoid that bit pattern inside the metadata so that it never spuriously triggers synchronization, which hides the metadata from the stream player. Client player apps extract the ID3 metadata separately by looking for its signature, independently of the streaming process.
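One part of that sync-avoidance scheme is the "synchsafe" integer used for the size field in the 10-byte ID3v2 header: only the low seven bits of each byte carry data, so no byte can ever be 0xFF. The following is a minimal sketch of that encoding. The function names are illustrative and not taken from any library.

```python
def synchsafe_encode(value: int) -> bytes:
    """Pack a size into 4 bytes, using only the low 7 bits of each byte."""
    if not 0 <= value < (1 << 28):
        raise ValueError("a 4-byte synchsafe integer holds at most 28 bits")
    return bytes((value >> shift) & 0x7F for shift in (21, 14, 7, 0))


def synchsafe_decode(data: bytes) -> int:
    """Reverse synchsafe_encode; reject bytes with the top bit set."""
    if len(data) != 4 or any(b & 0x80 for b in data):
        raise ValueError("not a 4-byte synchsafe integer")
    value = 0
    for b in data:
        value = (value << 7) | b
    return value


def id3v2_tag_size(header: bytes) -> int:
    """Return the tag body size declared in a 10-byte ID3v2 header."""
    if len(header) < 10 or header[:3] != b"ID3":
        raise ValueError("no ID3v2 header found")
    # Bytes 6..9 hold the size of the tag, excluding the header itself.
    return synchsafe_decode(header[6:10])
```

Because the top bit of every size byte is clear, the size field cannot contain the run of set bits that an MPEG audio decoder treats as the start of a frame.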
ID3 tags were originally designed for annotating tracks imported from music CDs. Typical and obvious tags are:
- Song title.
- Artist name.
- Album name.
- Track number.
There are many other tags described in the ID3 specification and more have been added as proprietary and de-facto extensions.
ID3 Versions
The ID3 metadata structures have evolved over several revisions. There are different ways for embedding the metadata in the file. New versions must be backwards compatible and not break earlier implementations. ID3 metadata tags conforming to version 1 are always placed at the tail end of the file. Version 2 tags are placed at the front. The tags are optional so they may not be present when you expect them.
The version 2.4 specification allows the tag metadata to be placed at the end of the file. It must precede the version 1 metadata to avoid breaking older players. Version 2.3 is the most popular kind of tagging and places the metadata only at the front of the file.
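The placement rules above can be checked by looking for the tag signatures: "ID3" at the start of the file, "TAG" in the final 128 bytes, and "3DI" in the footer that closes an appended version 2.4 tag. This is a rough sketch only. The function name and result keys are hypothetical, and a signature match is a hint rather than proof, because audio data can contain the same bytes by chance.

```python
def locate_id3(data: bytes) -> dict:
    """Report which ID3 signatures appear at the expected positions."""
    found = {"v2_front": None, "v2_footer": False, "v1_tail": False}

    # ID3v2 header: "ID3", then major version and revision bytes.
    if len(data) >= 10 and data[:3] == b"ID3":
        found["v2_front"] = f"2.{data[3]}.{data[4]}"

    # ID3v1 tag: a fixed 128-byte block at the very end, starting "TAG".
    if len(data) >= 128 and data[-128:-125] == b"TAG":
        found["v1_tail"] = True

    # An appended v2.4 tag ends with a 10-byte footer starting "3DI".
    # It sits immediately before the ID3v1 block when both are present.
    end = len(data) - 128 if found["v1_tail"] else len(data)
    if end >= 10 and data[end - 10:end - 7] == b"3DI":
        found["v2_footer"] = True

    return found
```

Since the tags are optional, every field can legitimately come back empty.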
Although it is popular and widely supported for MP3 and many other media file types, the ID3 metadata version 2.3 is still an informal standard.
| Version | Disposition | Description |
|---|---|---|
| 1 | Obsolete | Fixed format suffix appended to the end of the file. Carries the title, artist, album, and a short comment. These are all limited to 30 characters. A year number is added and a value representing a genre from an indexed list. |
| 1.1 | Obsolete | Track numbers added by shortening the comment field. |
| 1.2 | Obsolete | Text fields increased in length and a sub-genre field added. Backwards compatibility with earlier versions was maintained but this version was never widely adopted. |
| 2 | Obsolete | The format and structure are completely revised. The tag is constructed from multiple frames that can each grow to 16 MB within a total capacity of 256 MB. The metadata is now placed at the front of the file so it is immediately available when streaming the MP3 content. Text strings can be Unicode. |
| 2.2 | Obsolete | Tag identifiers limited to three characters. |
| 2.3 | Most popular | Added album sleeve artwork images and a disc number tag for boxed sets. Tag names are four characters. |
| 2.3+ | Current | Chapter marks added with support for displaying synchronized slide show images. Very useful for podcasts. |
| 2.4 | Latest | Additional frame types and text frames can contain multiple NULL separated values. Tags can be stored at the start or end of the file. |
| 2.4+ | Latest | The same chapter mark support is added as per 2.3+. |
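The fixed version 1 layout in the table is simple enough to decode by slicing. The sketch below reads the 128-byte block and applies the version 1.1 convention, where a zero byte followed by a non-zero byte at the end of the comment field signals a track number. The function name is illustrative, and Latin-1 decoding is an assumption because version 1 does not declare a text encoding.

```python
def parse_id3v1(block: bytes) -> dict:
    """Decode a 128-byte ID3v1 or ID3v1.1 tag block."""
    if len(block) != 128 or block[:3] != b"TAG":
        raise ValueError("not an ID3v1 tag")

    def text(raw: bytes) -> str:
        # Fields are padded with NULs (or sometimes spaces).
        return raw.split(b"\x00", 1)[0].decode("latin-1").rstrip()

    tag = {
        "title": text(block[3:33]),
        "artist": text(block[33:63]),
        "album": text(block[63:93]),
        "year": text(block[93:97]),
        "genre_index": block[127],
        "track": None,
    }

    comment = block[97:127]
    # ID3v1.1: byte 28 of the comment is zero and byte 29 is the track.
    if comment[28] == 0 and comment[29] != 0:
        tag["track"] = comment[29]
        comment = comment[:28]
    tag["comment"] = text(comment)
    return tag
```

Pass it the last 128 bytes of a file, for example `parse_id3v1(data[-128:])`, after confirming that the block starts with "TAG".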
ID3 Tag Names
From version 2.3 onwards, tag names are four letters long instead of the three letters defined in earlier versions. Tag translation is therefore necessary when converting the metadata. Where the tags are localized for international use, an additional three-letter ISO 639 language code is added. A non-canonical list of these language codes is also available on Wikipedia.
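Tag translation can be sketched as a lookup table from the three-letter version 2.2 identifiers to their four-letter version 2.3 equivalents. The table below covers only a handful of common tags, and the function name is hypothetical. Renaming is not always sufficient on its own: the picture frame, for example, also changes its internal layout between PIC and APIC, so the frame body needs converting as well.

```python
# A partial mapping of ID3v2.2 frame identifiers to ID3v2.3.
V22_TO_V23 = {
    "TT2": "TIT2",  # title
    "TP1": "TPE1",  # lead artist
    "TP2": "TPE2",  # band / album artist
    "TAL": "TALB",  # album
    "TRK": "TRCK",  # track number
    "TPA": "TPOS",  # part of a set (disc number)
    "TYE": "TYER",  # year
    "TCO": "TCON",  # content type (genre)
    "COM": "COMM",  # comment
    "ULT": "USLT",  # unsynchronized lyrics
    "PIC": "APIC",  # attached picture (body layout also differs)
}


def translate_frame_ids(frames: dict) -> tuple[dict, list]:
    """Rename known v2.2 frame IDs; return (translated, unmapped IDs)."""
    translated, unmapped = {}, []
    for frame_id, value in frames.items():
        new_id = V22_TO_V23.get(frame_id)
        if new_id is None:
            unmapped.append(frame_id)  # leave the decision to the caller
        else:
            translated[new_id] = value
    return translated, unmapped
```

Returning the unmapped identifiers separately lets a workflow log proprietary or unknown tags instead of silently dropping them.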
Version 2.3 supports image embedding for various purposes such as album cover artwork. The tag describes how the image is to be used. PNG is the optimal image type but JPEG and GIF are also supported.
Some players and metadata browsing systems may have difficulty in rendering a PNG file if it has an alpha channel to mask the image to a non-rectangular shape. You might do that to display an image of a scanned circular CD artwork or vinyl album disc.
Here is an informal (third-party) description of the version 2.3 standard which enumerates the tags and describes how they all work. This supplements the id3.org documentation:
https://mutagen-specs.readthedocs.io/en/latest/id3/id3v2.3.0.html
Support For Lyrics & Subtitles
Lyric tags were defined before ID3v2 and are always placed at the end of the file. They must be located prior to the ID3v1 tag metadata if it is included. The disadvantage is that an entire file needs to be delivered before the lyrics are available.
Work around this in web streaming by delivering separate VTT text tracks along with the audio stream. Then present them using timed text events.
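As a rough illustration of that workaround, timed lyrics can be written out as a WebVTT text track: a "WEBVTT" header followed by cues, each with a start and end timestamp and a line of text. The function names and the (start, end, text) tuple layout are assumptions made for this sketch.

```python
def vtt_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT hh:mm:ss.ttt timestamp."""
    millis = round(seconds * 1000)
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def lyrics_to_vtt(cues) -> str:
    """Build a WebVTT document from (start_s, end_s, text) tuples."""
    lines = ["WEBVTT", ""]
    for start, end, text in cues:
        lines.append(f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line terminates each cue
    return "\n".join(lines)
```

The resulting text can be served alongside the audio stream and attached to the player as a text track.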
Tagging AAC Files
The .aac files are simple binary containers carrying raw encoded audio in an ADTS elementary stream. These cannot be tagged with metadata without breaking the bitstream syntax. To add metadata, encapsulate the AAC elementary streams inside an .mp4 or .m4a file and then add the tags. The conversion to MPEG-4 files breaks the ADTS stream into segments, which allows the essence and metadata packets to be interleaved.
Do not perform this conversion simply by renaming the file to change the file extension! The internal content will not be changed and the file now has an incorrect extension to describe the content.
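Since a file extension says nothing reliable about the content, a workflow can inspect the first few bytes instead. The sketch below distinguishes an ADTS stream, which starts with a 12-bit sync word of all ones, from an MPEG-4 file, which usually starts with an "ftyp" box. The function name is hypothetical and the check is deliberately shallow.

```python
def sniff_aac_container(head: bytes) -> str:
    """Guess 'mp4', 'adts' or 'unknown' from the first bytes of a file."""
    # MPEG-4 files normally open with a box whose type, at offset 4,
    # is "ftyp".
    if len(head) >= 8 and head[4:8] == b"ftyp":
        return "mp4"
    # ADTS header: 12 sync bits set, then an ID bit, two layer bits that
    # are always zero, and a protection bit. The mask 0xF6 checks the
    # last four sync bits and the layer bits, ignoring the other two.
    if len(head) >= 2 and head[0] == 0xFF and (head[1] & 0xF6) == 0xF0:
        return "adts"
    return "unknown"
```

A renamed .aac file still reports "adts" here, which is a quick way to catch the mistake described above before the file reaches a tagging tool.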
Convert an .aac file to an MPEG-4 container with the ffmpeg command-line tool like this:
ffmpeg -i input.aac -c:a copy output.m4a
The audio is repackaged (remuxed) into the new container without being decoded first. This avoids introducing additional lossy artefacts from a recompression cycle.
Then add the metadata tagging to the MPEG container file with the ffmpeg or ExifTool command-line utilities.
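For the ffmpeg route, tags are supplied as repeated `-metadata key=value` options while the streams are copied unchanged. The helper below only assembles the argument list, so it can be inspected before anything is run. The helper name and the file names are placeholders, and which keys a given container accepts depends on ffmpeg's muxer.

```python
def build_tag_command(src: str, dst: str, tags: dict) -> list:
    """Assemble an ffmpeg command that copies streams and sets tags."""
    cmd = ["ffmpeg", "-i", src, "-c", "copy"]  # copy, do not re-encode
    for key, value in tags.items():
        cmd += ["-metadata", f"{key}={value}"]
    cmd.append(dst)
    return cmd
```

The list can be passed to `subprocess.run(cmd, check=True)` once ffmpeg is installed. Writing to a new output file rather than tagging in place keeps the original untouched if something goes wrong.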
Two alternative ID3 metadata tagging dictionaries are in popular use:
- Generic (vanilla) ID3 metadata tags described by the de-facto specification.
- Proprietary Apple iTunes-extended ID3 metadata tags that enhance the generic specification.
The ISO 15938 (MPEG-7) standard describes an alternative tagging scheme for use with MPEG content. Where ID3 tags are simple name-value pairs, MPEG-7 is a bulkier XML structured format. ID3 is more popular.
Tagging AIFF Files
Metadata is stored in the text chunks within an AIFF file. This is where ID3 tags can be carried. The recording name, author and a comment are already natively supported within the AIFF container specification. The bulk of the data is stored as individual chunks. Annotation and copyright text chunks are available.
Arbitrary text chunks can be constructed to carry ID3 metadata. This must be formatted according to the ID3v2 specification.
The Adobe Extensible Metadata Platform (XMP) can also be used with AIFF files. It has been standardized as ISO 16684-1 (published in 2012). The tools to support XMP are proprietary solutions available from Adobe.
An unofficial ‘ID3 ’ chunk (note the embedded space character) exists but is not mentioned in any AIFF specifications.
Non-standard chunks may not survive an edit cycle if the application does not support them.
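AIFF is an IFF-style container: a "FORM" header, then a sequence of chunks that each carry a four-character identifier, a big-endian 32-bit size, and a body padded to an even length. The following sketch walks those chunks and returns the body of the unofficial 'ID3 ' chunk if one is present. The function names are illustrative, and the sketch does no validation beyond the header check.

```python
import struct


def iter_aiff_chunks(data: bytes):
    """Yield (chunk_id, body) pairs from an AIFF or AIFF-C file image."""
    if (len(data) < 12 or data[:4] != b"FORM"
            or data[8:12] not in (b"AIFF", b"AIFC")):
        raise ValueError("not an AIFF file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (size,) = struct.unpack(">I", data[pos + 4:pos + 8])
        yield chunk_id, data[pos + 8:pos + 8 + size]
        # Odd-sized chunks are followed by one pad byte.
        pos += 8 + size + (size & 1)


def find_id3_chunk(data: bytes):
    """Return the body of the 'ID3 ' chunk, or None if there is none."""
    for chunk_id, body in iter_aiff_chunks(data):
        if chunk_id == b"ID3 ":  # note the trailing space
            return body
    return None
```

The same walk exposes the native text chunks (NAME, AUTH, ANNO and the copyright chunk), which makes it a convenient way to check whether non-standard chunks have survived an edit cycle.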
Tagging Is Important
Whatever technique you use, tagging your essence files is important. If there is no other metadata embedded in the file, at least store a unique identifier for that essence item. That can then be associated with the metadata records in a content management system.