Standards: Audio - Important AES Standards
The AES standards library underpins professional audio engineering worldwide, from digital interconnects and synchronization to IP streaming and loudness. This article covers the standards most relevant to broadcast and IP studio workflows, examining what each one does and why it matters.
AES Standards Relevant To Broadcasting
There are nearly 60 different standards currently available from the AES with another 40 supporting documents that provide additional advice on how to deploy them.
AES strives to avoid the use of patented technologies or requires the patent holder to allow their use on a minimal or zero fee basis. The society also collaborates with other standards bodies such as the SMPTE, ISO, IEC, BSI and EBU.
The AES Standards are in common use all over the world, largely working unseen but nevertheless vitally important.
These standards are particularly relevant to the design of workflows in IP studios and ST 2110. The rest of the AES standards are also useful and informative as background knowledge:
| Standard | Description |
|---|---|
| AES3 | 2-channel digital audio. Used for digital audio interconnection and also known as AES/EBU. |
| AES5 | Preferred sample frequencies. |
| AES10 | Multichannel Audio Digital Interface (MADI). See separate document AES10id - Digital audio engineering guidelines. |
| AES11 | Synchronization of digital audio equipment in studio operations. |
| AES18 | Format for the user data channel of the AES digital audio interface. |
| AES31 | A file format for exchanging audio data between systems and applications. Described in multiple parts. Refer to Chapter 4-4 for a description of the AES31 containers. |
| AES50 | High-resolution multi-channel audio interconnection (HRMAI) delivered over Ethernet. See AES-R6 - Guidelines for AES standard for digital audio engineering. |
| AES52 | Describes how to insert unique identifiers into AES3 digital audio content. |
| AES57 | Audio object structures for preservation and restoration. |
| AES60 | AES standard for Core audio metadata. |
| AES67 | Interoperability of Audio over IP networks. |
| AES70 | Open Control Architecture. |
| AES77 | Loudness of streamed audio. Also described by EBU R128. |
AES3 - Serial Transmission For 2-channel Digital Audio
AES3 was originally published in 1985 and has been continually revised. In 2003, amendments to the standard were incorporated into the main body. In 2009, the 2003 edition was divided into four parts which are now published separately. The entire set was reaffirmed in 2019, having been deemed up to date and not requiring changes at that time.
This standard describes how to transmit two channels of digital audio over a variety of different media. The supported audio format is linear Pulse Code Modulation (PCM) which is an uncompressed stream of samples. Sample sizes between 16 and 24 bits are supported. Other formats are possible but not described by AES3. See AES5 for the list of acceptable sample rates.
AES3 is composed of four parts describing digital input-output interfacing, in particular the serial transmission format for two-channel linearly represented digital audio data.
| Document | Description |
|---|---|
| AES3-1 | Audio content semantics. Describing the sampling frequency based on AES5. |
| AES3-2 | Metadata & sub-code data are transmitted with the audio content such as channel-status, user & ancillary data. The use of pre-emphasis to enhance the audio is indicated in the channel status. |
| AES3-3 | Unidirectional transport link framing & channel co-ordination. This also embeds a recoverable clock signal. |
| AES3-4 | Physical & electrical signal levels & wiring. |
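
To give a feel for the framing described in AES3-3, here is a minimal sketch that estimates the serial bit rate for a given sample rate. It assumes the commonly documented AES3 framing of two 32-slot subframes per frame (one per channel); the function and constant names are illustrative and do not come from the standard.

```python
# Sketch only: estimate the AES3 serial bit rate for a sample rate.
# Assumption: each frame holds two subframes of 32 time slots, one per channel.
SUBFRAME_BITS = 32        # time slots per subframe (assumed framing)
SUBFRAMES_PER_FRAME = 2   # one subframe per audio channel


def aes3_bit_rate(sample_rate_hz: int) -> int:
    """Return bits per second carried by the interface, before channel coding."""
    return sample_rate_hz * SUBFRAME_BITS * SUBFRAMES_PER_FRAME


if __name__ == "__main__":
    for fs in (44_100, 48_000, 96_000):
        print(f"{fs} Hz -> {aes3_bit_rate(fs) / 1e6:.4f} Mbit/s")
```

One frame is sent per sample period, which is why the bit rate scales directly with the sample rate.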
The use of abbreviations in audio/visual contexts is sometimes ambiguous and overloaded with hidden meaning. For example, when the interface is described as AES rather than AES/EBU, the means of electrical connection might be different.
This quotation from Ray Arthur Rayburn – a highly respected audio engineer in the AES community – explains why:
“AES3 allows the use of transformer or transformerless interfaces, while the corresponding EBU standard requires the use of transformers. Therefore, it has become a common shorthand to say AES/EBU when the interface is transformer coupled, and AES3 when it is not or if the interface type is unknown.”
AES/EBU is described in the third edition of the EBU Tech 3250 document.
Also see document AES2id, which provides guidelines and important insights for applying the AES3 interface. The fourth edition of AES2id was published in 2020. Make sure you are using the most recent edition: the 2006 edition refers to the single-part 2003 version of AES3, whereas the latest edition has been revised to refer to the current four-part version. These standards documents are relevant to the use of AES3:
| Document | Description |
|---|---|
| AES preprint 3783 | Twisted-pair cables for AES/EBU Digital Audio Signals. A technical paper presented at the 96th AES Convention in Amsterdam. |
| AES-2id | Guidelines for the use of the AES3 interface. |
| AES5 | Professional Digital Audio Applications Employing Pulse Code Modulation-Preferred Sampling Frequencies. |
| AES10 | Serial Multichannel Audio Digital Interface (MADI). |
| AES11 | Synchronization of digital audio equipment in studio operations. |
| AES18 | Format for the user data channel of the AES digital audio interface. |
| AES26 | Conservation of the polarity of audio signals. |
| AES47 | Digital input-output interfacing - Transmission of digital audio over asynchronous transfer mode (ATM) networks. |
| AES52 | Insertion of unique identifiers into the AES3 transport stream. |
| AES55 | Carriage of MPEG Surround in an AES3 bitstream. |
| EBU T3250 | Specification of the digital audio interface (also known as the AES/EBU interface). |
| EBU TR R68 | Alignment level in digital audio production equipment & in digital audio recorders. |
| IEC 60169-8 | RF coaxial connectors with inner diameter of outer conductor 6.5 mm with BNC lock. |
| IEC 60268-12 | Application of connectors for broadcast & similar use. |
| IEC 60603-7 | Detailed specification for 8-way connectors. |
| IEC 60958-1 | Serial, uni-directional, self-clocking interface for the interconnection of digital audio equipment. |
| IEC 60958-3 | Digital audio interface for consumer applications. |
| IEC 60958-4 | Digital audio interface for professional applications. |
| ISO 646 | ISO 7-bit coded character set for information interchange. |
| ISO/IEC 11801 | Generic cabling for customer premises. |
| ISO/IEC 23003-1 | MPEG Surround sound. |
| ITU-R BS.450-3 | Transmission standards for FM sound broadcasting at VHF. |
| ITU-R BS.647 | A digital audio interface for broadcasting studios. |
| ITU-T J.17 | Pre-emphasis used on sound program circuits. |
| ITU-T V.11 | Electrical characteristics for balanced double-current interchange circuits operating at data signaling rates up to 10 Mbps. |
| RFC 4122 | A Universally Unique IDentifier (UUID) URN Namespace. |
| RFC 9562 | Universally Unique IDentifiers (UUIDs). |
| RP 155 | SMPTE Recommended practice for the reference level in digital audio systems. |
| ST 276 | Transmission of AES/EBU Digital Audio Signals Over Coaxial Cable. |
| ST 297 | Serial Digital Fiber Transmission System for ANSI/SMPTE 259M Signals. |
| ST 337 | Format for Non-PCM Audio & Data in an AES3 Serial Digital Audio Interface. |
| ST 338 | Format for Non-PCM Audio & Data in AES3 - Data Types. |
| ST 339 | Format for Non-PCM Audio & Data in AES3 - Generic Data Types. |
| ST 340 | Format for Non-PCM Audio & Data in AES3 - ATSC A/52B Digital Audio Compression Standard for AC-3 & Enhanced AC-3 Data Types. |
Refer to ST 2110-31 to see how AES3 is applied to a modern IP studio architecture.
AES5 - Preferred Sample Frequencies
This standard describes various sample rates and recommends 48 kHz at the outset because it is numerically easier to convert this to other sample rates. See Section 4.2 of the standard for an explanation. Sample rates at 96 and 44.1 kHz are also described.
There is an interesting paragraph on bandwidth (see Section 4.1) based on the Nyquist-Shannon sampling theorem.
Derived sample rates ranging from half to eight times the basic sample rate are also described. There are tables listing the number of samples per frame of video at different frames per second rates vs. audio sample rates.
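The samples-per-video-frame relationship is simple arithmetic and can be sketched as follows. This is an illustration only, not a reproduction of the AES5 tables; the function names are made up for this example. Exact fractions are used so that fractional frame rates such as 30000/1001 do not suffer rounding errors.

```python
from fractions import Fraction


def samples_per_frame(sample_rate_hz: int, fps: Fraction) -> Fraction:
    """Exact number of audio samples that span one video frame."""
    return Fraction(sample_rate_hz) / fps


def frames_until_whole(sample_rate_hz: int, fps: Fraction) -> int:
    """Smallest number of video frames that contains a whole number of samples."""
    return samples_per_frame(sample_rate_hz, fps).denominator


if __name__ == "__main__":
    rates = {
        "24": Fraction(24),
        "25": Fraction(25),
        "29.97": Fraction(30000, 1001),
        "30": Fraction(30),
    }
    for label, fps in rates.items():
        spf = samples_per_frame(48_000, fps)
        print(f"48 kHz at {label} fps: {float(spf):.1f} samples per frame, "
              f"whole after {frames_until_whole(48_000, fps)} frame(s)")
```

At integer frame rates the result is a whole number, while at 29.97 fps the sample count only becomes whole over a run of several frames.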
This is an important foundational standard referred to by AES3 and AES67 and other related documents.
AES10 - Multichannel Audio Digital Interface (MADI)
The latest edition of AES10 and its supporting AES10id information document were both published in 2020.
Refer to the digital audio engineering guidelines described in the supplementary AES10id document. This provides additional insights into how to apply the AES10 standard. The transmission data format is explained in more detail which is very helpful when developing MADI compatible interfaces.
If you intend to use this standard to design a product, there may be patent license fees to pay.
These standards documents are relevant to the use of AES10:
| Document | Description |
|---|---|
| AES3 | Serial transmission format for two-channel linearly represented digital audio data. |
| AES11 | Synchronization of digital audio equipment in studio operations. |
| AES47 | Transmission of digital audio over asynchronous transfer mode (ATM) networks. |
| EN 50083-9 | Interfaces for CATV/SMATV head ends and similar professional equipment for DVB/MPEG-2 transport streams. |
| IEC 60169-8 | Radio-frequency coaxial connectors with inner diameter of outer conductor 6.5 mm (0.256 in) with BNC lock. |
| ISO 9314-1 | Fiber Distributed Data Interface (FDDI) Token Ring Physical Layer Protocol (PHY). |
| ISO 9314-3 | Fiber Distributed Data Interface (FDDI) Physical Layer Medium Dependent (PMD). |
| SMPTE 297 | Serial digital fiber transmission system for ANSI/SMPTE 259M signals for television. |
| SMPTE 320 | Channel Assignments and Levels on Multichannel Audio Media for television. |
| SMPTE 323 | Channel Assignments and Levels on Multichannel Audio Media for motion picture film. |
What Is 4B5B Encoding?
AES10 describes a bitstream format for use with MADI based on sending 4-bit frames in a 5-bit wrapper. This has certain advantages for signal quality and ancillary control messages.
The 4B5B technique is a neat way to synchronize the electro-optical signals arriving at network receivers. It operates at the physical layer of the OSI network model, the lowest foundational layer, which describes the electrical (or optical) connection between end-points.
The 4B5B encoding technique is used on optical fiber (FDDI), Ethernet and USB interfaces. It converts each 4-bit frame into a 5-bit frame. The small disadvantage is that this increases the amount of data being transmitted by 25%. In return, some of the spare 5-bit values are available to describe ancillary control signals.
The resulting electrical or optical level is designed to guarantee there is always at least one transition between the 0 and 1 state somewhere within the 5-bit frame. The only exceptions are the loss of signal symbol (00000) and the idle symbol (11111). By carefully detecting the 5-bit values being sent, the receiver can re-frame and synchronize the incoming data. There are also electrical noise immunity benefits from adopting this approach.
Nomenclature
This technique is sometimes described as 4B5B or 5B4B when operating in reverse. Search for both variants when looking for information. The 5B4B terminology describes data conversion from 5-bit frames to 4-bit frames. Occasionally it is used instead of 4B5B but that might be a typographical error.
A similar technique called 8B10B is described in the MADI documentation for 8-bit framing. It maps 8 bits onto 10, so it carries the same 25% overhead as 4B5B.
Converting From 4-bit To 5-bit
Here is the conversion between 4-bit framed data values and 5-bit frames shown as a table. Only 16 of the 32 values possible with 5-bits are needed to carry the 4-bit data:
| Hexadecimal | 4-bit Value | 4B5B Code |
|---|---|---|
| 0 | 0000 | 11110 |
| 1 | 0001 | 01001 |
| 2 | 0010 | 10100 |
| 3 | 0011 | 10101 |
| 4 | 0100 | 01010 |
| 5 | 0101 | 01011 |
| 6 | 0110 | 01110 |
| 7 | 0111 | 01111 |
| 8 | 1000 | 10010 |
| 9 | 1001 | 10011 |
| A | 1010 | 10110 |
| B | 1011 | 10111 |
| C | 1100 | 11010 |
| D | 1101 | 11011 |
| E | 1110 | 11100 |
| F | 1111 | 11101 |
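
The table above translates directly into a lookup. The following is a minimal sketch of an encoder built from it; the function name is illustrative, and sending the high nibble of each byte first is an assumption made for the example, so check the relevant standard for the actual ordering on a given interface.

```python
# 4-bit value -> 5-bit code group, copied from the table above.
FOUR_B_FIVE_B = {
    0x0: "11110", 0x1: "01001", 0x2: "10100", 0x3: "10101",
    0x4: "01010", 0x5: "01011", 0x6: "01110", 0x7: "01111",
    0x8: "10010", 0x9: "10011", 0xA: "10110", 0xB: "10111",
    0xC: "11010", 0xD: "11011", 0xE: "11100", 0xF: "11101",
}


def encode_4b5b(data: bytes) -> str:
    """Encode bytes as space-separated 5-bit code groups.

    Assumption for this sketch: the high nibble of each byte goes first.
    """
    groups = []
    for byte in data:
        groups.append(FOUR_B_FIVE_B[byte >> 4])     # upper 4 bits
        groups.append(FOUR_B_FIVE_B[byte & 0x0F])   # lower 4 bits
    return " ".join(groups)


if __name__ == "__main__":
    print(encode_4b5b(b"\xA5"))
```

Every data code in the table contains at least two 1 bits, which is what gives the receiver regular signal activity to lock on to.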
Ancillary Symbols
Some of the other 16 values have special meaning. They are called symbols and use letters above the hexadecimal range. The symbols are somewhat mnemonic to aid in understanding their meaning:
| Name | Symbol | 4B5B Code |
|---|---|---|
| Halt | H | 00100 |
| Idle | I | 11111 |
| Start #1 | J | 11000 |
| Start #2 | K | 10001 |
| Start #3 | L | 00110 |
| Quiet (loss of signal) | Q | 00000 |
| Reset | R | 00111 |
| Set | S | 11001 |
| End (terminate) | T | 01101 |
Control Codes
The mnemonic symbols are used in various combinations to send commands ‘down the wire’. Several are used on their own, a few are used in pairs. Groups of four symbols are used on USB interfaces:
| Command | Symbols | 5-bit Frame Sequence |
|---|---|---|
| Sync, Start delimiter | JK | 11000 10001 |
| 100BASE-X idle marker | I | 11111 |
| USB-PD end delimiter | T | 01101 |
| FDDI end delimiter | TT | 01101 01101 |
| Not used (terminate - set) | TS | 01101 11001 |
| SAL (idle - halt) | IH | 11111 00100 |
| 100BASE-X end delimiter | TR | 01101 00111 |
| Not used (set-reset) | SR | 11001 00111 |
| Not used (set-set) | SS | 11001 11001 |
| 100BASE-X transmit error | H | 00100 |
| USB-PD Start Of Packet (SOP) | JJJK | 11000 11000 11000 10001 |
| USB-PD SOP′ | JJLL | 11000 11000 00110 00110 |
| USB-PD SOP″ | JLJL | 11000 00110 11000 00110 |
| USB-PD SOP′_Debug | JSSL | 11000 11001 11001 00110 |
| USB-PD SOP″_Debug | JSLK | 11000 11001 00110 10001 |
| USB-PD Hard Reset | RRRS | 00111 00111 00111 11001 |
| USB-PD Cable Reset | RJRL | 00111 11000 00111 00110 |
AES10 mandates that a JK Sync, Start delimiter signal should be periodically inserted into the MADI bitstream.
Unused Values
A few of the available 5-bit frame values are unused. If they are observed at the receiving end, it indicates a problem and helps to detect errors in the transmission:
- 00001
- 00010
- 00011
- 00101
- 01000
- 01100
- 10000
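
A receiver can use the three groups of values (data, symbols and unused) to sort each incoming 5-bit frame. The sketch below is illustrative only: it copies the data and symbol codes from the tables in this article and treats anything else as invalid. The function name and the returned labels are made up for the example.

```python
# 5-bit code -> 4-bit data value, from the conversion table.
DATA_CODES = {
    "11110": 0x0, "01001": 0x1, "10100": 0x2, "10101": 0x3,
    "01010": 0x4, "01011": 0x5, "01110": 0x6, "01111": 0x7,
    "10010": 0x8, "10011": 0x9, "10110": 0xA, "10111": 0xB,
    "11010": 0xC, "11011": 0xD, "11100": 0xE, "11101": 0xF,
}

# 5-bit code -> mnemonic symbol, from the ancillary symbols table.
SYMBOLS = {
    "00100": "H", "11111": "I", "11000": "J", "10001": "K", "00110": "L",
    "00000": "Q", "00111": "R", "11001": "S", "01101": "T",
}


def classify(code: str):
    """Return ("data", value), ("symbol", letter) or ("invalid", None)."""
    if code in DATA_CODES:
        return ("data", DATA_CODES[code])
    if code in SYMBOLS:
        return ("symbol", SYMBOLS[code])
    return ("invalid", None)


# Walk all 32 possible 5-bit values and keep those with no assigned meaning.
UNUSED = sorted(
    f"{value:05b}" for value in range(32)
    if classify(f"{value:05b}")[0] == "invalid"
)

if __name__ == "__main__":
    print(UNUSED)
```

Enumerating all 32 values this way reproduces the seven unused codes listed above, which is a handy cross-check on the tables.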
Supporting Documents
Refer to these documents for more information:
| Document | Description |
|---|---|
| AES10 - Section 4.3 | Transmission format. |
| AES10 - Annex A | Example of link encoding. |
| AES10 - Annex B | Use of 4B5B sync symbols for channel-independent data. |
| AES10id - Section 8 | MADI Transport stream. |
| ISO 9314-1 | Token Ring Physical Layer Protocol (PHY). |
| Wikipedia 4B5B | Mostly derived from AES10. |
The AES10 standard directs you to the AES web sites for details of the 4B5B line code for transmitting data on fiber optic or Ethernet connections. That document is missing but there is a very good explanation of 4B5B signaling on Wikipedia. Read that in conjunction with Annex B of the AES10 standard.
AES11 - Synchronization Of Digital Audio Equipment
Multiple channels of audio must be carefully synchronized. The sample clocks governing when the source audio is captured must be accurately regulated. Any downstream processing needs to maintain the phase relationships between channels to avoid introducing unwanted audible artefacts. This is a complex topic and there are many solutions.
Equipment using an internal sample clock must be locked to an external source. AES11 describes this as a Digital Audio Reference Signal (DARS) which is delivered separately from the audio content (usually via a separate connection). AES5 describes multiples of up to eight times the basic sample rate. The internal sample clock must be capable of reliably locking to all of these.
Alternative synchronization techniques can be used instead of DARS:
- Embedded time signatures based on the packet header timestamps. This may drift out of sync with other streams.
- Video reference syncing to frame-start events.
- GPS locking. This requires a separate receiver device and locks to real-world time.
AES11 describes the word clock (see Annex B). This synchronizes hardware devices (such as digital tape machines or CD players). The word clock governs the timing of each sample passing through the system and is derived from a centralized reference. This will be familiar to broadcast engineers who ensure that video across an enterprise is frame synchronous by distributing sync pulses from a reliable source.
The word clock is not the same as timecode. The word clock is integral to the sampling process and transmission of the digital audio, whereas the timecode is a separate metadata service that describes the media being transmitted. They operate in different time domains.
AES11 refers to AES5 and augments the sample rate descriptions with advice pertaining to video reference timing.
AES11 is relevant when studying PTP and other timing protocols, in particular when designing products that need to work with ST 2110 based IP studios. The latest version of AES11 was published in 2020.
These standards documents are relevant to the use of AES11:
| Document | Description |
|---|---|
| AES3 | Serial transmission format for linearly represented digital audio data. |
| AES5 | Professional digital audio applications employing pulse code modulation - Preferred Sampling Frequencies. |
| AES47 | Transmission of digital audio over asynchronous transfer mode (ATM) networks. |
| ST 318 | Synchronization of 59.94 or 50 Hertz related video and audio systems in analog and digital areas. |
| RP168 | Definition of Vertical Interval Switching Point for Synchronous Video Switching. |
| WHP 074 | BBC R&D White Paper about the development of ATM network technology for live production infrastructure. |
AES18 - Ancillary User Data Channel Format
Ancillary user specified metadata can be embedded within an AES3 audio stream. Messages can be any length. The only limitation is the maximum bitrate which caps the amount of data that can be inserted in addition to the audio payload. A long message could describe the entire asset with an abstract for display in an EPG. Shorter messages provide synchronous data such as:
- Subtitle text.
- Script cues.
- Editing information.
- Copyright assertions.
- Performer credits.
- Downstream switching instructions.
This is managed carefully to avoid delaying the audio content. Messages can be split and portions deferred to accommodate the bitrate capping limit.
Ancillary data uses an adaptation, defined in AES18, of the High-level Data Link Control (HDLC) protocol originally specified in ISO 3309. That standard has since been withdrawn and replaced by ISO/IEC 13239. HDLC is bi-directional, but in the context of AES3, the messages only travel one way with no handshaking.
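For readers unfamiliar with HDLC, the sketch below shows its general framing idea: frames are delimited by the flag pattern 01111110, and a 0 is inserted after every run of five 1 bits so the payload can never imitate the flag. This illustrates generic HDLC behavior only. Exactly how AES18 adapts it should be checked against the standard, and the function names are made up for this example.

```python
FLAG = "01111110"  # generic HDLC frame delimiter


def stuff_bits(bits: str) -> str:
    """Insert a 0 after every run of five consecutive 1 bits."""
    out = []
    ones = 0
    for bit in bits:
        out.append(bit)
        if bit == "1":
            ones += 1
            if ones == 5:
                out.append("0")  # stuffed bit breaks up the run
                ones = 0
        else:
            ones = 0
    return "".join(out)


def unstuff_bits(bits: str) -> str:
    """Reverse stuff_bits by dropping the bit that follows five 1 bits.

    In a validly stuffed payload the dropped bit is always a 0.
    """
    out = []
    ones = 0
    skip_next = False
    for bit in bits:
        if skip_next:
            skip_next = False
            ones = 0
            continue
        out.append(bit)
        if bit == "1":
            ones += 1
            if ones == 5:
                skip_next = True
        else:
            ones = 0
    return "".join(out)


def frame(bits: str) -> str:
    """Wrap a stuffed payload in opening and closing flags."""
    return FLAG + stuff_bits(bits) + FLAG
```

Because the stuffed payload never contains six 1 bits in a row, a receiver can find message boundaries simply by looking for the flag.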
Error resilience helps detect data corruption at the receiver. If necessary, important data could be delivered in a carousel-like structure and repeated periodically.
The standard lists many external references in the Annex C Bibliography. These date from the mid-1980s to the 1990s and cover radio text services which are now deployed worldwide.
Because of the vintage, the specified character sets do not yet use Unicode. Text is constrained to 8-bit character codes as defined in ISO 4873. UTF-8 character encoding of Unicode text is briefly mentioned in the AES67 standard.
AES50 - High-resolution Multi-channel Interconnection
AES50 interconnections are sometimes abbreviated to HRMAI. HRMAI is intended to be used in a point-to-point fashion rather than transmitting data over a network. Having just a sender and receiver enables it to provide the following advantages:
- Supports many commonly-used digital audio coding formats, including “high-resolution” formats such as high sample-rate linear PCM, and one-bit delta-sigma modulated formats.
- Low and predictable latency less than 100 microseconds.
- Able to use CAT-5 data cable which is generally cheaper than CAT-6 or CAT-7.
- Interconnections span distances up to 100 meters.
- High-quality full-duplex clocks are transmitted in parallel with the audio data.
- Full-duplex audio interconnection allows traffic to move in both directions at the same time.
- 5 Mbit/sec full-duplex auxiliary data connection, compatible with Ethernet networks. This is in parallel with the point-to-point interconnect for audio essence.
Project report AES-R6 provides additional guidelines for deploying the AES50 standard for HRMAI connections. Read both AES50 and AES-R6 documents to gain a better understanding of HRMAI. The latest editions of both documents were published in 2020.
These standards documents are relevant to the use of AES50:
| Document | Description |
|---|---|
| AES3 | Serial transmission format for two-channel linearly represented digital audio data, Parts 1 to 4. |
| ANSI X3.263 | Fiber Distributed Data Interface (FDDI) - Token Ring Twisted Pair Physical Layer Medium Dependent (TP-PMD). |
| IEEE 802.1Q | Virtual Bridged Local Area Networks. |
| IEEE 802.3 | Carrier sense multiple access with collision detection (CSMA/CD) access method and physical layer specifications. |
| ISO 8802-3 | ISO re-published version of IEEE 802.3. |
| TIA/EIA-568-B.2 | Balanced twisted-pair cabling components. |
AES52 - Inserting Unique Identifiers Into AES3
Inserting unique identifiers using the 128-bit UUID standard is helpful for identifying the stream content at the receiving end. This also helps to connect the stream to metadata that may be stored in another system. The UUID message is transmitted periodically so that clients joining the stream part way through can still acquire the value.
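As a minimal sketch of the identifier itself, Python's standard `uuid` module can generate a 128-bit UUID and convert it to and from the 16 raw bytes a transport would carry. How AES52 maps those bytes into the AES3 stream is not shown here; the helper names are illustrative.

```python
import uuid


def new_stream_id() -> uuid.UUID:
    """Create a random (version 4) UUID to label a stream."""
    return uuid.uuid4()


def to_wire(stream_id: uuid.UUID) -> bytes:
    """Return the 16 raw bytes (128 bits) of the identifier."""
    return stream_id.bytes


def from_wire(payload: bytes) -> uuid.UUID:
    """Rebuild the UUID at the receiving end from its 16 raw bytes."""
    return uuid.UUID(bytes=payload)


if __name__ == "__main__":
    stream_id = new_stream_id()
    payload = to_wire(stream_id)
    print(stream_id, len(payload) * 8, "bits")
    print(from_wire(payload) == stream_id)
```

The recovered value can then be used as a lookup key for metadata held in another system.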
These standards are also relevant:
| Standard | Vintage | Description |
|---|---|---|
| ISO 9834-8 | 2014 | Generation of universally unique identifiers (UUIDs) and their use in object identifiers. |
| ISO 11578 | 1996 | Remote Procedure Call (RPC). |
| ITU-T Rec. X.667 | 2012 | OSI networking and system aspects - Naming, Addressing and Registration. |
| RFC 4122 | 2005 | A Universally Unique IDentifier (UUID) URN Namespace. Obsoleted by RFC 9562. |
| RFC 9562 | 2024 | Universally Unique IDentifiers (UUIDs). |
The related Unique Material Identifier (UMID) described by SMPTE ST 330 is less unique as it has fewer bits, but a UUID with some special formatting can be used to implement the UMID.
AES57 - Audio Object Structures
AES57 describes the attributes for Audio Objects. These are carriers for audio essence realized as discrete samples that are packaged as multi-channel frames and presented on a timeline. These attributes are reflected as properties when the objects are instantiated in an Object Oriented programming environment.
Annex A describes an XML schema that can be used to represent audio objects.
Understanding the vocabulary used to describe the audio objects informs the design of your metadata model which facilitates the building of a reliable content management system and workflow process supervisor tool.
AES60 - AES Standard For Core Audio Metadata
This standard is described in the Information document identified as AES60id. The latest edition was published in 2020.
The AESCore metadata schema is consistent with the EBUCore schema published as EBU Tech 3293. Both of these are extensions of the original Dublin Core metadata schema.
EBUCore is the minimum set of attributes needed to describe video and audio media resources.
XML is used because support and tools are widely available for creating, editing and harvesting metadata delivered in this format.
AES67 - High-Performance Streaming Audio
The original intent for AES67 was to deliver professional quality audio over a high-performance IP network with less than 10ms latency. Bridging diverse pre-existing audio networking systems to provide interoperability was also a core goal. This is suitable for sound reinforcement at live events.
High performance is feasible on existing local area networks (LAN). If suitable switching hardware is available, it can be supported widely across an enterprise.
These are the main features:
- Based on existing and well-known IT standards described in IETF RFC documents.
- Synchronization with boundary clock converters.
- Streaming transport via RTP.
- Session description with SDP.
- Low-latency delivery of uncompressed audio.
- Ideal for live, studio and broadcast situations.
- Decentralized configuration and management of devices.
- Coexists with other IT data traffic on the same network.
Prior to AES67, the available audio networking solutions were incompatible with one another. AES67 is designed to reconcile the needs of architectures designed by different manufacturers and facilitates interoperability between:
- Dante
- Ravenna
- QLAN
- WheatNet-IP
- Livewire
These topics are addressed by the standard:
Transport Synchronization - A variety of techniques are discussed in Section 4 of the standard.
Media Profiles - Standard IP networks must adhere to a media profile (see Annex A) to ensure timely delivery of packets.
Boundary Clock Converters - Networks using switching hardware that supports IEEE PTP protocols can provide boundary clock conversion and should provide adequate performance for audio delivery.
AVB - Enhanced Ethernet Networks that conform to IEEE 802.1Q are described as Audio Video Bridging (AVB) and provide synchronization based on IEEE PTP. This is covered in Annexes C and D.
Media Clocks - These are described in Section 5 and provide synchronization at the sample level. A media clock advances in sync with the sample rate. The same frequency should be used for the RTP clock.
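The relationship between a media clock and the RTP clock can be sketched as a sample counter that wraps at 32 bits, the size of the RTP timestamp field. This is an illustration under stated assumptions, not the wording of the standard: the function name, the use of seconds since a shared epoch, and the `offset` parameter are choices made for the example.

```python
from fractions import Fraction

RTP_TIMESTAMP_MODULUS = 2 ** 32  # RTP timestamps are 32-bit values


def rtp_timestamp(seconds_since_epoch: Fraction,
                  sample_rate_hz: int,
                  offset: int = 0) -> int:
    """Return the RTP timestamp for a moment on the shared time base.

    The media clock is modeled as a count of whole samples since the epoch,
    advancing at the sample rate. The offset is an assumed per-stream constant.
    """
    media_clock = int(seconds_since_epoch * sample_rate_hz)  # whole samples
    return (media_clock + offset) % RTP_TIMESTAMP_MODULUS


if __name__ == "__main__":
    one_ms = Fraction(1, 1000)
    print(rtp_timestamp(one_ms, 48_000))       # samples elapsed in 1 ms
    print(rtp_timestamp(Fraction(1), 96_000))  # samples elapsed in 1 s
```

Because both clocks tick once per sample, two senders locked to the same time base produce timestamps that a receiver can align sample for sample.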
Payload Encoding - This is described in Section 7, which reiterates the limited range of three preferred sample rates from AES5 with two possible sample sizes. Packet sizes are determined primarily by how long the data in them would play for the given sample rate. AES67 describes these sample rates (derived from AES5):
- 48 kHz
- 96 kHz
- 44.1 kHz
The standardized sample sizes and formats are defined in great detail in these IETF RFC documents:
- L16 - 16-bit linear format as defined in RFC 3551 clause 4.5.11.
- L24 - 24-bit linear format as defined in RFC 3190 clause 4.
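
Since packet sizes follow from how long the audio in each packet plays for, the payload size is straightforward to estimate. The sketch below assumes a packet time expressed in microseconds, with a 1 millisecond default; the function names are illustrative. It only handles combinations where the packet time holds a whole number of samples, so 44.1 kHz is deliberately left out of the example.

```python
BYTES_PER_SAMPLE = {"L16": 2, "L24": 3}  # linear PCM sample sizes


def samples_per_packet(sample_rate_hz: int, packet_time_us: int) -> int:
    """Number of samples per channel that play for the given packet time."""
    total = sample_rate_hz * packet_time_us
    if total % 1_000_000:
        raise ValueError("packet time does not hold a whole number of samples")
    return total // 1_000_000


def payload_bytes(encoding: str, sample_rate_hz: int, channels: int,
                  packet_time_us: int = 1000) -> int:
    """Audio payload size in bytes, excluding RTP, UDP and IP headers."""
    samples = samples_per_packet(sample_rate_hz, packet_time_us)
    return samples * channels * BYTES_PER_SAMPLE[encoding]


if __name__ == "__main__":
    print(payload_bytes("L24", 48_000, channels=2))
    print(payload_bytes("L16", 96_000, channels=8, packet_time_us=125))
```

Shorter packet times reduce latency but raise the packet rate, so the header overhead grows relative to the audio payload.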
Channel Count - Up to 120 channels of audio can be carried in a generic AES67 link. ST 2110-30 limits the number of channels depending on the conformance level of the receiving device. This may be as low as four channels at level AX and not more than 64 for level C.
SDP - Session Description Protocol provides discovery and connection management support. This includes keep alive heartbeats to maintain connections. The discovery systems are described in Annex E. These include the AMWA NMOS IS-04 specification used by ST 2110.
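To make the session description more concrete, the following sketch assembles a small SDP text of the general kind used for a linear PCM audio stream. Every address, port, payload type and clock identity in it is a placeholder chosen for illustration, and the function name is made up. The attribute syntax follows the SDP and RTP clock source signaling RFCs listed in this article, but consult AES67 itself for what a conforming description must contain.

```python
def build_sdp(session_name: str, origin_ip: str, multicast_ip: str, port: int,
              encoding: str = "L24", sample_rate_hz: int = 48_000,
              channels: int = 2, ptime_ms: int = 1, payload_type: int = 96,
              ptp_grandmaster: str = "00-00-00-FF-FE-00-00-00",
              ptp_domain: int = 0) -> str:
    """Return an SDP session description as CRLF-terminated text.

    All default values are placeholders, not recommendations.
    """
    lines = [
        "v=0",
        f"o=- 1 1 IN IP4 {origin_ip}",
        f"s={session_name}",
        f"c=IN IP4 {multicast_ip}/32",
        "t=0 0",
        f"m=audio {port} RTP/AVP {payload_type}",
        f"a=rtpmap:{payload_type} {encoding}/{sample_rate_hz}/{channels}",
        f"a=ptime:{ptime_ms}",
        f"a=ts-refclk:ptp=IEEE1588-2008:{ptp_grandmaster}:{ptp_domain}",
        "a=mediaclk:direct=0",
    ]
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(build_sdp("Demo stream", "192.0.2.10", "239.0.0.1", 5004))
```

The `rtpmap` line carries the encoding, sample rate and channel count, while the last two attributes tell the receiver which reference clock and media clock the sender is using.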
IETF RFC References - Because this is a standard describing IP network transmission, there are many RFC documents cited in the normative references in Section 2 of the standard and more references are included in the bibliography in Annex H. Using the IETF specifications ensures compatibility with the rest of the IP network traffic.
Networked audio conforming to AES67 is used in ST 2110 installations and covered by ST 2110-30. Additional supporting documentation is available in these AES project reports:
| Document | Description |
|---|---|
| AES-R12 | AES67 Interoperability PlugFest - Munich 2014. |
| AES-R15 | AES67 Interoperability PlugFest - Washington 2015. |
| AES-R16 | PTP parameters for AES67 and SMPTE ST 2059-2 interoperability. |
| AES-R17 | AES67 Interoperability PlugFest - London 2017. |
| AES-R19 | AES67 Protocol Implementation Conformance Statement (PICS) Summary. |
| AES-R20 | AES67 beyond the LAN. |
If you intend to use this standard to design a product, there may be patent license fees to pay.
These standards documents are relevant to the use of AES67:
| Document | Description |
|---|---|
| AES5 | Preferred sampling frequencies for applications employing pulse-code modulation. |
| AES11 | Synchronization of digital audio equipment in studio operations. |
| AES67 | High-performance streaming audio-over-IP interoperability. |
| AES-R16 | PTP parameters for AES67 and SMPTE ST 2059-2 interoperability. |
| EBU Tech 3326 | Audio contribution over IP - Requirements for Interoperability. |
| IEEE 1588 | IEEE Standard for a Precision Clock Synchronization Protocol for Networked Measurement and Control Systems. |
| IEEE 802.1AS | Timing and Synchronization for Time-Sensitive Applications in Bridged Local Area Networks. |
| IEEE 802.1BA | Audio Video Bridging (AVB) Systems. |
| IEEE 802.1Q | Media Access Control (MAC) Bridges and Virtual Bridged Local Area Networks. |
| IS-04 | AMWA NMOS Discovery & Registration. |
| ISPCS paper | Using an IEEE 802.1AS Network as a Distributed IEEE 1588 Boundary, Ordinary, or Transparent Clock. Presented at the IEEE-ISPCS conference 2010. |
| RFC 768 | User Datagram Protocol. |
| RFC 791 | Internet Protocol. |
| RFC 792 | Internet Control Message Protocol. |
| RFC 894 | A Standard for the Transmission of IP Datagrams over Ethernet Networks. |
| RFC 1112 | Host Extensions for IP Multicasting (the original IGMP specification). |
| RFC 2474 | Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers. |
| RFC 2597 | Assured Forwarding PHB Group. |
| RFC 2616 | Hypertext Transfer Protocol - HTTP/1.1. |
| RFC 2974 | Session Announcement Protocol. |
| RFC 3170 | IP Multicast Applications Challenges and Solutions. |
| RFC 3190 | RTP Payload Format for 12-bit DAT Audio and 20- and 24-bit Linear Sampled Audio. |
| RFC 3261 | SIP - Session Initiation Protocol. |
| RFC 3264 | An Offer/Answer Model with the Session Description Protocol (SDP). |
| RFC 3376 | Internet Group Management Protocol, Version 3. |
| RFC 3550 | RTP: A Transport Protocol for Real-Time Applications. |
| RFC 3551 | RTP Profile for Audio and Video Conferences with Minimal Control. |
| RFC 4028 | Session Timers in the Session Initiation Protocol (SIP). |
| RFC 4566 | Session Description Protocol. |
| RFC 5939 | Session Description Protocol (SDP) Capability Negotiation. |
| RFC 6762 | Multicast DNS. |
| RFC 6763 | DNS-Based Service Discovery. |
| RFC 7272 | Inter-Destination Media Synchronization (IDMS) Using the RTP Control Protocol (RTCP). |
| RFC 7273 | RTP Clock Source Signaling. |
| IETF draft | Using OPTIONS to Query for Operational Status in the Session Initiation Protocol (SIP). |
| IETF draft | SIP URI Service Discovery using DNS-SD. |
AES70 - Open Control Architecture
The Open Control Architecture (OCA) was the foundation for AES70. AES70 is now the formal standard for OCA. It describes a scalable control-protocol for managing media devices over an IP network. This is quite separate to managing streaming services although it needs to take account of such traffic.
SMPTE ST 2138, which has emerged more recently, develops the same concepts.
AES70 is designed around an object oriented approach to coding. It uses the HTTP accessor methods to GET or SET various properties on a target device. Changes to a property are notified with an event that triggers a handler of some kind. This is described in the AES70 Class Structure.
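The object-oriented pattern described here (properties that can be read or written, with a notification when a value changes) can be sketched in a few lines. This is a purely hypothetical in-process illustration of the pattern. It is not the AES70 class structure or its wire protocol, and every class, method and property name below is invented for the example.

```python
from typing import Any, Callable, Dict, List

# Handler signature: (property name, old value, new value)
ChangeHandler = Callable[[str, Any, Any], None]


class ControlObject:
    """Hypothetical controllable object with get/set and change events."""

    def __init__(self, **properties: Any) -> None:
        self._properties: Dict[str, Any] = dict(properties)
        self._handlers: List[ChangeHandler] = []

    def subscribe(self, handler: ChangeHandler) -> None:
        """Register a handler to be called when any property changes."""
        self._handlers.append(handler)

    def get(self, name: str) -> Any:
        """Read the current value of a property."""
        return self._properties[name]

    def set(self, name: str, value: Any) -> None:
        """Write a property and notify subscribers if the value changed."""
        old = self._properties[name]
        self._properties[name] = value
        if old != value:
            for handler in self._handlers:
                handler(name, old, value)


if __name__ == "__main__":
    gain = ControlObject(gain_db=0.0, mute=False)
    gain.subscribe(lambda name, old, new: print(f"{name}: {old} -> {new}"))
    gain.set("gain_db", -6.0)
```

In a networked control system the same idea lets several controllers stay in step, because each one is told about changes made by the others.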
The goal of AES70 is to provide full-function device control and monitoring for this range of situations:
- Professional applications.
- Multi-vendor systems.
- Mission-critical or noncritical applications.
- Media networking applications of all sizes from two to 10,000 nodes or more.
- Secure or insecure implementations.
- Multiple-controller systems.
- Peer to peer systems devoid of separate controllers.
- Audio devices are targeted now.
- Multiple connection methods are supported.
- Video devices will be targeted in future.
- Other related equipment may be scoped into AES70 as a long-term goal.
- Devices of all sizes - wall panel to mixing desk, possibly with tiny processors.
- Dynamically-reconfigurable devices.
- Products with proprietary features.
- Able to work on low and high bandwidth networks.
AES70 can operate on any IP network and uses these transports and message encodings to reach target devices:
- WebSockets
- JSON
- TCP
- UDP
Audio devices can be controlled using adapters for these protocols. Some presentations on AES70 describe other alternatives as well:
- Dante
- Ravenna
- Milan
- AES67
Video devices are expected to be supported via adapters for these protocols. Others will be introduced in due course:
- SDVoE
- ST 2110
AES70 specifies several different protocols, not all of which have been publicly released. Currently only OCP.1 has been defined for use on TCP/IP networks. When other protocols are defined, they will all be based on the same core object model.
There are several published parts of the AES70 standard, with others nearing publication. The numbering of these parts suggests there are many more to come. The descriptions of those parts are not yet publicly known outside of the AES organization. At the time of writing, this is what we know so far. The information is collated from presentations at conferences and published documents:
Parts 1, 2 and 3 describe Core functionality. It is helpful to read these first three parts together. Parts 21, 22 and 23 are adapters for various media transport protocols, some of them proprietary. These work rather like software drivers.
If you intend to use this standard to design a product, the standard warns that there may be patent license fees to pay. However, the OCA organization states that the protocols are supposed to be license free. Any patents are likely to affect only some of the management adapters used for proprietary hardware.
AES70 is developed in collaboration with the Alliance for IP Media Solutions (AIMS) and the Open Control Architecture Alliance (OCA). Find out more about the Open Control Architecture Alliance here:
https://ocaalliance.com/
| Document | Vintage | Description |
|---|---|---|
| AES70-1 | 2018 | OCA - Core framework describing the models and mechanisms. |
| AES70-2 | 2018 | OCA - Core control class structure describing the functional capabilities. |
| AES70-3 | 2018 | OCA - Core OCP.1 Binary communications protocol for IP networks. |
| AES70-4 | Draft | OCA - JSON protocol. |
| AES70-21 | Draft | AES67 and SMPTE ST 2110 connection management adapter for controlling streaming connections. |
| AES70-22 | 2024 | Milan (AVB) media transport connection management adapter for controlling streaming connections. |
| AES70-23 | Draft | Dante connection management adapter. |
These standards documents are relevant to the use of AES70:
| Document | Description |
|---|---|
| AES17 | Measurement of digital audio equipment. |
| Avnu-Milan | The "Milan Specification", published by the Avnu Pro Audio Technical Workgroup. |
| IEEE 1722 | AVTP - IEEE Standard for Layer 2 Transport Protocol for Time-Sensitive Applications in Bridged Local Area Networks. |
| IEEE 1722.1 | ATDECC - IEEE Standard for Device Discovery, Connection Management, and Control Protocol for Time-Sensitive Networking System. |
| IEEE 754 | IEEE Standard for Binary Floating-Point Arithmetic. |
| IEEE-1588 | Precision Clock Synchronization Protocol (PTP) for Networked Measurement and Control Systems. |
| IEEE-802.1AS | Local and Metropolitan Area Networks - Timing and Synchronization for Time-Sensitive Applications in Bridged Local Area Networks. |
| IEEE-802.1BA | Audio Video Bridging (AVB) Systems. |
| IEEE-802.1Q | Media Access Control (MAC) Bridges and Virtual Bridged Local Area Networks. |
| IS-12 | AMWA NMOS Control Protocol. |
| ISO 9787 | Robots and robotic devices - Coordinate systems and motion nomenclatures. |
| ISO 10646-1 | Universal Multiple-Octet Coded Character Set (UCS) - Part 1 - Architecture and basic multilingual plane. |
| ISO 19503 | XML Metadata Interchange (XMI). |
| ITU-R BS.2076-1 | Audio Definition Model. |
| ITU-R BS.2076-1 | Audio Definition Model - Chapter 8, Coordinate System. |
| NIMA TR8350.2 | US Department of Defense World Geodetic System. |
| RFC 3927 | Dynamic Configuration of IPv4 Link-Local Addresses. |
| RFC 4279 | Pre-Shared Key Cipher-suites for Transport Layer Security (TLS). |
| RFC 4862 | IPv6 Stateless Address Auto-configuration. |
| RFC 5246 | The Transport Layer Security (TLS) Protocol. |
| RFC 6335 | Internet Assigned Numbers Authority (IANA) Procedures for the Management of the Service Name and Transport Protocol Port Number Registry. |
| RFC 6455 | The WebSocket Protocol. |
| RFC 6762 | Multicast DNS. |
| RFC 6763 | DNS-Based Service Discovery. |
| RFC 7231 | Hypertext Transfer Protocol (HTTP/1.1) Semantics and Content. |
| RFC 7235 | Hypertext Transfer Protocol (HTTP/1.1) Authentication. |
| ST 2059-2 | SMPTE Profile for Use of IEEE-1588 Precision Time Protocol in Professional Broadcast Applications. |
Whilst AES70 is a more mature technology, SMPTE ST 2138 provides a wider scope of control and will emerge as a competing solution.
The IABM has published an interesting comparison table that maps various attributes of AES70 against other control plane architectures (such as ST 2138, Ember+ and NMOS). Search online for this document title to locate a copy for downloading:
“IABM-Control-Plane-Comparison-AES”
AES77 - Loudness Of Streamed Audio
Setting the loudness of streamed audio has been an important topic for some time. Content providers have a duty to protect consumers from hearing damage. There has been a great deal of work on this topic by various organizations. The AES provides a page of collected background information about the Loudness project on its website:
https://aes2.org/audio-topics/loudness-2/
The references and resources page has links to many other useful downloadable documents from a variety of other organizations:
https://aes2.org/resources/audio-topics/loudness-project/resources-and-references/
The collection includes technical papers on loudness measurement and calibration, together with a large number of relevant documents from the organizations that have collaborated with the AES on this work.
AES77 is based on the AES technical document TD1008.1.21-9, which is accessible from the resources page and contains the same fundamental knowledge. Other resources such as EBU R 128 are also helpful.
AES77 is currently being worked on by task group SC-02-12-Q. A newer version is likely to be published when they have completed their revisions.
These standards documents are relevant to the use of AES77. These and other useful supporting white papers and conference proceedings are linked from the AES Loudness project resources page:
| Document | Description |
|---|---|
| AES71 | Recommended Practice Loudness Guidelines for Over-the-Top Television and Online Video Distribution. Based on AES TD 1006. |
| AES TD1005 | Audio Guidelines for Over the Top Television and Video Streaming. |
| AES TD1006 | Loudness Guidelines for OTT and OVD Content. |
| AES TD1008 | Recommendations for Loudness of Internet Audio Streaming and On-Demand Distribution. |
| ANSI/CTA-2075 | Loudness Standard for Over-the-Top Television and Online Video Distribution for Mobile and Fixed Devices. |
| ATSC A/85 | ATSC Recommended Practice Techniques for Establishing and Maintaining Audio Loudness for Digital Television. |
| CENELEC - EN 50332-3 | Sound system equipment: Headphones and earphones associated with personal music players - Maximum sound pressure level measurement methodology - Part 3. |
| EBU - Tech 3341 | Loudness Metering. ‘EBU Mode’ Metering to supplement loudness normalization in accordance with EBU R 128. |
| EBU - Tech 3342 | Loudness Range - A measure to supplement loudness normalization in accordance with EBU R 128. |
| EBU - Tech 3343 | Practical guidelines for Production and Implementation in accordance with EBU R 128. |
| EBU - Tech 3344 | Practical guidelines for distribution systems in accordance with EBU R 128. |
| EBU R 128 | Loudness Normalization and Permitted Maximum Level of Audio Signals. |
| EBU R 128 S1 | Loudness Parameters for Short-Form Content. |
| EBU R 128 S2 | Loudness in Streaming. |
| ITU-R BS.1770-5 | Algorithms to measure audio program loudness and true-peak audio level. |
| ITU-R BS.1771-1 | Requirements for loudness and true-peak indicating meters. |
| ITU-T H.870 | E-health multimedia systems, services and applications - Safe listening. |
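The arithmetic behind loudness normalization is simple once a program's integrated loudness has been measured (for example with an ITU-R BS.1770 meter). The sketch below is illustrative only. The recommended targets in AES77 are not quoted in this article, so the defaults here are the EBU R 128 values of -23 LUFS and -1 dBTP, used purely as an assumption; the function names are hypothetical.

```python
def normalization_gain_db(measured_lufs: float, target_lufs: float = -23.0) -> float:
    """Gain in dB that moves the measured integrated loudness onto the target.

    One loudness unit (LU) corresponds to 1 dB, so the gain is a plain difference.
    The -23 LUFS default is the EBU R 128 target, assumed here for illustration.
    """
    return target_lufs - measured_lufs


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to the linear factor applied to sample amplitudes."""
    return 10.0 ** (gain_db / 20.0)


def peak_within_limit(true_peak_dbtp: float, gain_db: float,
                      max_true_peak_dbtp: float = -1.0) -> bool:
    """Check that the true peak stays at or below the limit once gain is applied.

    The -1 dBTP default is the EBU R 128 maximum, assumed here for illustration.
    """
    return true_peak_dbtp + gain_db <= max_true_peak_dbtp


# Usage: a program measured at -18 LUFS with a true peak of -0.5 dBTP.
gain = normalization_gain_db(-18.0)
print(gain, db_to_linear(gain), peak_within_limit(-0.5, gain))
```

A quiet program needs positive gain, and that is where the peak check matters: raising the level can push the true peak past the permitted maximum, in which case limiting or a lower target is needed.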
Older AES Standards
Some earlier AES standards are based on Asynchronous Transfer Mode (ATM) networks. An ATM network can carry voice and data simultaneously. Ethernet was designed to carry only data, but Voice over IP (VoIP) allows it to support telephony applications as well.
AES47 and AES51 describe how to transmit audio over ATM networks.
AES Member Benefits
Being aware of everything the AES offers will enhance your skill-set when dealing with audio matters.
The AES document collection is a foundational source of reference. The standards are backed up by information documents that help you apply them and other technical documents that describe useful background and supplemental material.
Some of the newer standards such as AES70 will have a profound impact and facilitate the implementation of software defined workflows and studios. AES70 bridges between the software and hardware worlds in a very elegant way.
The benefits of joining the AES as a member far outweigh the cost of subscribing. I cannot recommend this highly enough.
These Appendix articles contain additional information you may find useful: