Standards: Audio - Important AES Standards

The AES standards library underpins professional audio engineering worldwide, from digital interconnects and synchronization to IP streaming and loudness. These are the standards that are the most relevant to broadcast and IP studio workflows; we examine what each one does and why it matters.

AES Standards Relevant To Broadcasting

There are nearly 60 different standards currently available from the AES with another 40 supporting documents that provide additional advice on how to deploy them.

AES strives to avoid the use of patented technologies or requires the patent holder to allow their use on a minimal or zero fee basis. The society also collaborates with other standards bodies such as the SMPTE, ISO, IEC, BSI and EBU.

The AES Standards are in common use all over the world, largely working unseen but nevertheless vitally important.

These standards are particularly relevant to the design of workflows in IP studios and ST 2110. The rest of the AES standards are also useful and informative as background knowledge:

Standard Description
AES3 2-channel digital audio. Used for digital audio interconnection and also known as AES/EBU.
AES5 Preferred sample frequencies.
AES10 Multichannel Audio Digital Interface (MADI). See separate document AES10id - Digital audio engineering guidelines.
AES11 Synchronization of digital audio equipment in studio operations.
AES18 Format for the user data channel of the AES digital audio interface.
AES31 A file format for exchanging audio data between systems and applications. Described in multiple parts. Refer to Chapter 4-4 for a description of the AES31 containers.
AES50 High-resolution multi-channel audio interconnection (HRMAI) delivered over Ethernet. See AES-R6 - Guidelines for AES standard for digital audio engineering.
AES52 Describes how to insert unique identifiers into AES3 digital audio content.
AES57 Audio object structures for preservation and restoration.
AES60 AES standard for Core audio metadata.
AES67 Interoperability of Audio over IP networks.
AES70 Open Control Architecture.
AES77 Loudness of streamed audio. Also described by EBU R128.

AES3 - Serial Transmission For 2-channel Digital Audio

AES3 was originally published in 1985 and has been revised several times since. In 2003, amendments to the standard were incorporated into the main body. In 2009, the 2003 edition was divided into four parts which are now published separately. The entire set was reaffirmed in 2019, having been deemed up to date and not requiring changes at that time.

This standard describes how to transmit two channels of digital audio over a variety of different media. The supported audio format is linear Pulse Code Modulation (PCM), which is an uncompressed stream of samples. Sample sizes between 16 and 24 bits are supported. Other formats are possible but not described by AES3. See AES5 for the list of acceptable sample rates.

AES3 is composed of four parts describing digital input-output interfacing, in particular the serial transmission format for two-channel linearly represented digital audio data.

Document Description
AES3-1 Audio content semantics. Describing the sampling frequency based on AES5.
AES3-2 Metadata & sub-code data are transmitted with the audio content such as channel-status, user & ancillary data. The use of pre-emphasis to enhance the audio is indicated in the channel status.
AES3-3 Unidirectional transport link framing & channel co-ordination. This also embeds a recoverable clock signal.
AES3-4 Physical & electrical signal levels & wiring.

The use of abbreviations in audio/visual contexts is sometimes ambiguous and overloaded with hidden meaning. For example, when the interface is described as AES rather than AES/EBU, the means of electrical connection might be different.

This quotation from Ray Arthur Rayburn – a highly respected audio engineer in the AES community – explains why:

“AES3 allows the use of transformer or transformerless interfaces, while the corresponding EBU standard requires the use of transformers. Therefore, it has become a common shorthand to say AES/EBU when the interface is transformer coupled, and AES3 when it is not or if the interface type is unknown.”

AES/EBU is described in the third edition of the EBU Tech 3250 document.

Also see document AES2id which provides guidelines for the use of the AES3 interface. The fourth edition of AES2id was published in 2020. This document provides important insights when applying AES3. Make sure you are using the most recent edition because the 2006 edition refers to the single part 2003 version of AES3. The latest version has been revised to refer to the current four part version. These standards documents are relevant to the use of AES3:

Document Description
AES preprint 3783 Twisted-pair cables for AES/EBU Digital Audio Signals. A technical paper presented at the 96th AES Convention in Amsterdam.
AES-2id Guidelines for the use of the AES3 interface.
AES5 Professional Digital Audio Applications Employing Pulse Code Modulation - Preferred Sampling Frequencies.
AES10 Serial Multichannel Audio Digital Interface (MADI).
AES11 Synchronization of digital audio equipment in studio operations.
AES18 Format for the user data channel of the AES digital audio interface.
AES26 Conservation of the polarity of audio signals.
AES47 Digital input-output interfacing - Transmission of digital audio over asynchronous transfer mode (ATM) networks.
AES52 Insertion of unique identifiers into the AES3 transport stream.
AES55 Carriage of MPEG Surround in an AES3 bitstream.
EBU Tech 3250 Specification of the digital audio interface (also known as the AES/EBU interface).
EBU TR R68 Alignment level in digital audio production equipment & in digital audio recorders.
IEC 60169-8 RF coaxial connectors diameter of outer conductor 6.5mm with BNC lock.
IEC 60268-12 Application of connectors for broadcast & similar use.
IEC 60603-7 Detailed specification for 8-way connectors.
IEC 60958-1 Serial, uni-directional, self-clocking interface for the interconnection of digital audio equipment.
IEC 60958-3 Digital audio interface for consumer applications.
IEC 60958-4 Digital audio interface for professional applications.
ISO 646 ISO 7-bit coded character set for information interchange.
ISO/IEC 11801 Generic cabling for customer premises.
ISO/IEC 23003-1 MPEG Surround sound.
ITU-R BS.450-3 Transmission standards for FM sound broadcasting at VHF.
ITU-R BS.647 A digital audio interface for broadcasting studios.
ITU-T J.17 Pre-emphasis used on sound program circuits.
ITU-T V.11 Electrical characteristics for balanced double-current interchange circuits operating at data signaling rates up to 10 Mbps.
RFC 4122 A Universally Unique IDentifier (UUID) URN Namespace.
RFC 9562 Universally Unique IDentifiers (UUIDs).
RP 155 SMPTE Recommended practice for the reference level in digital audio systems.
ST 276 Transmission of AES/EBU Digital Audio Signals Over Coaxial Cable.
ST 297 Serial Digital Fiber Transmission System for ANSI/SMPTE 259M Signals.
ST 337 Format for Non-PCM Audio & Data in an AES3 Serial Digital Audio Interface.
ST 338 Format for Non-PCM Audio & Data in AES3 - Data Types.
ST 339 Format for Non-PCM Audio & Data in AES3 - Generic Data Types.
ST 340 Format for Non-PCM Audio & Data in AES3 - ATSC A/52B Digital Audio Compression Standard for AC-3 & Enhanced AC-3 Data Types.

Refer to ST 2110-31 to see how AES3 is applied to a modern IP studio architecture.


AES5 - Preferred Sample Frequencies

This standard describes various sample rates and recommends 48 kHz at the outset because it is numerically easier to convert this to other sample rates. See Section 4.2 of the standard for an explanation. Sample rates at 96 and 44.1 kHz are also described.

There is an interesting paragraph on bandwidth (see Section 4.1) based on the Nyquist-Shannon sampling theorem.

Derived sample rates ranging from half to eight times the basic sample rate are also described. There are tables listing the number of samples per frame of video at different frames per second rates vs. audio sample rates.
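The relationship behind those tables is simple arithmetic: samples per frame equals the sample rate divided by the frame rate. The following is a minimal sketch, not a reproduction of the AES5 tables. The function names and the choice of frame rates are illustrative assumptions, and the fractional rate 29.97 is written exactly as 30000/1001.

```python
from fractions import Fraction

# Illustrative frame rates (an assumption, not the AES5 list).
# 29.97 fps is represented exactly as 30000/1001.
FRAME_RATES = {
    "24": Fraction(24),
    "25": Fraction(25),
    "29.97": Fraction(30000, 1001),
    "30": Fraction(30),
}


def samples_per_frame(sample_rate_hz: int, frame_rate: Fraction) -> Fraction:
    """Exact (possibly fractional) number of audio samples in one video frame."""
    return Fraction(sample_rate_hz) / frame_rate


def frames_until_aligned(sample_rate_hz: int, frame_rate: Fraction) -> int:
    """Smallest number of frames that contains a whole number of samples."""
    return samples_per_frame(sample_rate_hz, frame_rate).denominator


if __name__ == "__main__":
    for label, fps in FRAME_RATES.items():
        spf = samples_per_frame(48000, fps)
        print(label, float(spf), frames_until_aligned(48000, fps))
```

At 48 kHz and 29.97 fps the result is 1601.6 samples per frame, so audio and video only line up on a whole sample every five frames (8008 samples). This is one reason the integer frame rates are easier to work with.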

This is an important foundational standard referred to by AES3 and AES67 and other related documents.

AES10 - Multichannel Audio Digital Interface (MADI)

The latest edition of AES10 and its supporting AES10id information document were both published in 2020.

Refer to the digital audio engineering guidelines in the supplementary AES10id document. This provides additional insight into applying the AES10 standard. It explains the transmission data format in more detail, which is very helpful when developing MADI compatible interfaces.

If you intend to use this standard to design a product, there may be patent license fees to pay.

These standards documents are relevant to the use of AES10:

Document Description
AES3 Serial transmission format for two-channel linearly represented digital audio data.
AES11 Synchronization of digital audio equipment in studio operations.
AES47 Transmission of digital audio over asynchronous transfer mode (ATM) networks.
EN 50083-9 Interfaces for CATV/SMATV head ends and similar professional equipment for DVB/MPEG-2 transport streams.
IEC 60169-8 Radio-frequency coaxial connectors with inner diameter of outer conductor 6.5 mm (0.256 in) with BNC lock.
ISO 9314-1 Fiber Distributed Data Interface (FDDI) Token Ring Physical Layer Protocol (PHY).
ISO 9314-3 Fiber Distributed Data Interface (FDDI) Physical Layer Medium Dependent (PMD).
SMPTE 297 Serial digital fiber transmission system for ANSI/SMPTE 259M signals for television.
SMPTE 320 Channel Assignments and Levels on Multichannel Audio Media for television.
SMPTE 323 Channel Assignments and Levels on Multichannel Audio Media for motion picture film.

What Is 4B5B Encoding?

AES10 describes a bitstream format for use with MADI based on sending each 4-bit frame as a 5-bit code. This has certain advantages for signal quality and ancillary control messages.

The 4B5B technique is a very neat solution to provide synchronization of the electro-optical signals arriving at network receivers. It operates at the physical layer in the OSI network model. This is the very lowest foundational layer, which describes the electrical (or optical) connection between end-points.

This 4B5B encoding technique is used on optical fiber (FDDI), Ethernet and USB interfaces. It converts each frame of 4-bits into a 5-bit frame. The small disadvantage is that this increases the amount of data being transmitted by 25%, since five bits are sent for every four bits of payload. In return, some of the additional values are available to describe ancillary control signals.

The 5-bit values are chosen to guarantee there is always at least one transition between the 0 and 1 state somewhere within the 5-bit frame. The only exceptions are the loss of signal symbol (00000) and the idle symbol (11111). By careful detection of the 5-bit values being sent, the information transmitted is easily synchronized by re-framing at the receiver. There are also electrical noise immunity benefits from adopting this approach.

Nomenclature

This technique is sometimes described as 4B5B or 5B4B when operating in reverse. Search for both variants when looking for information. The 5B4B terminology describes data conversion from 5-bit frames to 4-bit frames. Occasionally it is used instead of 4B5B but that might be a typographical error.

A similar technique called 8B10B is described in the MADI documentation for 8-bit framing. It maps each 8-bit frame to a 10-bit code, so it carries the same 25% overhead as 4B5B.

Converting From 4-bit To 5-bit

Here is the conversion between 4-bit framed data values and 5-bit frames shown as a table. Only 16 of the 32 values possible with 5-bits are needed to carry the 4-bit data:

Hexadecimal 4-bit Value 4B5B Code
0 0000 11110
1 0001 01001
2 0010 10100
3 0011 10101
4 0100 01010
5 0101 01011
6 0110 01110
7 0111 01111
8 1000 10010
9 1001 10011
A 1010 10110
B 1011 10111
C 1100 11010
D 1101 11011
E 1110 11100
F 1111 11101
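The table translates directly into a lookup. The sketch below is a minimal illustration of the mapping and is not taken from AES10. The function names are hypothetical, and sending the high nibble of each byte first is an assumption made for readability; a real interface defines its own nibble and bit ordering.

```python
# 4B5B data codes indexed by the 4-bit value (0x0 to 0xF), copied from the table.
DATA_CODES = [
    "11110", "01001", "10100", "10101",
    "01010", "01011", "01110", "01111",
    "10010", "10011", "10110", "10111",
    "11010", "11011", "11100", "11101",
]

# Five bits are sent for every four bits of payload, hence 25% overhead.
OVERHEAD = 5 / 4 - 1


def encode_4b5b(data: bytes) -> str:
    """Encode bytes as space-separated 5-bit codes.

    Assumption: the high nibble of each byte is encoded first.
    """
    codes = []
    for byte in data:
        codes.append(DATA_CODES[byte >> 4])    # upper four bits
        codes.append(DATA_CODES[byte & 0x0F])  # lower four bits
    return " ".join(codes)


def has_internal_transition(code: str) -> bool:
    """True if the 5-bit pattern contains both a 0 and a 1."""
    return "0" in code and "1" in code


if __name__ == "__main__":
    print(encode_4b5b(b"\x3a"))
    print(all(has_internal_transition(c) for c in DATA_CODES))
```

Encoding the byte 0x3A looks up 3 and A in the table and yields the two codes 10101 and 10110. The second check confirms that every one of the 16 data codes contains at least one change between 0 and 1.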

 

Ancillary Symbols

Some of the other 16 values have special meaning. They are called symbols and use letters above the hexadecimal range. The symbols are somewhat mnemonic to aid in understanding their meaning:

Name Symbol 4B5B Code
Halt H 00100
Idle I 11111
Start #1 J 11000
Start #2 K 10001
Start #3 L 00110
Quiet (loss of signal) Q 00000
Reset R 00111
Set S 11001
End (terminate) T 01101

Control Codes

The mnemonic symbols are used in various combinations to send commands ‘down the wire’. Several are used on their own, a few are used in pairs. Groups of four symbols are used on USB interfaces:

Command Symbols 5-bit Frame Sequence
Sync, Start delimiter JK 11000 10001
100BASE-X idle marker I 11111
USB-PD end delimiter T 01101
FDDI end delimiter TT 01101 01101
Not used (terminate - set) TS 01101 11001
SAL (idle - halt) IH 11111 00100
100BASE-X end delimiter TR 01101 00111
Not used (set-reset) SR 11001 00111
Not used (set-set) SS 11001 11001
100BASE-X transmit error H 00100
USB-PD Start Of Packet (SOP) JJJK 11000 11000 11000 10001
USB-PD SOP′ JJLL 11000 11000 00110 00110
USB-PD SOP″ JLJL 11000 00110 11000 00110
USB-PD SOP′_Debug JSSL 11000 11001 11001 00110
USB-PD SOP″_Debug JSLK 11000 11001 00110 10001
USB-PD Hard Reset RRRS 00111 00111 00111 11001
USB-PD Cable Reset RJRL 00111 11000 00111 00110

AES10 mandates that a JK Sync, Start delimiter signal should be periodically inserted into the MADI bitstream.

Unused Values

A few of the available 5-bit frame values are unused. If they are observed at the receiving end, it indicates a problem and helps to detect errors in the transmission:

  • 00001
  • 00010
  • 00011
  • 00101
  • 01000
  • 01100
  • 10000
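A receiver can use this property as a cheap error check. The following sketch classifies any 5-bit value as data, a control symbol, or invalid, using only the tables in this article. The names are hypothetical and it does not model how a particular interface reacts to an invalid code.

```python
# Data codes indexed by 4-bit value, and control symbols, as tabulated above.
DATA_CODES = [
    "11110", "01001", "10100", "10101",
    "01010", "01011", "01110", "01111",
    "10010", "10011", "10110", "10111",
    "11010", "11011", "11100", "11101",
]
DECODE = {code: value for value, code in enumerate(DATA_CODES)}

CONTROL_SYMBOLS = {
    "00100": "H", "11111": "I", "11000": "J",
    "10001": "K", "00110": "L", "00000": "Q",
    "00111": "R", "11001": "S", "01101": "T",
}


def classify(code: str):
    """Return ("data", value), ("control", symbol) or ("invalid", None)."""
    if code in DECODE:
        return ("data", DECODE[code])
    if code in CONTROL_SYMBOLS:
        return ("control", CONTROL_SYMBOLS[code])
    return ("invalid", None)


def unused_codes():
    """List every 5-bit value that is neither a data code nor a control symbol."""
    all_codes = (format(n, "05b") for n in range(32))
    return [c for c in all_codes if classify(c)[0] == "invalid"]


if __name__ == "__main__":
    print(unused_codes())
```

Of the 32 possible values, 16 carry data and 9 are control symbols, which leaves the 7 unused values listed above.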

Supporting Documents

Refer to these documents for more information:

Document Description
AES10 - Section 4.3 Transmission format.
AES10 - Annex A Example of link encoding.
AES10 - Annex B Use of 4B5B sync symbols for channel-independent data.
AES10id - Section 8 MADI Transport stream.
ISO 9314-1 Token Ring Physical Layer Protocol (PHY).
Wikipedia 4B5B Mostly derived from AES10.

The AES10 standard directs you to the AES web site for details of the 4B5B line code for transmitting data on fiber optic or Ethernet connections. That document is missing, but there is a very good explanation of 4B5B signaling on Wikipedia. Read that in conjunction with Annex B of the AES10 standard.

AES11 - Synchronization Of Digital Audio Equipment

Multiple channels of audio must be carefully synchronized. The sample clocks governing when the source audio is captured must be accurately regulated. Any downstream processing needs to maintain the phase relationships between channels to avoid introducing unwanted audible artefacts. This is a complex topic and there are many solutions.

Equipment using an internal sample clock must be locked to an external source. AES11 describes this as a Digital Audio Reference Signal (DARS) which is delivered separately from the audio content (usually via a separate connection). AES5 describes multiples of up to eight times the basic sample rate. The internal sample clock must be capable of reliably locking to all of these.

Alternative synchronization techniques can be used instead of DARS:

  • Embedded time signatures based on the packet header timestamps. These may drift out of sync with other streams.
  • Video reference syncing to frame-start events.
  • GPS locking. This requires a separate receiver device and locks to real-world time.

AES11 describes the word clock (see Annex B). This synchronizes hardware devices (such as digital tape machines or CD players). The word clock governs the timing of each sample passing through the system and is derived from a centralized reference. This will be familiar to broadcast engineers who ensure that video across an enterprise is frame synchronous by distributing sync pulses from a reliable source.

The word clock is not the same as timecode. The word clock is integral to the sampling process and transmission of the digital audio, whereas the timecode is a separate metadata service that describes the media being transmitted. They operate in different time domains.

AES11 refers to AES5 and augments the sample rate descriptions with advice pertaining to video reference timing.

AES11 is relevant when studying PTP and other timing protocols, in particular when designing products that need to work with ST 2110 based IP studios. The latest version of AES11 was published in 2020.

These standards documents are relevant to the use of AES11:

Document Description
AES3 Serial transmission format for linearly represented digital audio data.
AES5 Professional digital audio applications employing pulse code modulation - Preferred Sampling Frequencies.
AES47 Transmission of digital audio over asynchronous transfer mode (ATM) networks.
ST 318 Synchronization of 59.94 or 50 Hertz related video and audio systems in analog and digital areas.
RP168 Definition of Vertical Interval Switching Point for Synchronous Video Switching.
WHP 074 BBC R&D White Paper about the development of ATM network technology for live production infrastructure.

AES18 - Ancillary User Data Channel Format

Ancillary user specified metadata can be embedded within an AES3 audio stream. Messages can be any length. The only limitation is the maximum bitrate which caps the amount of data that can be inserted in addition to the audio payload. A long message could describe the entire asset with an abstract for display in an EPG. Shorter messages provide synchronous data such as:

  • Subtitle text.
  • Script cues.
  • Editing information.
  • Copyright assertions.
  • Performer credits.
  • Downstream switching instructions.

This is managed carefully to avoid delaying the audio content. Messages can be split and portions deferred to accommodate the bitrate capping limit.

Ancillary data is carried using an adaptation of the High-level Data Link Control (HDLC) protocol, which AES18 cites as ISO 3309. That standard has since been withdrawn and replaced by ISO/IEC 13239. HDLC is bi-directional, but in the context of AES3, the messages only travel one way with no handshaking.

Error resilience helps detect data corruption at the receiver. If necessary, important data could be delivered in a carousel-like structure and repeated periodically.

The standard lists many external references in the Annex C Bibliography. These date from the mid-1980s to the 1990s and cover radio text services which are now deployed worldwide.

Because of its vintage, the standard does not specify Unicode character sets. Text is constrained to 8-bit character codes as defined in ISO 4873. UTF-8 character encoding of Unicode text is briefly mentioned in the AES67 standard.

AES50 - High-resolution Multi-channel Interconnection

AES50 interconnections are sometimes abbreviated to HRMAI. HRMAI is intended to be used in a point-to-point fashion rather than transmitting data over a network. Having just a sender and receiver enables it to provide the following advantages:

  • Supports many commonly-used digital audio coding formats, including “high-resolution” formats such as high sample-rate linear PCM, and one-bit delta-sigma modulated formats.
  • Low and predictable latency less than 100 microseconds.
  • Able to use CAT-5 data cable which is generally cheaper than CAT-6 or CAT-7.
  • Interconnections span distances up to 100 meters.
  • High-quality full-duplex clocks are transmitted in parallel with the audio data.
  • Full-duplex audio interconnection allows traffic to move in both directions at the same time.
  • 5 Mbit/sec full-duplex auxiliary data connection, compatible with Ethernet networks. This is in parallel with the point-to-point interconnect for audio essence.

Project report AES-R6 provides additional guidelines for deploying the AES50 standard for HRMAI connections. Read both AES50 and AES-R6 documents to gain a better understanding of HRMAI. The latest editions of both documents were published in 2020.

These standards documents are relevant to the use of AES50:

Document Description
AES3 Serial transmission format for two-channel linearly represented digital audio data, Parts 1 to 4.
ANSI X3.263 Fiber Distributed Data Interface (FDDI) - Token Ring Twisted Pair Physical Layer Medium Dependent (TP-PMD).
IEEE 802.1Q Virtual Bridged Local Area Networks.
IEEE 802.3 Carrier sense multiple access with collision detection (CSMA/CD) access method and physical layer specifications.
ISO 8802-3 ISO re-published version of IEEE 802.3.
TIA/EIA-568-B.2 Balanced twisted-pair cabling components.

AES52 - Inserting Unique Identifiers Into AES3

Inserting unique identifiers using the 128-bit UUID standard is helpful for identifying the stream content at the receiving end. This also helps to connect the stream to metadata that may be stored in another system. The UUID message is transmitted periodically so that clients joining the stream part way through can still acquire the value.
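To make the 128-bit size concrete, here is a minimal sketch using Python's standard uuid module. It only shows generating an identifier and converting it to and from its 16-byte form. The function names are hypothetical, the choice of a random (version 4) UUID is an assumption, and the way AES52 frames the value inside the AES3 stream is not modeled here.

```python
import uuid


def new_stream_id() -> uuid.UUID:
    """Create a random (version 4) UUID to label a stream. Illustrative choice only."""
    return uuid.uuid4()


def to_wire_bytes(identifier: uuid.UUID) -> bytes:
    """Return the 16-byte (128-bit) big-endian form of the UUID."""
    return identifier.bytes


def from_wire_bytes(raw: bytes) -> uuid.UUID:
    """Rebuild the UUID from its 16-byte form, as a receiver would."""
    return uuid.UUID(bytes=raw)


if __name__ == "__main__":
    stream_id = new_stream_id()
    raw = to_wire_bytes(stream_id)
    print(stream_id, len(raw) * 8, from_wire_bytes(raw) == stream_id)
```

The receiver can use the recovered value as a key when looking up metadata held in another system.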

These standards are also relevant:

Standard Vintage Description
ISO 9834-8 2014 Generation of universally unique identifiers (UUIDs) and their use in object identifiers.
ISO 11578 1996 Remote Procedure Call (RPC).
ITU-T Rec. X.667 2012 OSI networking and system aspects - Naming, Addressing and Registration.
RFC 4122 2005 A Universally Unique IDentifier (UUID) URN Namespace. Obsoleted by RFC 9562.
RFC 9562 2024 Universally Unique IDentifiers (UUIDs).

The related Unique Material Identifier (UMID) described by SMPTE ST 330 can use a UUID, with some special formatting, as its material number.

AES57 - Audio Object Structures

AES57 describes the attributes for Audio Objects. These are carriers for audio essence realized as discrete samples that are packaged as multi-channel frames and presented on a timeline. These attributes are reflected as properties when the objects are instantiated in an object-oriented programming environment.

Annex A describes an XML schema that can be used to represent audio objects.

Understanding the vocabulary used to describe audio objects informs the design of your metadata model. A sound model, in turn, makes it easier to build a reliable content management system and workflow supervision tools.

AES60 - AES Standard For Core Audio Metadata

This standard is described in the Information document identified as AES60id. The latest edition was published in 2020.

The AESCore metadata schema is consistent with the EBUCore schema published as EBU Tech 3293. Both of these are extensions of the original Dublin Core metadata schema.

EBUCore is the minimum set of attributes needed to describe video and audio media resources.

XML is used because support and tools are widely available for creating, editing and harvesting metadata delivered in this format.

AES67 - High-Performance Streaming Audio

The original intent for AES67 was to deliver professional quality audio over a high-performance IP network with less than 10 ms latency. Bridging diverse pre-existing audio networking systems to provide interoperability was also a core goal. This is suitable for sound reinforcement at live events.

High performance is feasible on existing local area networks (LAN). If suitable switching hardware is available, it can be supported widely across an enterprise.

These are the main features:

  • Based on existing and well-known IT standards described in IETF RFC documents.
  • Synchronization with boundary clock converters.
  • Streaming transport via RTP.
  • Session description with SDP.
  • Low-latency delivery of uncompressed audio.
  • Ideal for live, studio and broadcast situations.
  • Decentralized configuration and management of devices.
  • Coexists with other IT data traffic on the same network.

Prior to AES67, the available audio networking solutions were incompatible with one another. AES67 is designed to reconcile the needs of architectures designed by different manufacturers and facilitates interoperability between:

  • Dante
  • Ravenna
  • Q-LAN
  • WheatNet-IP
  • Livewire

These topics are addressed by the standard:

Transport Synchronization - A variety of techniques are discussed in Section 4 of the standard.

Media Profiles - Standard IP networks must adhere to a media profile (see Annex A) to ensure timely delivery of packets.

Boundary Clock Converters - Networks using switching hardware that supports IEEE PTP protocols can provide boundary clock conversion and should provide adequate performance for audio delivery.

AVB - Enhanced Ethernet Networks that conform to IEEE 802.1Q are described as Audio Video Bridging (AVB) and provide synchronization based on IEEE PTP. This is covered in Annexes C and D.

Media Clocks - These are described in Section 5 and provide synchronization at the sample level. A media clock advances in sync with the sample rate. The same frequency should be used for the RTP clock.
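Because the RTP clock runs at the sample rate, an RTP timestamp is effectively a sample counter. The sketch below illustrates that idea under stated assumptions: the function names and the offset parameter are hypothetical, and the only hard fact relied on is that the RTP timestamp field defined in RFC 3550 is 32 bits wide, so the counter wraps.

```python
RTP_TIMESTAMP_MODULUS = 2 ** 32  # the RTP timestamp field is 32 bits wide


def samples_for_duration(seconds: float, sample_rate_hz: int) -> int:
    """Number of media clock ticks (samples) in the given duration."""
    return round(seconds * sample_rate_hz)


def rtp_timestamp(samples_elapsed: int, offset: int = 0) -> int:
    """Map a running sample count to a 32-bit RTP timestamp.

    The offset is a hypothetical per-stream starting value.
    """
    return (offset + samples_elapsed) % RTP_TIMESTAMP_MODULUS


if __name__ == "__main__":
    one_second = samples_for_duration(1, 48000)
    print(rtp_timestamp(one_second))   # advances by 48000 per second at 48 kHz
    print(rtp_timestamp(2 ** 32 + 5))  # wraps around after 2**32 samples
```

With a 48 kHz media clock the timestamp advances by 48 for every millisecond of audio, which ties packet timing directly to the sample count.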

Payload Encoding - This is described in Section 7, which reiterates the limited range of three preferred sample rates from AES5 with two possible sample sizes. Packet sizes are determined primarily by how long the data in them would play for the given sample rate. AES67 describes these sample rates (derived from AES5):

  • 48 kHz
  • 96 kHz
  • 44.1 kHz

The standardized sample sizes and formats are defined in great detail in these IETF RFC documents:

  • L16 - 16-bit linear format as defined in RFC 3551 clause 4.5.11.
  • L24 - 24-bit linear format as defined in RFC 3190 clause 4.
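The link between packet duration, sample rate and sample size is straightforward arithmetic. The following is a minimal sketch of the audio payload size only, assuming uncompressed interleaved samples and ignoring RTP, UDP and IP headers. The function names are hypothetical and the packet time is given in microseconds to keep the arithmetic in integers.

```python
# Bytes per sample for the two linear formats named above.
BYTES_PER_SAMPLE = {"L16": 2, "L24": 3}


def samples_per_packet(sample_rate_hz: int, packet_time_us: int) -> int:
    """Samples per channel in one packet of the given duration."""
    samples, remainder = divmod(sample_rate_hz * packet_time_us, 1_000_000)
    if remainder:
        raise ValueError("packet time does not hold a whole number of samples")
    return samples


def payload_bytes(sample_rate_hz: int, packet_time_us: int,
                  channels: int, encoding: str) -> int:
    """Audio payload size in bytes, excluding all protocol headers."""
    per_channel = samples_per_packet(sample_rate_hz, packet_time_us)
    return per_channel * channels * BYTES_PER_SAMPLE[encoding]


if __name__ == "__main__":
    # 1 ms of 8-channel, 24-bit audio at 48 kHz.
    print(payload_bytes(48000, 1000, 8, "L24"))
    # 125 microseconds of 64-channel, 24-bit audio at 48 kHz.
    print(payload_bytes(48000, 125, 64, "L24"))
```

Both examples come to 1152 bytes, which shows the trade-off: shorter packets leave room for more channels in the same payload.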

Channel Count - Up to 120 channels of audio can be carried in a generic AES67 link. ST 2110-30 limits the number of channels depending on the conformance level of the receiving device. This may be as low as four channels at level AX and not more than 64 for level C.

SDP - Session Description Protocol provides discovery and connection management support. This includes keep alive heartbeats to maintain connections. The discovery systems are described in Annex E. These include the AMWA NMOS IS-04 specification used by ST 2110.
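An SDP description is plain text, and the rtpmap attribute is where the encoding, sample rate and channel count of an audio stream appear together. The sketch below parses one such line using the general rtpmap syntax from RFC 4566. The function name and the example line are illustrative assumptions, not taken from AES67, and a real receiver would parse the whole session description.

```python
def parse_rtpmap(line: str) -> dict:
    """Parse 'a=rtpmap:<payload type> <encoding>/<clock rate>[/<channels>]'."""
    prefix = "a=rtpmap:"
    if not line.startswith(prefix):
        raise ValueError("not an rtpmap attribute")
    payload_type, _, media_format = line[len(prefix):].strip().partition(" ")
    parts = media_format.split("/")
    if len(parts) < 2:
        raise ValueError("rtpmap needs an encoding name and a clock rate")
    return {
        "payload_type": int(payload_type),
        "encoding": parts[0],
        "clock_rate_hz": int(parts[1]),
        # The channel count is optional and defaults to one.
        "channels": int(parts[2]) if len(parts) > 2 else 1,
    }


if __name__ == "__main__":
    # Hypothetical stream: 8 channels of 24-bit audio at 48 kHz.
    print(parse_rtpmap("a=rtpmap:96 L24/48000/8"))
```

The parsed clock rate is the same figure used for the media clock and the RTP clock, so a receiver can configure itself from this one line.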

IETF RFC References - Because this is a standard describing IP network transmission, there are many RFC documents cited in the normative references in Section 2 of the standard and more references are included in the bibliography in Annex H. Using the IETF specifications ensures compatibility with the rest of the IP network traffic.

Networked audio conforming to AES67 is used in ST 2110 installations and covered by ST 2110-30. Additional supporting documentation is available in these AES project reports:

Document Description
AES-R12 AES67 Interoperability PlugFest - Munich 2014.
AES-R15 AES67 Interoperability PlugFest - Washington 2015.
AES-R16 PTP parameters for AES67 and SMPTE ST 2059-2 interoperability.
AES-R17 AES67 Interoperability PlugFest - London 2017.
AES-R19 AES67 Protocol Implementation Conformance Statement (PICS) Summary.
AES-R20 AES67 beyond the LAN.

If you intend to use this standard to design a product, there may be patent license fees to pay.

These standards documents are relevant to the use of AES67: 

Document Description
AES5 Preferred sampling frequencies for applications employing pulse-code modulation.
AES11 Synchronization of digital audio equipment in studio operations.
AES67 High-performance streaming audio-over-IP interoperability.
AES-R16 PTP parameters for AES67 and SMPTE ST 2059-2 interoperability.
EBU Tech 3326 Audio contribution over IP - Requirements for Interoperability.
IEEE 1588 IEEE Standard for a Precision Clock Synchronization Protocol for Networked Measurement and Control Systems.
IEEE 802.1AS Timing and Synchronization for Time-Sensitive Applications in Bridged Local Area Networks.
IEEE 802.1BA Audio Video Bridging (AVB) Systems.
IEEE 802.1Q Media Access Control (MAC) Bridges and Virtual Bridged Local Area Networks.
IS-04 AMWA NMOS Discovery & Registration.
ISPCS paper Using an IEEE 802.1AS Network as a Distributed IEEE 1588 Boundary, Ordinary, or Transparent Clock. Presented at the IEEE-ISPCS conference 2010.
RFC 768 User Datagram Protocol.
RFC 791 Internet Protocol.
RFC 792 Internet Control Message Protocol.
RFC 894 A Standard for the Transmission of IP Datagrams over Ethernet Networks.
RFC 1112 Host Extensions for IP Multicasting.
RFC 2236 Internet Group Management Protocol, Version 2.
RFC 2474 Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers.
RFC 2597 Assured Forwarding PHB Group.
RFC 2616 Hypertext Transfer Protocol - HTTP/1.1.
RFC 2974 Session Announcement Protocol.
RFC 3170 IP Multicast Applications Challenges and Solutions.
RFC 3190 RTP Payload Format for 12-bit DAT Audio and 20- and 24-bit Linear Sampled Audio.
RFC 3261 SIP - Session Initiation Protocol.
RFC 3264 An Offer/Answer Model with the Session Description Protocol (SDP).
RFC 3376 Internet Group Management Protocol, Version 3.
RFC 3550 RTP - A Transport Protocol for Real-Time Applications.
RFC 3551 RTP Profile for Audio and Video Conferences with Minimal Control.
RFC 4028 Session Timers in the Session Initiation Protocol (SIP).
RFC 4566 Session Description Protocol.
RFC 5939 Session Description Protocol (SDP) Capability Negotiation.
RFC 6762 Multicast DNS.
RFC 6763 DNS-Based Service Discovery.
RFC 7272 Inter-Destination Media Synchronization (IDMS) Using the RTP Control Protocol (RTCP).
RFC 7273 RTP Clock Source Signaling.
IETF draft Using OPTIONS to Query for Operational Status in the Session Initiation Protocol (SIP).
IETF draft SIP URI Service Discovery using DNS-SD.

AES70 - Open Control Architecture

The Open Control Architecture (OCA) was the foundation for AES70. AES70 is now the formal standard for OCA. It describes a scalable control-protocol for managing media devices over an IP network. This is quite separate to managing streaming services although it needs to take account of such traffic.

The emerging SMPTE ST 2138 standard develops the same concepts.

AES70 is designed around an object-oriented approach to coding. It uses accessor methods to get or set various properties on a target device. Changes to a property are notified with an event that triggers a handler of some kind. This is described in the AES70 Class Structure.
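The general pattern of property accessors with change events can be sketched in a few lines. This is only an illustration of the pattern and is not the AES70 class structure or the OCP.1 protocol. The class name, method names and the gain and mute properties are all hypothetical.

```python
from typing import Any, Callable, List

# A handler receives the property name and its new value.
Handler = Callable[[str, Any], None]


class ControllableObject:
    """Hypothetical device object with get/set accessors and change events."""

    def __init__(self, **properties: Any) -> None:
        self._properties = dict(properties)
        self._handlers: List[Handler] = []

    def subscribe(self, handler: Handler) -> None:
        """Register a handler to be called whenever a property changes."""
        self._handlers.append(handler)

    def get(self, name: str) -> Any:
        """Read the current value of a property."""
        return self._properties[name]

    def set(self, name: str, value: Any) -> None:
        """Write a property and notify subscribers only if the value changed."""
        if name not in self._properties:
            raise KeyError(name)
        if self._properties[name] == value:
            return
        self._properties[name] = value
        for handler in self._handlers:
            handler(name, value)


if __name__ == "__main__":
    channel = ControllableObject(gain_db=0.0, mute=False)
    channel.subscribe(lambda name, value: print("changed:", name, value))
    channel.set("gain_db", -6.0)
    print(channel.get("gain_db"))
```

In a networked system the set call would arrive from a remote controller and the event would be sent back to every subscribed controller, which is how several controllers stay consistent with one device.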

The goal of AES70 is to provide full-function device control and monitoring for this range of situations:

  • Professional applications.
  • Multi-vendor systems.
  • Mission-critical or noncritical applications.
  • Media networking applications of all sizes from two to 10,000 nodes or more.
  • Secure or insecure implementations.
  • Multiple-controller systems.
  • Peer to peer systems devoid of separate controllers.
  • Audio devices are targeted now.
  • Multiple connection methods are supported.
  • Video devices will be targeted in future.
  • Other related equipment may be scoped into AES70 as a long-term goal.
  • Devices of all sizes - wall panel to mixing desk, possibly with tiny processors.
  • Dynamically-reconfigurable devices.
  • Products with proprietary features.
  • Able to work on low and high bandwidth networks.

AES70 can operate on any IP network and uses these connection methods to reach target devices:

  • WebSockets
  • JSON
  • TCP
  • UDP

Audio devices can be controlled using adapters for these protocols. Some presentations on AES70 describe other alternatives as well:

  • Dante
  • Ravenna
  • Milan
  • AES67

Video devices are expected to be supported via adapters for these protocols. Others will be introduced in due course:

  • SDVoE
  • ST 2110

AES70 specifies several different protocols, not all of which have been publicly released. Currently only OCP.1 has been defined for use on TCP/IP networks. When other protocols are defined, they will all be based on the same core object model.

There are several published parts of the AES70 standard with others nearing publication. The numbering of these parts suggests there are many more to come. The descriptions of those parts are not yet publicly known outside of the AES organization. At the time of writing, this is what we know so far, collated from presentations at conferences and published documents:

Parts 1, 2 and 3 describe Core functionality and are best read together. Parts 21, 22 and 23 are adapters for various proprietary protocols. These work rather like software drivers.

If you intend to use this standard to design a product, the standard warns that there may be patent license fees to pay. However, the OCA Alliance states that the protocols are intended to be license free. Any patents may affect only some of the management adapters used for proprietary hardware.

AES70 is developed in collaboration with the Alliance for IP Media Solutions (AIMS) and the Open Control Architecture Alliance (OCA). Find out more about the Open Control Architecture Alliance here:

https://ocaalliance.com/

Document | Vintage | Description
AES70-1 | 2018 | OCA - Core framework describing the models and mechanisms.
AES70-2 | 2018 | OCA - Core control class structure describing the functional capabilities.
AES70-3 | 2018 | OCA - Core OCP.1 Binary communications protocol for IP networks.
AES70-4 | Draft | OCA - JSON protocol.
AES70-21 | Draft | AES67 and SMPTE ST 2110 connection management adapter for controlling streaming connections.
AES70-22 | 2024 | Milan (AVB) media transport connection management adapter for controlling streaming connections.
AES70-23 | Draft | Dante connection management adapter.

These standards documents are relevant to the use of AES70:

Document Description
AES17 Measurement of digital audio equipment.
Avnu-Milan The "Milan Specification", published by the Avnu Pro Audio Technical Workgroup.
IEEE 1722 AVTP - IEEE Standard for Layer 2 Transport Protocol for Time-Sensitive Applications in Bridged Local Area Networks.
IEEE 1722.1 ATDECC - IEEE Standard for Device Discovery, Connection Management, and Control Protocol for Time-Sensitive Networking System.
IEEE 754 IEEE Standard for Binary Floating-Point Arithmetic.
IEEE-1588 Precision Clock Synchronization Protocol (PTP) for Networked Measurement and Control Systems.
IEEE-802.1AS Local and Metropolitan Area Networks - Timing and Synchronization for Time-Sensitive Applications in Bridged Local Area Networks.
IEEE-802.1BA Audio Video Bridging (AVB) Systems.
IEEE-802.1Q Media Access Control (MAC) Bridges and Virtual Bridged Local Area Networks.
IS-12 AMWA NMOS Control Protocol.
ISO 9787 Robots and robotic devices - Coordinate systems and motion nomenclatures.
ISO 10646-1 Universal Multiple-Octet Coded Character Set (UCS) - Part 1 - Architecture and basic multilingual plane.
ISO 19503 XML Metadata Interchange (XMI).
ITU-R BS.2076-1 Audio Definition Model.
ITU-R BS.2076-1 Audio Definition Model - Chapter 8, Coordinate System.
NIMA TR8350.2 US Department of Defense World Geodetic System.
RFC 3927 Dynamic Configuration of IPv4 Link-Local Addresses.
RFC 4279 Pre-Shared Key Cipher-suites for Transport Layer Security (TLS).
RFC 4862 IPv6 Stateless Address Auto-configuration.
RFC 5246 The Transport Layer Security (TLS) Protocol.
RFC 6335 Internet Assigned Numbers Authority (IANA) Procedures for the Management of the Service Name and Transport Protocol Port Number Registry.
RFC 6455 The WebSocket Protocol.
RFC 6762 Multicast DNS.
RFC 6763 DNS-Based Service Discovery.
RFC 7231 Hypertext Transfer Protocol (HTTP/1.1) Semantics and Content.
RFC 7235 Hypertext Transfer Protocol (HTTP/1.1) Authentication.
ST 2059-2 SMPTE Profile for Use of IEEE-1588 Precision Time Protocol in Professional Broadcast Applications.

Whilst AES70 is the more mature technology, SMPTE ST 2138 provides a wider scope of control and is likely to emerge as a competing solution.


The IABM has published an interesting comparison table that maps various attributes of AES70 against other control plane architectures (such as ST 2138, Ember+ and NMOS). Search online for this document title to locate a copy for downloading:

“IABM-Control-Plane-Comparison-AES”

AES77 - Loudness Of Streamed Audio

Setting the loudness of streamed audio has been an important topic for some time. Content providers have a duty to protect consumers from hearing damage. There has been a great deal of work on this topic by various organizations. The AES provides a page of collected background information about the Loudness project on its website:

https://aes2.org/audio-topics/loudness-2/

The references and resources page has links to many other useful downloadable documents from a variety of other organizations:

https://aes2.org/resources/audio-topics/loudness-project/resources-and-references/

The collection covers loudness measurement and calibration, and links to a large number of relevant documents from organizations that have collaborated with the AES on this work.

AES77 is based on the AES technical document TD1008.1.21-9 which is accessible from the resources page and contains the same fundamental knowledge. Other resources such as EBU R 128 are also helpful.
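The basic arithmetic behind loudness normalization can be sketched as follows. This is a minimal illustration that assumes the integrated loudness of the content has already been measured with an ITU-R BS.1770 meter. The function names and the example measurement are hypothetical, and the target shown is the -23 LUFS programme loudness from EBU R 128, used only as an example. The targets recommended by AES77 itself are not reproduced here.

```python
def normalization_gain_db(measured_lufs: float, target_lufs: float) -> float:
    """Gain in dB that moves content from its measured loudness to the target."""
    return target_lufs - measured_lufs


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to the linear factor applied to sample amplitudes."""
    return 10.0 ** (gain_db / 20.0)


# Example target only: EBU R 128 programme loudness. Not the AES77 figure.
EXAMPLE_TARGET_LUFS = -23.0

# Hypothetical measurement: content that is 6 LU louder than the target.
gain_db = normalization_gain_db(measured_lufs=-17.0, target_lufs=EXAMPLE_TARGET_LUFS)
scale = db_to_linear(gain_db)

print(f"Apply {gain_db:+.1f} dB (multiply samples by {scale:.3f})")
```

A negative gain attenuates content that is louder than the target. A positive gain raises quiet content, in which case the true-peak level also needs checking so that the boosted signal does not clip.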

AES77 is currently being worked on by task group SC-02-12-Q. A newer version is likely to be published when they have completed their revisions.

These standards documents are relevant to the use of AES77. These and other useful supporting white papers and conference proceedings are linked from the AES Loudness project resources page:

Document | Description
AES71 | Recommended Practice Loudness Guidelines for Over-the-Top Television and Online Video Distribution. Based on AES TD 1006.
AES TD1005 | Audio Guidelines for Over the Top Television and Video Streaming.
AES TD1006 | Loudness Guidelines for OTT and OVD Content.
AES TD1008 | Recommendations for Loudness of Internet Audio Streaming and On-Demand Distribution.
ANSI/CTA-2075 | Loudness Standard for Over-the-Top Television and Online Video Distribution for Mobile and Fixed Devices.
ATSC A/85 | ATSC Recommended Practice Techniques for Establishing and Maintaining Audio Loudness for Digital Television.
CENELEC - EN 50332-3 | Sound system equipment headphones and earphones associated with personal music players - Maximum sound pressure level measurement methodology - Part 3.
EBU - Tech 3341 | Loudness Metering. ‘EBU Mode’ Metering to supplement loudness normalization in accordance with EBU R 128.
EBU - Tech 3342 | Loudness Range - A measure to supplement loudness normalization in accordance with EBU R 128.
EBU - Tech 3343 | Practical guidelines for Production and Implementation in accordance with EBU R 128.
EBU - Tech 3344 | Practical guidelines for distribution systems in accordance with EBU R 128.
EBU R 128 | Loudness Normalization and Permitted Maximum Level of Audio Signals.
EBU R 128 S1 | Loudness Parameters for Short-Form Content.
EBU R 128 S2 | Loudness in Streaming.
ITU-R BS.1770-5 | Algorithms to measure audio program loudness and true-peak audio level.
ITU-R BS.1771-1 | Guidelines for audio loudness of online video content, prepared by the AES Audio Guidelines for Over the Top Television and Video Streaming (AGOTTVS) technical group.
ITU-T H.870 | E-health multimedia systems, services and applications - Safe listening.

Older AES Standards

Some earlier AES standards are based on Asynchronous Transfer Mode (ATM) networks. An ATM network can carry voice and data simultaneously. Ethernet was designed to carry data only, but Voice over IP (VoIP) allows it to support telephony applications as well.


AES47 & AES51 describe how to transmit audio over ATM networks.


AES Member Benefits

Being aware of everything the AES offers will enhance your skill-set when dealing with audio matters.

The AES document collection is a foundational source of reference. The standards are backed up by information documents that help you apply them and other technical documents that describe useful background and supplemental material.

Some of the newer standards such as AES70 will have a profound impact and facilitate the implementation of software-defined workflows and studios. AES70 bridges the software and hardware worlds in a very elegant way.

The benefits of joining the AES as a member far outweigh the cost of subscribing. I cannot recommend this highly enough.
