Baking A Cake With The Audio Definition Model

The trouble with Next Generation Audio lies in its versatility and in the wide array of devices that need to deliver its enhanced, immersive experience to the consumer. The Audio Definition Model may hold the key.

We are voracious consumers of content, and we’ve never had it so good. We enjoy content in every place and in every way. We get content on traditional terrestrial channels, via subscription services, streamed to our phones on the bus or on our PCs at work. Not only that, but the way we consume content is different too, with the ability to personalize our own viewing experience. Part of that personalization is being able to control which audio objects we hear and adapt our listening experience depending on which device we are using.

Content providers can deliver that functionality through a variety of Next Generation Audio (NGA) techniques. NGA is a catch-all term covering various technologies for providing immersive audio, accessibility features and adaptability, as well as personalization.

NGA can provide control not only over what language the commentary is in but how loud that commentary is. It allows consumers to manage how they listen to content and customize the experience to suit their needs and preferences.

The Forum for Advanced Media in Europe (FAME), a cross-skilled industry body, described the huge appeal of NGA in its “10 Things You Need To Know About Next Generation Audio” booklet.

It states: “Unlike the developments in picture quality, NGA isn't primarily about 'more' or 'better', but instead offers different workflow and distribution options as well as enabling new, more flexible personalized user experiences.”

FAME goes on to suggest that NGA is the most important development in audio technology for decades. It is not wrong; audio broadcasting has worked in essentially the same way since the earliest days of radio, as a fixed, pre-mixed production delivered to the consumer.

NGA changes everything; it democratizes content by putting elements of the audio experience into the hands of the consumer, and in doing so it makes the same content much more accessible to a wider range of people.

The Big Issue

To deliver this, broadcasters are turning to object-based audio (OBA) workflows. OBA is an audio technique which encodes audio objects with accompanying metadata describing exactly what each object is and how it contributes to a mix. An audio object might be a kick drum, a guitar or a saxophone, but it might also be a mix of the entire drum kit, or a crowd.

The receiver decodes this data to ensure objects are rendered as intended in the final output. If enabled in the metadata, some consumer equipment can also allow the contribution of those objects to be modified by the viewer.
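To make the idea concrete, here is a minimal sketch of an audio object as audio essence plus descriptive metadata, including an interactivity flag a receiver might honour when the viewer adjusts a level. The field names are invented for illustration and do not come from the ADM specification or any particular codec.

```python
from dataclasses import dataclass

@dataclass
class AudioObject:
    """Illustrative audio object: audio essence plus metadata describing it."""
    name: str                      # e.g. "English commentary" or "Crowd bed"
    samples: bytes                 # the audio essence itself (placeholder)
    azimuth_deg: float = 0.0       # intended position: 0 = straight ahead
    elevation_deg: float = 0.0
    gain_db: float = 0.0           # contribution to the default mix
    user_adjustable: bool = False  # may the listener change this object's level?

def effective_gain_db(obj: AudioObject, listener_offset_db: float) -> float:
    """Gain the renderer should apply, honouring the interactivity metadata."""
    return obj.gain_db + listener_offset_db if obj.user_adjustable else obj.gain_db

# A viewer turning the commentary down by 6 dB only takes effect because
# the metadata marks the object as adjustable.
commentary = AudioObject("English commentary", b"", user_adjustable=True)
print(effective_gain_db(commentary, -6.0))  # -> -6.0
```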

The big issue is that receivers from different manufacturers require different systems to encode that data. Companies such as Dolby, Fraunhofer and Xperi all promote different systems to receive, decode and deliver NGA content using their AC-4, MPEG-H and DTS:X codecs respectively.

These are already in use across a range of devices, from televisions to mobile phones, and in multiple countries. For example, MPEG-H has been adopted by a number of television manufacturers across Asia, while AC-4 is installed in televisions across most of Europe. And because of these different emission codecs, content providers don’t want to have to recreate content to meet the requirements of every potential delivery format.

The Audio Definition Model

The Audio Definition Model (ADM) aims to solve that by promoting an open production model for describing the metadata in NGA content, one which is independent of any delivery platform and can be interpreted by a wide range of encoders, such as those for AC-4, MPEG-H and DTS:X.

The ADM is standardized by the International Telecommunication Union (ITU) and has buy-in from broadcasters all over the world. Not only does it work with all current formats, but it has been specified as a basis to build on, so that evolving encoders will continue to accept it.

Crucially, by establishing itself as the open production standard for NGA metadata, it enables producers to maintain their creative intent.

How Does ADM Work And What’s It Got To Do With Cakes?

ADM has actually been around for a while. Its specification was originally published in 2014 by the EBU as Tech 3364, and it became an ITU Recommendation (ITU-R BS.2076) in 2017.

Officially, it describes the structure of an XML metadata model defining the content and format of the tracks in an audio file used to deliver NGA experiences, but the ITU has also described it as being like baking a cake.

A cake recipe consists of three things:

  1. A list of ingredients.
  2. A set of instructions on how to combine those ingredients.
  3. Instructions on how to bake the cake: oven temperatures, timings and so on.

This is exactly how ADM works. 

In a cake, the ingredients might be eggs, flour, butter, sugar, maybe some chocolate sprinkles. It’s everything that is combined to create the cake. The ingredients of an audio broadcast are the same: independent audio objects which combine to create the broadcast, such as native commentary, international commentaries, referee mics and sound beds. That covers step #1: the ingredients.

The instructions on how to combine these objects make up step #2. In baking, the ingredients need to be mixed in a certain way to guarantee the most delicious cake. In the Audio Definition Model, these instructions are contained in the metadata associated with each object; the metadata describes how the sources should be combined to produce the best sounding mix, so the commentary objects sit in the centre, the crowd atmosphere is spread across the rear and height channels, and so on.
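As a loose illustration of steps #1 and #2, the recipe can be pictured as a list of named objects (the ingredients) together with placement and level metadata for each one (the instructions). The structure and values below are purely illustrative, echoing the spirit of ADM metadata rather than its actual XML elements.

```python
# Ingredients (step #1) and how to combine them (step #2), sketched as plain data.
# Positions use a common broadcast convention: azimuth in degrees, 0 = front
# centre, positive = towards the left, plus an elevation for height channels.
programme = {
    "name": "Football - Home mix",
    "objects": [
        {"name": "Native commentary",       "azimuth": 0,    "elevation": 0,  "gain_db": 0.0},
        {"name": "Referee mic",             "azimuth": 0,    "elevation": 0,  "gain_db": -3.0},
        {"name": "Crowd bed (left rear)",   "azimuth": 110,  "elevation": 0,  "gain_db": -6.0},
        {"name": "Crowd bed (right rear)",  "azimuth": -110, "elevation": 0,  "gain_db": -6.0},
        {"name": "Crowd height (left)",     "azimuth": 30,   "elevation": 30, "gain_db": -9.0},
        {"name": "Crowd height (right)",    "azimuth": -30,  "elevation": 30, "gain_db": -9.0},
    ],
}
```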

Step #3 is how to bake the cake, and this is done by rendering. The ITU defines a reference ADM renderer in Recommendation ITU-R BS.2127. For broadcast, the actual baking is done by the renderer in the consumer device (as part of an AC-4, MPEG-H or DTS:X decoder). The consumer can either accept the object mix as intended, or modify it to suit their preferences according to the interactivity options defined in the metadata.
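The baking step can be pictured as a renderer turning each object's position metadata into loudspeaker gains for whatever layout the listener actually has. The toy function below does this for plain stereo using a constant-power pan; it is a teaching sketch only, not the far more capable renderer specified in BS.2127.

```python
import math

def render_to_stereo(azimuth_deg: float, gain_db: float) -> tuple[float, float]:
    """Toy renderer: map one object's position metadata to left/right gains.

    Uses a constant-power pan across the +/-30 degree stereo arc, with
    positive azimuth to the left, matching the convention used above.
    """
    # Clamp the object into the stereo arc and convert to a 0..1 pan position.
    az = max(-30.0, min(30.0, azimuth_deg))
    pan = (30.0 - az) / 60.0            # 0.0 = fully left, 1.0 = fully right

    linear = 10 ** (gain_db / 20.0)     # dB -> linear gain
    left = linear * math.cos(pan * math.pi / 2)
    right = linear * math.sin(pan * math.pi / 2)
    return left, right

# Centre commentary lands equally in both loudspeakers.
print(render_to_stereo(0.0, 0.0))  # approximately (0.707, 0.707)
```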

Take Up

“The future of NGA production definitely lies in the Audio Definition Model for managing the metadata necessary for authoring NGA content in a reusable way,” says Matt Firth, a Project Engineer in the Audio Team at BBC R&D who leads a workstream dedicated to production of audio experiences.

Having studied audio technology at university, Firth has spent the last seven years focusing on broadcast audio production at BBC R&D, involving many aspects of NGA research, such as OBA, spatial audio, binaural production, and other techniques which enhance the user experience.

Matt Firth, Project Engineer in the Audio Team at BBC R&D.

“ADM isn’t concerned with manipulating, mixing or encoding the audio itself, but provides the metadata required to author and to render NGA content from individual audio assets. It is an open specification, codec-agnostic and does not define the delivery process. Therefore, it doesn’t tie producers into any codec or a specific ecosystem during production. The ADM is very flexible and verbose in order to retain producer intent, and supports object-based, channel-based and HOA (otherwise known as scene-based) assets.

“ADM can be used to drive NGA encoders at the final stages of a media pipeline. In this way, it enables delivery in various NGA formats such as Dolby AC-4, MPEG-H and other future codecs from a single ADM production. To support existing channel-based delivery routes, ADM content can simply be pre-rendered at the distribution stage, such as through the ITU ADM Renderer. The idea is that as well as supporting NGA codecs, producers no longer need to create separate mixes for each of the delivery routes or output formats they need to support.”
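In pipeline terms, that final stage might be sketched as a single ADM production feeding whichever delivery route is required, with pre-rendering as the fallback for legacy channel-based routes. The function names below are placeholders invented for this illustration, not real encoder APIs.

```python
def encode_nga(codec: str, tracks: list, metadata: str) -> str:
    """Placeholder for a real NGA encoder (AC-4, MPEG-H, DTS:X...)."""
    return f"{codec} bitstream carrying {len(tracks)} objects plus metadata"

def prerender(tracks: list, metadata: str, layout: str) -> list:
    """Placeholder for pre-rendering objects to a fixed channel layout."""
    return ["channel"] * {"stereo": 2, "5.1": 6}.get(layout, 2)

def encode_legacy(layout: str, channels: list) -> str:
    """Placeholder for a conventional channel-based encode."""
    return f"{layout} stream with {len(channels)} channels"

def deliver(adm_metadata: str, audio_tracks: list, target: str) -> str:
    """One ADM production feeding either an NGA route or a legacy route."""
    if target in {"ac4", "mpegh", "dtsx"}:
        # NGA targets get the objects and metadata untouched.
        return encode_nga(target, audio_tracks, adm_metadata)
    # Legacy targets get a pre-rendered mix, e.g. via the ITU ADM renderer.
    channels = prerender(audio_tracks, adm_metadata, layout=target)
    return encode_legacy(target, channels)

print(deliver("<adm .../>", ["commentary", "crowd"], "mpegh"))
print(deliver("<adm .../>", ["commentary", "crowd"], "5.1"))
```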

Future Development

Tools are being developed to support the creation of mixes using the ADM, such as the EAR Production Suite, a project which began as a collaboration between BBC R&D and the since-closed research institute IRT under the EBU, and which Firth has been involved in since its inception. The EAR Production Suite is open-source and supports ADM production through a collection of audio plug-ins for digital audio workstations (DAWs), used for mixing in post-production.

These tools allow producers to author ADM content from different asset types.

While software like the EAR Production Suite already provides a comprehensive tool set for post-production, live production is also overcoming barriers to make progress with the adoption of ADM.

As is often the case, live sports are a driver for new technologies, but ADM in its usual file-based form is not suitable for live production or live streaming of audio. Live production audio requires frames to be delivered in order and in real time.

Historically, live broadcasting has used trusted delivery mechanisms like MADI, SDI and IP packetization to transport audio, and the ADM is required to support real-time transport too. For live broadcast, a serial format must be adopted to allow the audio to be packaged with its associated metadata.

The ITU Recommendation for Serial ADM (S-ADM), ITU-R BS.2125, was published in 2019. It describes how the ADM can be represented as serial metadata for use in live production and live streaming, with the XML document segmented into a serial, frame-based structure.
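Conceptually, the serial approach amounts to chopping the metadata into timed frames that travel alongside the corresponding audio. The sketch below illustrates that idea only; the field names are invented here, and the real frame syntax is defined in BS.2125.

```python
from dataclasses import dataclass

@dataclass
class SerialMetadataFrame:
    """Conceptual sketch of a serialised metadata frame (not the BS.2125 syntax)."""
    frame_number: int      # frames must arrive in order
    start_time_s: float    # when this frame's metadata takes effect
    duration_s: float      # how long it remains valid
    metadata_xml: str      # the ADM XML fragment for this frame
    audio_samples: bytes   # the audio that travels alongside it

def frame_stream(metadata_chunks, audio_chunks, frame_duration_s=0.02):
    """Pair metadata with audio, frame by frame, for real-time delivery."""
    for i, (xml, pcm) in enumerate(zip(metadata_chunks, audio_chunks)):
        yield SerialMetadataFrame(
            frame_number=i,
            start_time_s=i * frame_duration_s,
            duration_s=frame_duration_s,
            metadata_xml=xml,
            audio_samples=pcm,
        )

for frame in frame_stream(["<adm .../>"] * 3, [b""] * 3):
    print(frame.frame_number, frame.start_time_s)
```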

It’s early days for S-ADM, but broadcasters have already been experimenting with the format, such as France TV’s live broadcast of the French Open tennis tournament in 2020.

Industry-led initiatives like S-ADM and the EAR Production Suite bode well for the future of ADM, whose object-based approach and open standard provide interoperability for both the content provider and the distribution codec.

With a growing consumer appetite for cost-effective technologies like 3D soundbars, demand for NGA content will increase, and as we see more AC-4, MPEG-H and DTS:X implementations in consumer devices, delivery route issues will cease to be a barrier. 
