Immersive Audio 2025: Spatial Audio Production & Delivery Formats

The proliferation of consumer devices, and the different formats they adopt, is making the broadcast production and delivery chain more complex. It is time to explore the Audio Definition Model and other emerging standards.

Everyone’s onto a winner with spatial audio.

Not only do consumers get a better product, content providers have a USP to sell and can aim for a bigger share of the audience. Everyone is happy.

But like the proverbial laid-back duck that is paddling like mad under the water, there’s a lot going on behind the scenes to create this content. It asks a lot of the entire infrastructure, all the way from acquisition and transport to how it is interpreted and consumed by the viewer.

We’ve already looked at spatial audio capture, but from a delivery perspective we’ve had to go through some pretty substantial growing pains. Traditional broadcast models have always been built around channel-based audio (CBA), where audio channels are mixed at source and each is sent directly to a loudspeaker at a specific location. It means each audio signal ends up at its intended loudspeaker without any modification, and it has always proved to be a solid system.

But technology and consumer expectations have both surpassed this format, and with a vast range of receivers that can handle stereo, 5.1, 7.1.4 and other formats at home, the configuration at the consumer end can no longer be guaranteed, and consumers might end up listening to content in a way that was not the mixer’s intention.

Based On… What?

Enter scene-based audio (SBA) and object-based audio (OBA), which both have the provision to adapt the audio format to any listening environment. SBA includes Ambisonics, developed by Michael Gerzon in the 1970s, and its higher-order extensions, which we spoke about in part one. These create a framework for 3D sound that is totally independent of speaker placement.

While SBA formats provide more separation, most modern spatial audio content is derived from OBA. An audio object can be anything from a single microphone to a drum mix to a roaring crowd of 50,000 people, and an object is treated independently of everything else. It is positioned in the soundfield with a chunk of metadata that describes not only what it is, but its relative levels and its position in space.

It means that an immersive or spatially structured scene can be created from a variety of independent audio objects, and the associated metadata means that it will adapt to any listening format when it is reproduced on a consumer’s receiver.
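To make the idea concrete, here is a minimal sketch of how a renderer might turn one object's position metadata into loudspeaker gains for whatever layout it finds at the consumer end. It is an illustration only: the AudioObject class, the render_gains function and the constant-power panning law are assumptions chosen for brevity, not the method used by any of the formats discussed in this article. It handles horizontal position only, and assumes azimuth is measured in degrees with 0 at the front and positive values to the listener's left.

```python
import math
from dataclasses import dataclass


@dataclass
class AudioObject:
    """One audio object plus the metadata a renderer needs (hypothetical)."""
    name: str
    azimuth_deg: float  # assumed convention: 0 = front, positive = left
    gain: float = 1.0   # linear level


def render_gains(obj, speaker_azimuths):
    """Constant-power pan between the two speakers either side of the object.

    Returns one linear gain per speaker, in the order the layout was given.
    """
    n = len(speaker_azimuths)
    if n == 1:
        return [obj.gain]
    # Walk the speakers in order around the circle.
    order = sorted(range(n), key=lambda i: speaker_azimuths[i] % 360)
    target = obj.azimuth_deg % 360
    gains = [0.0] * n
    for k in range(n):
        a_idx, b_idx = order[k], order[(k + 1) % n]
        a = speaker_azimuths[a_idx] % 360
        b = speaker_azimuths[b_idx] % 360
        span = (b - a) % 360 or 360      # arc between the two speakers
        offset = (target - a) % 360      # how far into that arc the object sits
        if offset <= span:
            frac = offset / span
            gains[a_idx] = obj.gain * math.cos(frac * math.pi / 2)
            gains[b_idx] = obj.gain * math.sin(frac * math.pi / 2)
            break
    return gains


commentary = AudioObject("commentary", azimuth_deg=0.0)
stereo = render_gains(commentary, [30.0, -30.0])                        # L, R
surround = render_gains(commentary, [30.0, -30.0, 0.0, 110.0, -110.0])  # L, R, C, Ls, Rs
```

The same object and the same metadata produce different gains for each layout: on stereo the centred commentary is shared equally between left and right (about 0.707 each), while on the five-speaker layout it goes to the centre speaker alone. Nothing has to be remixed at source.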

Spatial Formats

Although both object- and scene-based audio have some provision to adapt the audio format to the listening environment, it is OBA that has been adopted by most spatial formats, including Dolby Atmos, Apple Spatial, THX Spatial Audio, DTS:X, MPEG-H and Sony’s 360 Reality Audio. These formats are another reason why consumers are so familiar with the concept of spatial audio, as the big tech companies – and countless headphone manufacturers – have been pushing them for years.

Apple Spatial Audio was developed in collaboration with Dolby and is used for a substantial amount of content on the Apple Music service. Like binaural audio, Apple Spatial Audio takes surround and Dolby Atmos signals and applies filters to virtualize the output into a three-dimensional space in a stereo format. Unlike binaural audio, because Apple Spatial Audio is based on OBA it is an encoded format, which means that its metadata describes which sounds are reproduced in which speakers at the consumer end. Other popular encoded formats include Sony 360 Reality Audio and Dolby Atmos.

Apple Spatial also adds head tracking, which enables the soundscape to track the user’s head movements using accelerometers and gyroscopes that are built into compatible devices like Apple’s AirPods and Beats headphones. This keeps the audio positioned relative to the screen, so if listeners turn their head, the sound stays anchored to the on-screen content.
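The principle behind that anchoring can be sketched in a few lines. This is a simplified illustration, not a description of Apple's implementation: the function name is hypothetical, only yaw (turning left and right) is considered, and both angles are assumed to be in degrees with positive values to the left. Real systems work with full 3D rotations and fused sensor data.

```python
def head_relative_azimuth(source_azimuth_deg, head_yaw_deg):
    """Where a screen-anchored source sits relative to the listener's nose.

    Subtracting the head yaw from the source direction keeps the source
    fixed in the room; the result is wrapped into the range [-180, 180).
    """
    return (source_azimuth_deg - head_yaw_deg + 180.0) % 360.0 - 180.0


# Dialogue anchored to the screen, straight ahead of the listener's body.
dialogue_azimuth = 0.0

# The listener turns 30 degrees to the left: the renderer must now place
# the dialogue 30 degrees to the right of the nose so it stays on screen.
rendered = head_relative_azimuth(dialogue_azimuth, head_yaw_deg=30.0)  # -30.0
```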

The other big player is Sony, whose aforementioned Sony 360 Reality Audio format delivers 3D audio on Sony PlayStation games consoles. The company has also been building a hardware ecosystem with audio specialists like Denon, Marantz and Sennheiser, and the format is available on major music streaming services like Amazon Music Unlimited and Nugs.

While Sony has the PlayStation ecosystem all tied up for gaming, Dolby Atmos is supported across Microsoft’s Xbox consoles, and THX Spatial Audio (another object-based spatial renderer) is popular across a raft of online games, including Riot Games’ Valorant, which we mentioned in part two.

Delivery

Meanwhile, work has been going on behind the scenes to enable delivery of all these objects. As far as broadcasters are concerned, two developments are critical: the Audio Definition Model (ADM), which is primarily designed for file-based production workflows, and the Serial Audio Definition Model (S-ADM), which is designed for transmitting audio metadata alongside audio content in live and streaming applications.

First published in 2014, ADM was a European Broadcasting Union (EBU) development and is codec-agnostic, so it doesn’t tie content producers into any codec-specific ecosystem. In addition to SBA and OBA, it also supports existing CBA delivery, which means that producers don’t need to create separate mixes for every output format they need to support. It can also be interpreted by a wide range of encoders such as Dolby AC-4 and MPEG-H (which we will look at in a future article).

S-ADM, as defined by Recommendation ITU-R BS.2125, is a serial representation of ADM that defines a frame-based XML format for serializing ADM metadata. This is what makes it suitable for linear workflows, and S-ADM has been used in Europe as far back as 2020, when France TV used it for coverage of the French Open. The number of S-ADM proofs of concept is still growing, with the approach applied at scale once again by France TV at the Paris Olympics, and by Globo in Brazil.
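The difference between a file-based description and a frame-based one can be sketched as follows. This is a rough illustration of the idea of slicing timed metadata into self-contained frames, not a schema-valid S-ADM generator. The element and attribute names are borrowed from the ADM and S-ADM vocabulary, but the structure is heavily flattened, and the ID pattern, the way block times are written and the helper names (adm_time, build_frames) are assumptions. The ITU-R recommendations remain the reference for the real format.

```python
import xml.etree.ElementTree as ET


def adm_time(ms):
    """Format integer milliseconds as hh:mm:ss.fffff."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis * 100:05d}"


def build_frames(blocks, frame_ms, total_ms):
    """Group timed position updates into fixed-length XML frames."""
    frames = []
    for index, start in enumerate(range(0, total_ms, frame_ms), start=1):
        frame = ET.Element("frame")
        header = ET.SubElement(frame, "frameHeader")
        ET.SubElement(header, "frameFormat", {
            "frameFormatID": f"FF_{index:08X}",  # ID pattern is an assumption
            "start": adm_time(start),
            "duration": adm_time(frame_ms),
        })
        body = ET.SubElement(frame, "audioFormatExtended")
        # Only the updates that fall inside this frame travel with it.
        for block in blocks:
            if start <= block["start_ms"] < start + frame_ms:
                element = ET.SubElement(
                    body, "audioBlockFormat", {"rtime": adm_time(block["start_ms"])}
                )
                for axis in ("azimuth", "elevation"):
                    position = ET.SubElement(element, "position", {"coordinate": axis})
                    position.text = f"{block[axis]:.1f}"
        frames.append(ET.tostring(frame, encoding="unicode"))
    return frames


# Two position updates for one object, sliced into 20 ms frames.
blocks = [
    {"start_ms": 0, "azimuth": 30.0, "elevation": 0.0},
    {"start_ms": 25, "azimuth": 20.0, "elevation": 5.0},
]
frames = build_frames(blocks, frame_ms=20, total_ms=40)
```

Each string in frames can be sent alongside the audio covering the same time window, which is what lets a receiver join a live stream part-way through rather than needing a complete file.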

Shoring Up Standards

Other standards are coming online too, helping to expedite pick-up, and they all have similarly impenetrable names.

The spatially oriented format for acoustics (SOFA) is a container file format for spatial acoustic data and was first standardized by the AES as AES69 in 2015. The standard brought together a number of databases which all used independent file structures for different types of acoustical data such as HRTF and binaural data. Introducing the 2022 revision (AES69-2022), the AES stated: “The use of convolution-based reverberation processors in 3D virtual audio environments has grown with the increase in available computing power. Convolution-based reverberators guarantee an authentic and natural listening experience but also depend on the acoustic quality of the applied spatial room impulse response (SRIR). With a standardized file format for HRTF and SRIR data, each company can contribute its best algorithms, providing good personalized capture and/or rendering, allowing the consumer to choose a combination of technologies for the best quality of experience.”
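The convolution the AES refers to is simple to sketch. The example below assumes the impulse responses have already been read from a SOFA file (which is a netCDF-based container and needs a dedicated reader) and uses made-up three-sample responses in their place. The function names and the numbers are illustrative only; a real renderer would use measured responses thousands of samples long and FFT-based convolution.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution of a signal with an impulse response."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, sample in enumerate(signal):
        for j, tap in enumerate(impulse_response):
            out[i + j] += sample * tap
    return out


def render_binaural(signal, hrir_left, hrir_right):
    """Filter one mono source with a left/right impulse-response pair."""
    return convolve(signal, hrir_left), convolve(signal, hrir_right)


# Toy responses for a source on the listener's left: the right ear hears
# it later and quieter than the left ear. Values are invented.
hrir_left = [1.0, 0.3, 0.0]
hrir_right = [0.0, 0.6, 0.2]

click = [1.0, 0.0, 0.0, 0.0]
left, right = render_binaural(click, hrir_left, hrir_right)
```

Because the output depends entirely on the impulse responses supplied, a common file format for those responses is what lets one company's measurements be used with another company's renderer.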

On the delivery side of things, a consortium of 13 companies including Nokia, Fraunhofer IIS, and Dolby collaborated on the IVAS (Immersive Voice and Audio Services) codec. Developed for the transmission of spatial audio over mobile networks, it was launched by global telecommunications standards organization 3GPP in June 2024 and is the standard for immersive audio in 5G mobile networks. Based on the Enhanced Voice Services (EVS) codec, which is already standard in most mobile networks around the world, it allows consumers to hear 3D spatial sound in real time.

Tools & Controls

When it comes to rendering a mix and generating the necessary metadata, renderers share a similar attitude to standardizing output, and there are a number of renderers to choose from. In addition to creating immersive mixes, the Dolby Atmos Production Suite integrates with a variety of DAWs and renders down to standard channel-based formats like stereo and 5.1 surround. But not all renderers are created equal.

Unsurprisingly, given its close working relationship with Atmos, the Apple Logic Pro renderer incorporated the Dolby Atmos plugin in 2021. Despite fundamental differences – or perhaps because of them – Apple’s renderer also has the ability to preview Atmos tracks as they would sound on Apple Music as the service uses a different codec to encode the ADM BWF (Audio Definition Model Broadcast Wave Format) file.

BWF is another EBU-specified format, which dates back to 1997 and extends Microsoft’s .wav audio format for interoperable use in broadcast by adding the ability to carry metadata. In fact, it was one of ADM’s first applications, and ADM BWF is used in Dolby Atmos workflows to add spatial information that tells the renderer how to recreate the sound experience. But while the ADM BWF file format is standardized, rendering is not: Dolby Atmos uses the AC-4 IMS codec for binaural headphone playback, while Apple uses its own renderer to interpret a Dolby Atmos mix. This means an Apple Music spatial mix will always sound different to one rendered by Dolby on another streaming service.
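The way a .wav file carries that extra metadata is straightforward to inspect, because the file is a sequence of labelled chunks. The sketch below walks those chunks and lists their IDs and sizes. In an ADM BWF file the ADM XML travels in a chunk labelled axml, with a companion chna chunk linking audio tracks to it. The function names here are hypothetical, and the code assumes a plain RIFF header; large files that use the 64-bit BW64 variant would need extra handling.

```python
import struct


def list_riff_chunks(data):
    """Return (chunk_id, size) for each top-level chunk in a RIFF/WAVE file."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a plain RIFF/WAVE file")
    chunks = []
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        chunks.append((chunk_id.decode("ascii"), size))
        pos += 8 + size + (size & 1)  # chunk bodies are padded to an even length
    return chunks


def make_chunk(chunk_id, payload):
    """Build one chunk; used here only to fabricate a tiny demo file."""
    pad = b"\x00" if len(payload) % 2 else b""
    return chunk_id + struct.pack("<I", len(payload)) + payload + pad


# A fabricated, silent stand-in for a real file: format, ADM XML, audio data.
body = (
    b"WAVE"
    + make_chunk(b"fmt ", bytes(16))
    + make_chunk(b"axml", b"<adm />")
    + make_chunk(b"data", bytes(4))
)
demo_file = b"RIFF" + struct.pack("<I", len(body)) + body

chunk_ids = [chunk_id for chunk_id, _ in list_riff_chunks(demo_file)]
has_adm = "axml" in chunk_ids
```

A player that does not understand the extra chunks simply skips them and plays the audio, which is why the approach is backwards compatible with ordinary .wav tools.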

The MPEG-H Authoring Tool is another popular tool that enables users to add MPEG-H Audio metadata to existing mixes, define parameters and export authored mixes, and it has an authoring plug-in for integration into popular DAWs such as Nuendo and Pro Tools.

Getting On Board

Spatial audio is gathering momentum, and developers and standards agencies look to be working together to develop interoperable and cohesive technologies that help producers and consumers alike.

But there is more work to be done, and spatial audio doesn’t add value to everything. In the next part we’ll look at where immersive audio is going and how OBA can add even more value to broadcast output, as well as the impact on mixers and how manufacturers are helping to ease the transition.
