Immersive Audio 2025: The Rise Of Next Generation Audio

Immersive audio has gone spatial with the addition of height control and NGA formats to support it, and consumer demand is booming… but it is all still about the experience.
Don’t be fooled into thinking immersive audio is all about 3D audio formats. It’s not.
Immersive audio is more about creating a sonic reality that delivers a better user experience. Whether that experience is realistic is not really the point; in fact, using audio to hold a mirror up to the real world kind of misses the point entirely.
Immersion is all about transporting the listener into an entirely different environment, whether that is a live sporting event, a musical composition or a horde of flesh-eating zombies in a dystopian future. It is more about the journey than the destination. And it’s not new either – sound designers have been using audio to transport people to different places for over a hundred years.
The reason why it is such a big deal in modern broadcasting is that both broadcasters and consumers have easy access to technologies that enable viewers to experience more nuanced immersive audio. And while spatial audio does not define immersive audio, it does offer more opportunities to deliver greater levels of immersion. Moreover, spatial audio is no longer the preserve of consumers who have the space and the desire to install immersive cinema rooms; it can be experienced in a standard lounge with a beamforming soundbar or on a pair of wireless headphones in a park.
Getting More Out Of It
This series aims to understand how immersive audio has developed and why it’s so popular. It will examine how it is captured, manipulated and experienced; how production and transmission formats have developed and how they deliver better access to spatial audio for a range of consumers. It will talk about how production applications differ and what the benefits of each are. It will question why monitoring is so important and examine how manufacturers are creating new ways to provide personalized environments for audio mixers.
From a pure broadcasting perspective, it will also examine how broadcast workflows are adapting to create more opportunities to be more creative, and how the shift to working differently can deliver more accessible content for everyone.
Above all, it aims to understand how immersive audio adds value and how it will continue to influence how we experience the world around us.
Spatial Audio
Spatial audio is part of a group of enhanced audio experiences collectively known as Next Generation Audio (NGA). In its “Object-based media report,” UK communications regulator Ofcom describes NGA as something “which allows the combination of Objects, Channels, and other audio formats such as ambisonics and binaural audio to be carried. It is now present in at least four out of five new UHD TV sets in Europe and the UK, and most new mobile phone models. It is also included in the national and international TV specifications used in many European countries such as the Italian UHD book, Nordig 3.1, FAVN French specification, Polish TV specification, as well as worldwide in South Korea, USA, and new systems considered in Japan, Brazil and South America.”

The EAR Production Suite is a collection of free VST plug-ins released as the result of a joint venture between the EBU and BBC R&D; it adds monitoring, panning and rendering to compatible host applications.
It’s a big deal everywhere, and it’s why multiple cross-industry organizations are developing tools to generate appropriate content, such as the EBU’s EAR Production Suite. And it is spatial audio in particular that is driving NGA as an important part of UHD standardization.
Unlike 5.1 Surround – another immersive experience, one which full-fat formats like Dolby Atmos have superseded – 3D spatial audio includes audio objects on the vertical plane as well as the horizontal plane.
Ambisonics
Ambisonics is a concept developed in the 1970s by Michael Gerzon. It encodes a 360° sound field across a number of channels, and the key to its appeal is that it is speaker agnostic; a trait that was ahead of its time in the 1970s but is now a core part of the way immersive audio is delivered in almost every standard and format.
Ambisonic capture was productized with the Soundfield microphone in 1978, which used four microphone capsules in a tetrahedral arrangement to capture, encode and store audio in such a way that it could be adapted for any number of speakers, in any format, and for any system.
Gerzon’s concept is known as first order ambisonics and is made up of four component channels: an omni-directional signal (W) plus components representing the X, Y and Z dimensions of the sound. The raw audio captures from each capsule are referred to as A-Format; once they are processed (or encoded) to get directional information it is known as a B-Format signal. Higher Order Ambisonics (HOA) adds additional components on top of the first order ambisonics; second order ambisonics uses nine component channels, while third order ambisonics adds another seven to make 16. The higher the order, the greater the spatial resolution due to the increased number of channels. Specialist ambisonic mics are available from several vendors.
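The channel counts above follow a simple rule: an order-n ambisonic signal has (n + 1)² component channels. A minimal sketch of both that rule and first-order B-format panning of a mono source is shown below, assuming the traditional Furse-Malham-style weighting in which W is attenuated by 1/√2; exact normalization conventions vary between standards, and the function names here are illustrative.

```python
import math

def encode_first_order(sample, azimuth_deg, elevation_deg):
    """Encode one mono sample into first-order B-format (W, X, Y, Z).

    Assumes the traditional Furse-Malham-style weighting, where W is
    scaled by 1/sqrt(2); other normalization schemes (e.g. SN3D) differ.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample * (1.0 / math.sqrt(2.0))        # omnidirectional component
    x = sample * math.cos(az) * math.cos(el)   # front-back
    y = sample * math.sin(az) * math.cos(el)   # left-right
    z = sample * math.sin(el)                  # up-down (the height plane)
    return w, x, y, z

def hoa_channel_count(order):
    """Component channels for ambisonic order n: (n + 1) squared."""
    return (order + 1) ** 2
```

Running `hoa_channel_count` for orders 1, 2 and 3 returns 4, 9 and 16, matching the first, second and third order channel counts described above.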
Non-Ambisonic Spatial Audio Capture
There are countless other ways to record 3D immersive audio in the field. Binaural microphones, which we will look at in more detail in part two, can virtualize a 3D space in just two channels by approximating the localization cues that human beings employ, and do a reasonably quick and efficient job of it.
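One of the localization cues binaural techniques approximate is the interaural time difference (ITD): the tiny delay between a sound arriving at one ear and then the other. A hedged sketch of the well-known Woodworth spherical-head approximation is shown below; the default head radius is an assumed average, and the function name is illustrative.

```python
import math

def woodworth_itd(azimuth_deg, head_radius_m=0.0875, speed_of_sound=343.0):
    """Approximate interaural time difference (in seconds) for a source
    at a given azimuth, using the classic Woodworth spherical-head model:
    ITD = (r / c) * (sin(theta) + theta).

    The 8.75 cm head radius is an assumed average, not a measured value.
    """
    theta = math.radians(azimuth_deg)
    return (head_radius_m / speed_of_sound) * (math.sin(theta) + theta)
```

For a source directly ahead (0°) the delay is zero; for a source hard left or right (90°) the model predicts a delay of roughly two thirds of a millisecond, which is the scale of cue a binaural renderer has to reproduce.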
Meanwhile, physically arranging mics in different arrays can create a massive range of immersive results depending on spacings, brands, microphone types and polar patterns. Several leading mic vendors have their own variation on the theme of placing an array of microphones or capsules attached to a central body, and using these fixed position arrays does tend to keep things more predictable.
The advantage of both these approaches is that there is no formatting required in the way there is with an ambisonics mic. Each microphone has an independent XLR output and is routed into a mixing console for manipulation like any other input.
Consumer Formats
At the other end of the chain, consumers are spoiled for choice. Full spatial audio formats include height: both 5.1.4 and 7.1.4 are extensions of 5.1 and 7.1 with speakers in the height channels. In a 7.1.4 configuration, the “7” refers to the number of traditional surround speakers (front, center, rears, etc.); the “1” is the number of subwoofers; and the “4” is the number of height speakers in the setup.
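The layout naming convention is mechanical enough to express in a few lines. The helper below is purely illustrative (the function name is not from any standard); it splits a layout name into its three speaker counts, treating a missing third figure (as in plain “5.1”) as zero height channels.

```python
def parse_speaker_layout(layout):
    """Split a layout name like '7.1.4' into its speaker counts:
    (main/surround speakers, subwoofers, height speakers).

    A missing figure (e.g. '5.1' has no third number) is treated as zero.
    """
    parts = [int(p) for p in layout.split(".")]
    mains = parts[0]
    subs = parts[1] if len(parts) > 1 else 0
    heights = parts[2] if len(parts) > 2 else 0
    return mains, subs, heights
```

So a 7.1.4 system parses to seven mains, one subwoofer and four height speakers: twelve drivers in total.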
Spatial audio is received by the consumer in one of two ways: encoded or decoded. Decoded audio arrives in a binaural format, irrespective of whether it has been recorded as a stereo feed using binaural techniques or whether it has been created in post-production using renderers. YouTube is a good example of this.
Encoded audio formats are delivered to the consumer with embedded metadata describing which sounds are replicated in which speakers. Dolby Atmos is a good example of this, as is Sony 360 Reality Audio. Both are object-based, and their metadata allows the decoder to place each object into a specific position in space; again, we’ll dig into this in more detail later on.
The Reality Of Reality
But we’re getting ahead of ourselves.
In practice, if broadcasters are looking to create better user experiences they are likely to use a combination of approaches, and in live production there is often a clear distinction between what is possible and what is desirable. For the audio mixer responsible for the program output, the application of spatial elements has to be easy to manage; they don’t have the time to be panning FX in a live environment, and if it detracts from the viewer’s immersion what’s the point anyway? It has to be considered and it has to be appropriate to the presentation.
In fast-paced live and breaking news, a production is unlikely to use any spatial elements. But live sports often adopt immersive 3D audio for the crowd atmos to put the viewer in the midst of the action; in fact, for many big-ticket sports like the World Cup, it’s done this way as standard. But commentary still sits on the center channel and the game FX is still broadcast in stereo. Here the adoption of an ambisonics microphone is desirable, while highly focused supercardioid mics might be utilized to pick up the finer points of the game such as the kick of a ball or the thwack of a bat. These effects might not be realistic, but they all add to the immersion.
And that is precisely the point.