Immersive Audio 2025: Why Consumers Can’t Get Enough Of Spatial Audio

Spatial audio enhances entertainment experiences by placing the listener at the heart of the action. It is used to great effect in gaming and music, and in a competitive media landscape that means broadcasters need to keep up.

In real life we hear everything. We can’t help ourselves.

Whether it’s traffic noise over our left shoulder or the faint bleating of a newborn lamb two fields away, we unconsciously position ourselves in the wider soundscape. It’s instinctive and we don’t even have to think about it.

Although we’re born with an innate ability for spatial awareness, for most of us our relationship with recorded sound has largely been two-dimensional. For decades we grew up listening to the TV and radio in mono or stereo, which does a fantastic job of giving us the context of what’s going on, but not always the best job of giving us the buy-in.

This may be because stereo simply does not exist in the natural world, and it’s the wider environment which gives us perspective. It’s why spatial audio works so well; it is the most human-centric approach because it is the most natural.

Why Now?

Luckily, the way we consume audio is changing and consumers can’t get enough of spatial audio. Conditions are conspiring on both sides of the production chain to make spatial audio more appealing, easier to capture and easier to access; technology has enabled content creators to keep pace with our natural inclinations and enabled consumers to access the results in a variety of ways.

When 5.1 surround sound started gaining traction in the 2000s, it introduced the concept of spatial audio in the home, but the practicalities were clunky. The move from stereo to surround meant that consumers not only had to invest in a bunch of additional speakers but also had to ensure that their viewing spaces could accommodate them. Adding center, rear left, and rear right speakers to the existing left and right pair, not to mention a sub, was a concession that was not on most households’ can-do list, and the adoption of 5.1 surround was severely limited.

But it did sow the seeds of possibility, and it promoted a much more connected viewing experience that was very different to the passive experience that viewers were used to. Content providers began to experiment with more immersive coverage that delivered more value. Content makers like Sky, who were already blazing a trail in live sports with regular 5.1 surround coverage of Premiership soccer, began to inspire broadcasters all over the world, and soon others started to look at increasing their own value propositions with more immersive presentations.

Today is a golden age for immersive audio, with many big-ticket sports events broadcast in Dolby Atmos as standard, placing the consumer right in the crowd at the stadium.

For many content providers, it’s no longer a choice.

Spoiled For Choice

What it means to be a broadcaster has changed; we’re all broadcasters now and consumers hold all the cards. 

Whether it is free over-the-air channels, premium terrestrial and satellite services, subscription streaming services, connected gaming platforms, YouTube, TikTok, or countless other social media channels, there is huge competition for our eyes and ears, and the impact all this choice has on potential advertising revenue is not insignificant. The consumption of content is no longer linear. More than at any other time in history, content is king, and every single content provider is looking for ways to add value.

Immersive audio adds enormous value. In contrast to the dispiriting experience many consumers had when trying to install 5.1 surround in their homes, advances in consumer technology have made immersive audio far easier to access. As these technologies have developed, the entire production chain has adapted to a new range of immersive options, from acquisition to delivery, transmission to consumption.

At the consumer end in the home, this is in part down to user-friendly, compact, plug-and-play devices that use beam-steering principles: smart speakers which project a more immersive representation of stereo from a single unit, and soundbars which virtualize rear and height channels. While the results might not be perfect, if we’re talking about added value, we are much further down the right path.

But the biggest bridge to immersive audio and what has proved absolutely key to its growth is the public’s widespread adoption of headphones and wireless earbuds. Spatial audio can easily be virtualized for headphones, and these days we are all plugged into the system, wherever we are.

Shoot ‘em Up

By the end of 2021, Riot Games’ first-person shooter Valorant had grown to around 19 million monthly active users. While its rapid growth was partly due to pandemic lockdowns, 2021 was notable for another reason. In March of that year, Riot Games released a software update that introduced spatial awareness through Head-Related Transfer Function (HRTF) processing. Technically, these capabilities are created by THX Spatial Audio, an object-based renderer which provides HRTF functionality, but practically it means that players can hear where their opponents are through their headphones, and that gives them the edge.

It was a big deal and gamers understood the value of it right away. HRTF is one of three sound localization cues which we use to localize sound in a 3D space. The first of these cues is the ‘interaural level difference’, which is the difference in how loud a sound is at each ear; if it is louder in one ear, the brain perceives the sound source as being towards that side. Secondly, the ‘interaural time difference’ is the difference between when a sound reaches each ear, which gives us a further indication of where the source is. Both these cues can be easily replicated in post-production using delay and panning techniques.
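As a rough illustration of how delay and panning can stand in for these two cues, here is a minimal Python sketch. It is not taken from any particular product: the function names, the 8.75 cm head radius, the constant-power pan law and the use of the Woodworth spherical-head approximation for the time difference are all assumptions chosen for simplicity.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value for air at room temperature
HEAD_RADIUS = 0.0875    # m, an assumed "average" head; real heads vary


def itd_seconds(azimuth_deg):
    """Interaural time difference from the Woodworth spherical-head model.

    azimuth_deg: 0 is straight ahead, +90 is hard right, -90 is hard left.
    """
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))


def pan_with_itd_ild(mono, azimuth_deg, sample_rate=48000):
    """Return (left, right) sample lists for a mono list of samples."""
    # Level difference: a constant-power pan law (an assumed stand-in for ILD).
    pan = (azimuth_deg + 90.0) / 180.0  # 0.0 = hard left, 1.0 = hard right
    gain_l = math.cos(pan * math.pi / 2)
    gain_r = math.sin(pan * math.pi / 2)

    # Time difference: delay the far ear by a whole number of samples.
    delay = round(abs(itd_seconds(azimuth_deg)) * sample_rate)
    pad = [0.0] * delay
    near = list(mono) + pad  # the ear facing the source hears it first
    far = pad + list(mono)   # the other ear hears it slightly later

    if azimuth_deg >= 0:     # source to the right: left ear is the far ear
        left, right = far, near
    else:                    # source to the left: right ear is the far ear
        left, right = near, far
    return [gain_l * s for s in left], [gain_r * s for s in right]
```

Calling `pan_with_itd_ild(samples, 45)` would place a mono source front-right: the right channel is louder and the left channel arrives a fraction of a millisecond later. Neither cue on its own carries height information.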

HRTF is a third localization cue and provides additional information for sound source localization based on a person’s physicality. Our own physical attributes play a big part in how we localize sounds when they are attenuated and absorbed by our ears, and they provide height information as well as position and distance. And height information is crucial for full spatial audio.

Finding The Space In Two Channels

Two-channel presentations make a lot of sense; as we know, humans have only ever needed two ears to assess our surroundings in 360°, so it’s no wonder that two-channel virtualized spatial audio is nothing new. Known as binaural audio, it uses all three of these localization cues to create a spatialized mix over two channels.

Source material can be captured using microphones positioned in the ear canal of a real or artificial human head and aims to recreate the same localization, timing and HRTF that our actual heads naturally generate. Capturing a binaural recording at source has several advantages. Content can be recorded quickly and cost-effectively and does not require any additional processing. Meanwhile, a sound recordist can use a range of microphones to achieve it, such as the iconic Neumann KU 100 binaural head microphone or simple in-ear headset microphones.

Another approach is to binauralize sources in post-production with multi-channel binaural renderers, which are audio processors that recreate the differences in timing and loudness between what each ear hears. Alternatively, binauralized audio can also be recreated – along with many other audio formats – directly from an Ambisonic recording.
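The core of a binaural renderer can be sketched as convolving each source with a pair of impulse responses, one per ear, and summing the results. The Python below is a simplified illustration only, not how any specific plugin works: `render_binaural`, `convolve` and the `TOY_HRIRS` table are hypothetical names, and the toy impulse responses are invented placeholders (direct sound at the near ear, a delayed and quieter copy at the far ear) rather than measured data.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution of two sample lists."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


# Placeholder impulse responses, invented for illustration (NOT measured data):
# direction -> (response at left ear, response at right ear)
TOY_HRIRS = {
    "left": ([1.0, 0.0, 0.0], [0.0, 0.0, 0.5]),
    "right": ([0.0, 0.0, 0.5], [1.0, 0.0, 0.0]),
}


def render_binaural(sources, hrir_table):
    """Mix (mono_samples, direction) pairs down to (left, right) sample lists."""
    left_mix, right_mix = [], []
    for samples, direction in sources:
        hrir_left, hrir_right = hrir_table[direction]
        for mix, hrir in ((left_mix, hrir_left), (right_mix, hrir_right)):
            wet = convolve(samples, hrir)
            # Grow the mix buffer if this source produces a longer tail.
            if len(wet) > len(mix):
                mix.extend([0.0] * (len(wet) - len(mix)))
            for n, value in enumerate(wet):
                mix[n] += value
    return left_mix, right_mix
```

A real renderer would swap the toy table for measured responses covering many directions, including elevation, and would typically use fast convolution rather than the nested loops shown here.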

Binaural audio is utilized for gaming, radio drama, and increasingly for musical presentations which adopt it to create more space in the music. It is the reason why some music streaming services can charge premiums for access to spatial recordings. 

Far From Average

But binaural’s biggest strength also has its limitations.

Whether binaural audio is captured at source or created in post-production with binaural rendering plugins, it is always a compromise. This is because everyone’s head and everyone’s ears are unique, which means everyone filters sound differently. Binaural microphones like the Neumann KU 100 are based on an average head size and shape, while binaural renderers apply generalized filter sets based on the average person, so the final effect can never be a perfect match for every listener. Nor can either approach make the imperceptible adjustments we instinctively make to assess which direction sound is coming from.

For gaming, when there are plenty of other things going on, it is often good enough. But for nuanced, long-format radio and TV drama, if you have anything but an average head, the presentation may not be quite as convincing.

It’s All About The Value

But again, we’re talking about value. In a world where we have more options than ever on where to spend our money, audio is adding considerable value to these experiences, so it’s no surprise that broadcasters are falling over themselves to create more and more immersive audio content.

And it’s not limited to broadcast and OTT content either. People expect more from their audio presentations and their relationship with audio is not just limited to broadcast. They are listening to more immersive content on headphones; classic recordings are being remixed in Dolby Atmos; art installations are using beamforming technologies to create immersive environments and draw people deeper into the cultural space; and live events are enriching performances by encompassing audio objects and wrapping the audience up in immersive performances.

Consumers all understand the value of immersive audio and they can experience it without having to remodel their front room. We’re not going back to stereo anytime soon, and in part three we will look at how competing formats like Apple Spatial Audio, Sony 360 Reality Audio, and Dolby Atmos are jockeying for position, as well as the role of production formats, and how content producers are using them all to create more immersive content in the battle to secure more consumers.
