Immersive Audio 2025: Object Based Audio - A New Era Of Personalization
Next Generation Audio allows us to treat audio as objects which can be manipulated at the point of consumption, and it is transforming the potential for broadcast personalization.
All six articles in this series are now available in our free eBook ‘Immersive Audio 2025 – The Book’.
Immersive Is More Than Just Spatial; It’s Personal
Immersive audio has moved on. It’s not just about surround sound anymore; it’s about inclusivity. From cinema and streaming to XR and broadcast, object-based audio is redefining how we interact with our content. With personalization and accessibility increasingly at its core, immersive audio is less about where things are and more about who it is for.
However it is implemented, it is audio objects that are adding all this value, and you can’t turn round without bumping into one. Immersive audio presentations and spatial technologies are so familiar that many speaker-agnostic technologies have become household names. We can’t get enough of them.
In the cinema, broadcast, and home theater spaces, consumers can get wrapped up in Dolby Atmos, DTS:X and Barco’s AuroMax presentations. Meanwhile, live sound and exhibition spaces are also stepping up to the mark with their own object-based equivalents, such as L-Acoustics’ L-ISA and d&b audiotechnik’s Soundscape.
Then there are the more experimental Extended Reality – or XR – projects which deal with mixed and augmented reality content designed to enhance fan engagement and gaming presentations. In June 2025, the MAX-R project presented the culmination of thirty months of exploration into these technologies; the research covered a lot of ground and immersive audio played a big part by enabling synchronized audio and the formation of spatial audio groups.
And it’s going to get even more personal. The XR Sports Alliance (XRSA) is a consortium of developers, rights holders and media companies which is fast building momentum, growing its roster of member companies and accelerating the uptake of XR sports services by encouraging more collaboration between members. Founded by Accedo, Qualcomm Technologies and sports broadcaster HBS, the XRSA now includes the likes of Google, Lenovo, T-Mobile and a host of sports owners and media companies like Red Bull. At the same time, tech behemoth Apple is also pushing its Apple Vision Pro system, with dual-driver audio speakers positioned next to each ear delivering personalized sound in addition to what is going on around you.
When everything is stripped back, one thing is crystal clear: audio is absolutely central to the immersive experience. If the point of VR and XR is to give the consumer a more immersive experience, then spatial audio is fundamental. As Apple puts it in its Vision Pro marketing bumf, “spatial audio makes sounds feel like they’re coming from your surroundings…(while)…audio ray tracing analyzes your room’s acoustic properties to adapt and match sound to your space.”
All Wrapped Up
Apple’s dedication to spatial audio shouldn’t come as a surprise; as we learned in article three of this series, alongside the likes of Dolby and Sony, they’ve been pushing spatial audio for years. In fact, it could be argued that consumer acceptance and uptake of 3D audio is largely down to Apple Spatial Audio, which uses Dolby’s object-based audio format on a massive amount of content across its Apple Music service.
Apple Spatial consumers receive an encoded Dolby Atmos format which replicates the location of each channel according to whatever audio hardware is being used to listen to the content, whether that is a 3D soundbar, a full immersive speaker system or, in all likelihood, a pair of Apple AirPods.
While that’s exactly the point of all object-based audio experiences, the packaged delivery wrapping might differ as all these services adopt codecs to transfer signals from a production environment to an encoder. That encoder compresses and transfers multi-channel content, as well as adding metadata which describes what each channel is and whereabouts in the soundfield it lives.
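To make the idea of positional metadata concrete, here is a minimal sketch of how a renderer might turn one object's position into speaker gains for whatever layout is available. It is not how Dolby's or Apple's renderers work; the `AudioObject` class, its `pan` field and the `render_gains` function are hypothetical names invented for illustration, and the stereo case uses a plain constant-power pan law.

```python
import math
from dataclasses import dataclass


@dataclass
class AudioObject:
    name: str
    pan: float  # hypothetical metadata field: -1.0 = hard left, +1.0 = hard right


def render_gains(obj: AudioObject, layout: str) -> list[float]:
    """Return one linear gain per speaker for the requested layout."""
    if layout == "mono":
        # A single speaker carries the whole object; position is ignored.
        return [1.0]
    if layout == "stereo":
        # Constant-power pan: map pan in [-1, 1] to an angle in [0, pi/2].
        pan = max(-1.0, min(1.0, obj.pan))
        theta = (pan + 1.0) * math.pi / 4.0
        return [math.cos(theta), math.sin(theta)]
    raise ValueError(f"unsupported layout: {layout}")


commentary = AudioObject("commentary", pan=0.0)
left, right = render_gains(commentary, "stereo")
print(round(left, 3), round(right, 3))
```

The same object description is reused for every layout; only the rendering step changes, which is the sense in which object-based delivery is speaker-agnostic.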
Apple Spatial Audio uses its own proprietary renderer to process a Dolby Digital Plus Atmos mix, but in broadcast the two codecs that have the most widespread adoption are Dolby’s AC-4 and MPEG-H Audio, part of Fraunhofer’s MPEG-H suite of standards. Both these standards have stolen a march on DTS:X, another very capable spatial codec owned by Xperi, but one which looks to have less support from device manufacturers, OEMs and OTT streaming services; LG and Samsung are both high-profile manufacturers who have ended support for DTS:X.
The Same But Different
Not so for Dolby AC-4 and MPEG-H, both next-generation audio (NGA) codecs designed to deliver immersive, interactive, and efficient audio for broadcast, streaming, and OTT platforms. Both are very well bedded in and well understood. Both are channel- and object-based codecs that support immersive formats; both support personalization; both are supported by broadcast standards like ATSC 3.0 and DVB. But they are not quite the same.
Developed by Germany’s Fraunhofer IIS research institute, MPEG-H Audio was standardized in 2015 and is custom-designed for delivering format-agnostic object-based audio. It is included in the ATSC, DVB, TTA (Korean TV) and SBTVD (Brazilian TV) standards.
As an open standard, it has been widely adopted not only by OTT service providers and broadcasters, but also by industry partners and consumer technology companies like LG, Samsung, Sennheiser and Sony, who have all developed products with MPEG-H support.
Meanwhile, Dolby’s AC-4 is also specified for broadcast standards like DVB and ATSC 3.0. It is not an open technology and remains proprietary to Dolby, which licenses it, but it too was developed specifically for content delivery services like broadcast and streaming, and it too supports NGA features such as immersive and personalized audio. One big take-home feature for AC-4 is a vastly improved compression efficiency that in turn brings the bitrate cost down; AC-4 provides an average of 50% higher compression efficiency than Dolby Digital Plus.
The Data Is In
We’ve mentioned personalization a few times, so perhaps now is a good time to dig into it, because object-based audio is not just about spatial audio. In fact, object-based audio opens the door to NGA content that can deliver more immersion than spatial audio alone because it can specifically cater to everyone on the planet.
The Forum for Advanced Media in Europe (FAME), a cross-skilled industry body, described this aspect of NGA as being not about “more” or “better”, but about offering different workflow and distribution options as well as enabling “new, more flexible personalized user experiences.”
In other words, it isn’t about making things bigger. It’s about making things more accessible and giving viewers more control over what they hear. Today’s audiences consume content wherever they are, and on whatever device they have to hand, and because object-based media treats individual audio elements as separate objects it’s much easier to deliver different audio mixes to suit different environments, people and devices.
For today’s consumers, and in particular for those Gen Z and Gen Alpha consumers who have little interest in traditional linear broadcasting, these things make a big difference. More fundamentally, in addition to people with hearing and visual impairments, about 20% of the global population are thought to be neurodivergent, and conditions like autism, ADHD and dyspraxia all present their own unique sensory needs. What if there was a way to develop content that caters for everyone?
Codecs like AC-4 and MPEG-H Audio do more than encode and decode audio to reduce file size for transportation. They also embed metadata which describes not only where each channel or object sits in the soundfield, but also what it is. This gives content providers much more scope to deliver more appropriate content to all these groups.
Making It Personal
Because object-based media treats individual audio elements as separate objects, it allows the program content to flex according to the requirements of each individual audience member. As we know, objects aren’t just audio elements existing in space; they can be anything. By breaking down media into separate objects and attaching meaning to them through their metadata, an object can be controlled not only by the mixing engineer but also by the end user.
That means that if the crowd is designed by the production process to be an object, the end user can theoretically attenuate the crowd to better hear the commentary, or isolate just the commentary, or get rid of it altogether. What about listening to commentary in other languages? Or listening to the crowd atmos for the home or away fans?
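As a rough illustration of the choices above, the sketch below models a program as a handful of labelled objects and applies one viewer's preferences to them. It is a simplified model under stated assumptions, not the AC-4 or MPEG-H metadata format: the `AudioObject` fields, the `personalize` function and the object names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AudioObject:
    name: str
    kind: str                       # hypothetical label: "commentary", "crowd", "effects"
    language: Optional[str] = None  # only meaningful for commentary objects
    gain_db: float = 0.0            # level set by the mixing engineer


def personalize(objects, language="en", crowd_offset_db=0.0, mute_commentary=False):
    """Return {object name: gain in dB} for the objects this viewer will hear."""
    mix = {}
    for obj in objects:
        if obj.kind == "commentary":
            # Keep only the chosen language, or drop commentary altogether.
            if mute_commentary or obj.language != language:
                continue
            mix[obj.name] = obj.gain_db
        elif obj.kind == "crowd":
            # The viewer can raise or lower the crowd relative to the mix.
            mix[obj.name] = obj.gain_db + crowd_offset_db
        else:
            mix[obj.name] = obj.gain_db
    return mix


program = [
    AudioObject("commentary_en", "commentary", "en"),
    AudioObject("commentary_de", "commentary", "de"),
    AudioObject("crowd", "crowd"),
    AudioObject("pitch_fx", "effects"),
]

# German commentary with the crowd pulled down by 12 dB.
print(personalize(program, language="de", crowd_offset_db=-12.0))
# {'commentary_de': 0.0, 'crowd': -12.0, 'pitch_fx': 0.0}
```

Separate home and away crowd objects could be handled the same way by giving each its own label and letting the viewer pick one.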
All these options deliver more personalized experiences to consumers. As we discovered in article three, thanks to technologies like ADM and S-ADM, which also support channel-based audio, all these personalized objects can be delivered alongside existing channel-based content, including any immersive or spatial mixes. And the wider industry looks to be in step with all these developments; the SMPTE ST 2110-41 Fast Metadata standard was published in 2024 and enables the use of codecs like Dolby AC-4 and MPEG-H in ST 2110 broadcast networks.
Everything is becoming more joined up and the desire is definitely there, although the most likely adopters at the consumer end are the streaming, VR and XR services as they have more flexible control of the UI.
Kitted Out
All these things also have a knock-on effect at the other end of the production chain. Audio professionals have to rethink how they approach the mix, as well as the extra processing power, control and monitoring they need to handle an increased number of outputs.
Thankfully, technology developers are ahead of the curve, and in the next article we will look at how vendors are developing kit – and employing AI – to simplify production and ensure that broadcasters are able to produce live content that makes full use of objects.