Where Are We And What Is Next For Audio?

Kevin Emmott shares a personal perspective on what has happened in Audio in 2022 and the new technologies which might drive what we hear next.

We’re All Looking For A Shared Experience, But What Does It Sound Like?

The biggest and best live broadcast events have always brought people together: royal weddings, cup finals, Super Bowls, state funerals and music festivals all create a shared experience which unites us.

Although traditional over-the-air delivery is no longer the dominant delivery model, the same principles apply to live OTT programming, which reaches across international borders. This year it continued to increase market share, with streaming services gaining more access to traditional sports, while the relentless rise of eSports united millions of live viewers worldwide on Twitch and YouTube.

Viewing habits are also developing; we no longer have to be in the same room to have a shared experience or feel we are part of something bigger, and our technology reinforces it. We’ve never had so many ways to communicate with each other – which, considering what we’ve all been through in the last few years, is just as well.

And if we are already in this paradigm, then why not make it as captivating as possible?

Audio is at the heart of all this – in fact, it is the key to unlocking it, not only for immersion but for connection too, and in the next few years audio will be doing all the running. It will not only add value but also change people’s viewing experiences and bring them together in new ways.

This year we’ve seen a number of ground-breaking ways that broadcast audio can enrich our lives, across both OTT and OTA channels, and it has nothing to do with the cloud.

Consumer Technology

With such easy access to data we are already at a point where we can enjoy live content wherever we are. We can watch the Crufts final at a swimming pool if we want to, and that’s before widespread adoption of 5G.

Consumer tech is also making things like immersive sound much more accessible. 3D soundbars can be bought in a supermarket, while the Sky Glass television already has Dolby Atmos built in. For the rest of us there is plenty of accessible immersive content which uses binaural and HRTF techniques to virtualize 3D sound in our headphones. Both Tidal and Apple Music provide tens of thousands of immersive titles, and Dolby Atmos is supported by many of the biggest streaming services, including Netflix, Amazon Prime Video, HBO Max and Apple TV+.
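As a rough illustration of what binaural/HRTF virtualization involves, the sketch below convolves a mono source with a pair of head-related impulse responses to produce a two-channel headphone feed. The "HRIRs" here are toy stand-ins (a simple interaural time and level difference), not measured data.

```python
# Minimal sketch of binaural rendering via HRTF convolution (illustrative only).
# The "HRIRs" below are toy impulse responses standing in for measured
# head-related impulse responses; a real renderer would load a measured set.
import numpy as np
from scipy.signal import fftconvolve

fs = 48000                                 # sample rate in Hz
t = np.arange(fs) / fs                     # one second of audio
mono = 0.5 * np.sin(2 * np.pi * 440 * t)   # mono source: 440 Hz tone

# Toy HRIRs for a source to the listener's right:
# the right ear hears the sound slightly earlier and louder than the left.
itd_samples = int(0.0006 * fs)             # ~0.6 ms interaural time difference
hrir_left = np.zeros(256); hrir_left[itd_samples] = 0.6
hrir_right = np.zeros(256); hrir_right[0] = 1.0

left = fftconvolve(mono, hrir_left)[:len(mono)]
right = fftconvolve(mono, hrir_right)[:len(mono)]
binaural = np.stack([left, right], axis=1)  # 2-channel output for headphones
```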

Meanwhile, live immersive content is everywhere. Over 100 events at the Tokyo and Beijing Olympics were broadcast in immersive audio, and most new audio control rooms across both broadcast and music production are being designed for immersive monitoring.

Consumers understand it and crucially they don’t have to run cables all over their front rooms to experience it. The demand is there like it never was for surround.

At the other end of the chain, the availability of consumer technology from non-traditional broadcast suppliers has democratized content production, and platforms like TikTok, YouTube and Facebook are providing exposure to talented content producers who are launching media careers outside traditional channels. This breeds and encourages creativity, and it is a wonderful thing.

All this provides access to better audio content, across more channels. The concepts have already been proven – expect people to get creative.

Personalization

Immersive audio is one strand of Next Generation Audio (NGA) which has had significant coverage in 2022, but there are others which have come on in leaps and bounds. NGA is an umbrella term for several mixing techniques and services, and personalization is another which has made headlines.

To paraphrase the EBU’s own document on NGA, personalization isn’t about making programmes better. It’s about providing more flexible and personalized user experiences; in other words, making content more accessible.

This year personalization reached a milestone with a live end-to-end demo at IBC that took stems from a football match, encoded them into MPEG-H and transmitted them to an MPEG-H compliant set top box. It gave visitors the opportunity to listen to three output programmes: a standard TV mix, an enhanced dialogue mix and a mix with no commentary at all.

The ability to treat audio as independent objects within a broadcast stream and transmit them as independent, loudness-compliant mixes would make a significant difference to millions of people with hearing or visual impairments, providing end-user control over a full commentary mix, a crowd mix, a different language or an audio description.
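To make the idea concrete, here is a hypothetical receiver-side sketch: stems arrive as separate objects and a user-selected preset decides how they are combined. The stem names, presets and gain values are illustrative, not taken from the MPEG-H demo.

```python
# A hypothetical sketch of object-based personalization: the broadcast carries
# separate stems, and the receiver combines them according to a user-selected
# preset. Stem names, preset names and gains are illustrative only.
import numpy as np

fs = 48000
n = fs * 10  # ten seconds of audio
rng = np.random.default_rng(0)

# Stand-in stems; in practice these arrive as separate audio objects.
stems = {
    "commentary": 0.2 * rng.standard_normal(n),
    "crowd":      0.1 * rng.standard_normal(n),
    "effects":    0.1 * rng.standard_normal(n),
}

# Linear gains per preset, roughly matching the three demo programmes.
presets = {
    "standard":          {"commentary": 1.0, "crowd": 1.0, "effects": 1.0},
    "enhanced_dialogue": {"commentary": 1.4, "crowd": 0.5, "effects": 0.7},
    "no_commentary":     {"commentary": 0.0, "crowd": 1.0, "effects": 1.0},
}

def render(preset_name: str) -> np.ndarray:
    """Sum the stems with the gains of the chosen preset."""
    gains = presets[preset_name]
    return sum(gains[name] * stems[name] for name in stems)

mix = render("enhanced_dialogue")
```

A real receiver would also apply the loudness normalization carried in the stream metadata so every preset lands at the same target level; that step is omitted here.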

It works, and compatible chipsets are already installed in OEM TV equipment all over the world. It seems like more accessible programming is just around the corner.

Movement And Space

We’ve already touched on it, but there is another aspect of immersive audio which is – if you will excuse the pun – turning heads.

Dynamic head tracking has been popularized by Apple’s Spatial Audio feature, and it has enormous potential for development. Head tracking enables the soundscape to respond as the user moves; Apple does it by not only decoding the Atmos content but also tracking users’ head movements with the accelerometers and gyroscopes built into its devices, which include AirPods and Beats headphones.

This provides the illusion that the audio is anchored to the screen – turn your head and the soundscape is re-rendered through 360° so it stays put. But it is no longer the only game in town. Qualcomm’s Snapdragon 8 Gen 2 mobile platform, launched in 2022, also supports spatial audio with dynamic head tracking and has already been adopted by multiple manufacturers for Android devices, which may see Apple’s dominance challenged over the next few years.
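A minimal sketch of the screen-anchoring idea, assuming a single source and a simple stereo panner: the renderer subtracts the reported head yaw from the source’s world azimuth, so the image stays put as the head turns. The panner and angles are illustrative, not any vendor’s actual algorithm.

```python
# A minimal sketch of head-tracked rendering: the source stays anchored to the
# screen by subtracting the head yaw (from the device's IMU) from the source's
# world azimuth before panning. The panner and angles are illustrative.
import numpy as np

def pan_stereo(signal: np.ndarray, azimuth_deg: float) -> np.ndarray:
    """Constant-power pan: 0 deg = centre, +90 = hard right, -90 = hard left."""
    theta = np.radians(np.clip(azimuth_deg, -90, 90))
    angle = (theta + np.pi / 2) / 2          # map [-90, 90] deg -> [0, pi/2]
    left, right = np.cos(angle), np.sin(angle)
    return np.stack([left * signal, right * signal], axis=1)

fs = 48000
sig = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)

source_azimuth = 30.0     # where the sound sits relative to the screen
head_yaw = 20.0           # reported by accelerometer/gyroscope fusion

# The rendered direction compensates for the head turn, so the image stays put.
rendered = pan_stereo(sig, source_azimuth - head_yaw)
```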

Dynamic head tracking has also been trialled in a live broadcast environment this year. It was a key element of the 5G Edge-XR project led by BT Sport, which won the Content Everywhere Award at this year’s IBC exhibition. The project explored how augmented and virtual reality immersive experiences could be broadcast to consumer equipment like smartphones, using cloud GPUs (okay, I lied about the cloud) to render extended reality presentations over 5G networks.

The concept tied three 360° cameras to accompanying B-format immersive microphones to create an immersive audio and visual presentation of a football match. The presentation mixed in match commentary as well as gameplay sound from the pitch, and each camera position created a 5.1.4 immersive mix. When the viewer moved their head, the crowd audio moved accordingly, but the commentary stayed put and on-pitch sounds remained anchored to the pitch.
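A hypothetical sketch of the head-tracked part of such a presentation: a first-order B-format crowd bed is counter-rotated against the head yaw, while a head-locked commentary stem is left unrotated. The signals, sign conventions and the way the commentary is folded in are simplifications, not the project’s actual pipeline.

```python
# Hypothetical sketch: rotate a first-order B-format (W, X, Y, Z) crowd bed
# against the head yaw, then add a head-locked commentary channel. The sign
# convention of the rotation depends on the coordinate system in use.
import numpy as np

def rotate_bformat_yaw(b: np.ndarray, yaw_rad: float) -> np.ndarray:
    """Rotate a (4, n) B-format signal [W, X, Y, Z] about the vertical axis."""
    w, x, y, z = b
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    return np.stack([w, c * x - s * y, s * x + c * y, z])

fs = 48000
n = fs
rng = np.random.default_rng(1)
crowd_bformat = 0.1 * rng.standard_normal((4, n))   # stand-in ambience bed
commentary = 0.2 * np.sin(2 * np.pi * 220 * np.arange(n) / fs)

head_yaw = np.radians(35.0)
# Counter-rotate the bed so the crowd stays fixed to the stadium...
scene = rotate_bformat_yaw(crowd_bformat, -head_yaw)
# ...then add the commentary to W only, which keeps it direction-free. (A real
# chain would typically mix head-locked content in after the binaural decode.)
scene[0] += commentary
```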

It adds to the collective experience in a very physical way. The duality of spatial audio is that it pulls us deeper into the audio at the same time as connecting us with others. Let’s have more of it.

Beyond Audio

So far we’ve talked about immersive audio, personalization and spatial audio, but what about things that go beyond our hearing? What if audio can be used for more than just listening? What if it can be used to learn more about yourself?

One strand of neuroscience examines how sound affects the brain, and companies are looking to leverage this to create more engaging ways of listening. With headphones and other wearables (such as smartwatches) now ubiquitous, neurotech companies are looking at how to make these devices deliver more, such as harvesting brain activity to monitor engagement.

Other companies are providing ways to make devices capable of running third-party apps, which is helping to speed this integration and could lead to headphones running independent DSP algorithms to provide a number of different services. One developer is already working on an algorithm which produces neuromarkers from EEG signals to determine what a user is focusing on in the sound mix.
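As a toy example of the kind of signal processing such an algorithm might start from, the sketch below estimates alpha-band power from an EEG trace as a crude single-number "neuromarker". The signal, band choice and interpretation are purely illustrative and not the developer’s method.

```python
# Toy illustration: estimate alpha-band (8-12 Hz) power from an EEG trace as a
# crude scalar "neuromarker". Signal and interpretation are illustrative only.
import numpy as np
from scipy.signal import welch

fs = 256                                   # typical consumer EEG sample rate
t = np.arange(fs * 10) / fs                # ten seconds of data
rng = np.random.default_rng(2)
# Stand-in EEG: broadband noise plus a 10 Hz (alpha-band) component.
eeg = rng.standard_normal(t.size) + 0.5 * np.sin(2 * np.pi * 10 * t)

freqs, psd = welch(eeg, fs=fs, nperseg=fs * 2)   # power spectral density
alpha = (freqs >= 8) & (freqs <= 12)
alpha_power = np.sum(psd[alpha]) * (freqs[1] - freqs[0])   # integrate the band
print(f"alpha-band power: {alpha_power:.3f}")
```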

The Future’s Bright, But It Sounds Fantastic

There’s a well-trodden cliché that it is the sound design that triggers emotion in visual content, and over the next few years many of the advances we see in broadcasting won’t be seen at all – but they will definitely be heard.

Whatever happens over the next few years, it’s all going to sound different and it’s probably going to sound amazing. But more than anything else, it will help us experience a shared connection, wherever we are.
