The world’s first immersive audio format is getting a new generation of consumers hooked via gaming, VR and ASMR, while both production and consumer technology are making it easier to record and easier to access.
Binaural audio is one element in the growth of Next Generation Audio (NGA), an umbrella term which covers initiatives like Object-based audio (OBA), 3D audio and other techniques which enhance the user experience.
According to the UK communications regulator Ofcom, “Next Generation Audio, which allows the combination of Objects, Channels, and other audio formats such as ambisonics and binaural audio to be carried, is one of the recent success stories of the Object-based approach. Next Generation Audio is now present in at least 4 out of 5 new UHD TV sets in Europe and the UK, and most new mobile phone models. International standardization is also mature in this area with DVB, HbbTV and ATSC3.0 having already provided NGA specifications. Codecs for the carriage of Next Gen Audio include MPEG-H, Dolby AC-4 and DTS-X.”
As ways of managing and delivering audio objects in these 3D soundscapes develop, consumers are benefiting from more immersive audio narratives.
Of all these formats, binaural audio has been around the longest, with a history dating back to 1881. Binaural audio is a 2-channel format which gives the impression of an immersive environment; in simple terms it takes advantage of the physicality of the human head and ears, with two omnidirectional mics placed at the inner ears. This mimics the mechanics of hearing as the sound arrives at the microphones at different times and is also influenced by the head and outer ear, giving the listener a sense of the location of the sound.
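The arrival-time difference between the two ears can be estimated with a little arithmetic. The sketch below uses Woodworth's spherical-head approximation, which is a textbook simplification and not something the article's contributors describe using. The head radius and speed of sound are assumed typical values, and the function name is purely illustrative.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second in air at roughly 20 °C (assumed)
HEAD_RADIUS = 0.0875    # metres, a commonly quoted average head radius (assumed)


def itd_seconds(azimuth_deg, head_radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Approximate interaural time difference for a distant source.

    Uses Woodworth's formula ITD = (r / c) * (theta + sin(theta)),
    valid for azimuths from 0 (straight ahead) to 90 degrees (fully to one side).
    """
    theta = math.radians(azimuth_deg)
    return (head_radius / c) * (theta + math.sin(theta))


if __name__ == "__main__":
    for angle in (0, 30, 60, 90):
        print(f"{angle:>2} degrees: {itd_seconds(angle) * 1e6:.0f} microseconds")
```

With these assumed values, a source directly to one side reaches the far ear roughly two-thirds of a millisecond after the near ear. A differently sized head shifts that figure, which is part of why one recording does not suit every listener.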
Recordings are often made on dummy heads like the Neumann KU100, the classic binaural dummy head microphone which everyone pictures when they think about binaural recordings, or on microphones which are incorporated into headphones and earbuds from companies like Sennheiser and DPA.
There are several advantages. Binaural recording at source means that content can be recorded quickly and cost effectively and does not require any additional processing. Another approach is to binauralize sources during production.
Matt Firth is a Project Engineer in the Audio Team at BBC R&D with a particular focus on Audio Production, and has worked on a number of binaural projects, including the BBC’s binaural coverage of the Proms.
“As binaural is a 2-channel mix, delivery is really simple. Unlike a stereo mix, where most of the mix is panning either left or right based on level difference, binaural takes positional information for each feed and applies a filter. These filters replicate the cues we use to identify sound source locations - when we hear a sound our head and ears filter the sound in different ways depending on direction of arrival, and those filters mimic that effect.
“This is one of the benefits of binaural audio – it provides an immersive experience for consumers with a simple 2-channel mix. It means binaural mixes can use standard delivery pipelines; a 2-channel mix is easy to create and there are no changes to existing infrastructures. And all consumers need to hear it is a pair of headphones.”
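The filtering Firth describes can be illustrated with a toy example. The sketch below is not the BBC's renderer or any product's algorithm: it convolves a mono feed with a pair of made-up impulse responses that model only a delay and a level drop at the far ear. Real head-related impulse responses are measured, frequency dependent and much longer. All function names here are hypothetical.

```python
def convolve(signal, kernel):
    """Direct-form convolution of two lists of samples."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def toy_hrir_pair(itd_samples, far_gain):
    """Return (near_ear, far_ear) impulse responses of equal length.

    The near ear hears the source immediately at full level; the far ear
    hears it itd_samples later, scaled by far_gain. This is a crude
    stand-in for a measured filter pair.
    """
    near = [1.0] + [0.0] * itd_samples
    far = [0.0] * itd_samples + [far_gain]
    return near, far


def binauralize(mono, hrir_left, hrir_right):
    """Filter one mono feed into a 2-channel (left, right) signal."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# A source on the listener's left: the left ear is the near ear.
hrir_left, hrir_right = toy_hrir_pair(itd_samples=3, far_gain=0.5)
left, right = binauralize([1.0, 0.5], hrir_left, hrir_right)
print(left)   # [1.0, 0.5, 0.0, 0.0, 0.0]
print(right)  # [0.0, 0.0, 0.0, 0.5, 0.25]
```

The output is an ordinary pair of left and right channels, which is why a mix made this way can travel through a standard stereo delivery chain.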
This is one of the reasons why the format is making a comeback; headphones are what most of us use to listen to audio, with connected devices in everyone’s pocket, and content on demand wherever you are. Binaural is perfect for headphones.
“There is a huge appetite for binaural content,” says Eloise Whitmore, Managing Director of Manchester-based audio production company Naked Productions, and an award-winning Sound Designer with a long history of working with the binaural format for radio drama.
“More and more listeners are asking for it and as a result more binaural content is being commissioned. That said, binaural works best if it is used sparingly. The brain is easily fooled by binaural for a short time, so we find it works better if it’s used as part of a stereo presentation; in radio drama, binaural content works very well for jump scares and particularly whispers.”
Naked Productions’ sound design for Sour Hall is a good example of this. The six-part play was mixed in stereo but used 2-channel binaural audio for suspenseful scenes, such as when the “Boggart” character whispers in your ear.
Tony Churnside and Eloise Whitmore listening to a dummy head on set for Sour Hall. Photo credit: Simon Bray.
“Whatever you are creating you need to be aware of what’s on the other side of the platform because that is the channel format you need to work with; radio work is largely stereo broadcasting,” adds Tony Churnside, who also worked on sound design for Sour Hall and who works alongside Whitmore.
“Binaural is often a compromise; it works well for background sound beds, but for channel-based content, stereo is often better for the overall listener experience. One thing which you can’t replicate with a dummy head mic is the way humans make slight adjustments to assess which direction sound is coming from; that can’t be recreated with a static dummy head, so over longer presentations binaural recordings aren’t always the best option.”
Despite the obvious benefits for production and distribution, this lack of head-movement cues for sound source localization is one of the reasons why binaural has never quite made it into the big leagues. According to Churnside, there is also a more fundamental reason.
“The Neumann head is based on an average sized human head, and we use this a lot when we make binaural recordings,” he says, “but with binaural sound, head size and shape make a big difference.”
This is one of the biggest downsides of binaural recordings: everyone filters sound differently, and things like head size and ear shape have an effect. Binaural recording devices like the Neumann dummy head, or even in-ear microphones, are each based on one particular head size and shape.
Dummy heads are largely based on average sized human heads, and binaural renderers in post-production DAWs like Pro Tools apply generalized filter sets based on the average person.
And that is the issue here; there is no such thing as an average person. This means that everyone’s experience of binaural audio is different. One person might get a perfect, immersive, binaural experience, another will not. The experience can never be deterministic.
It can also be time consuming. “The majority of our sound libraries are in stereo, so it means that binaural effects have to be made from scratch and recorded on a dummy head or spatialized in post. This is time consuming and adds cost to any production,” says Whitmore.
But the appetite for immersive audio formats is still strong, and technology is helping deliver new immersive formats. And as that desire for more immersive audio continues to grow, the way listeners consume content allows for multiple versions to be created.
Whitmore and Churnside have been working with various audio formats to create unique listening experiences for many years, from fully immersive audio installations at MoMA in New York for Björk, to The Vostok-K Incident, an experimental spatial audio presentation which uses listeners’ phones and laptops to create an immersive listening experience in the listener’s home environment.
Whitmore claims that because digital and on-demand channels can make use of alternate versions, sound designers have a greater opportunity to push the boundaries of sound design and create alternative mixes for radio programming.
This growing consumer appetite is in tandem with the development of production technology, as well as production and delivery models. The Audio Definition Model (ADM) is a metadata model for authoring NGA content which is standardized by the International Telecommunication Union and is a collaborative project with the involvement of a number of global broadcast partners.
“ADM provides the metadata to describe NGA content,” says Firth. “It doesn’t tie producers into any codec, output format or a specific ecosystem. It is an open standard, codec-agnostic and does not define the delivery process. In this way, it enables the content to be delivered via different NGA codecs for user personalization, and for rendering to any output format such as stereo, surround formats, or even binaural. It can also support existing delivery methods through channel-based pre-rendering without any additional production effort.”
Industry-led initiatives like ADM guarantee the future of NGA, and authoring tools such as the EAR Production Suite are becoming available, which companies like Naked Productions can use to further enhance the listener experience.
Coupled with more support for evolving delivery codecs, as well as for consumer devices which support spatial audio, like Apple’s AirPods Pro and soundbars that bounce immersive signals around the room, it is getting easier to get NGA content in front of consumers.