Could personalized HRTFs be the next big step change for both the consumer and professional immersive headphone experience?
HRTF stands for Head Related Transfer Function and, simply put, is a catch-all term for the characteristics a human head imparts on sound before it enters the ear canal. Everything from level tonal changes caused by our head, shoulders, and pinna (external ear parts), to arrival-time differences (Interaural Time Difference, or ITD) between the two ears have an effect on our perception of the direction and distance of sources.
It’s a concept that explains the necessity of headphones with binaural sound, for example. That is, if you record a source by sticking two microphones in your ears, that recording will incorporate your HRTF, both considering the direct sound and room reflections. If you then play that back through speakers, the HRTF effect becomes a disadvantage because of pronounced coloring and new room reflections clashing against the recording. The source would have to be replayed through headphones to avoid the HRTF effect being imparted a second time.
A Place of Your Own
Part of the issue with immersive audio reproduced through speakers in a space is the effective localization of sources within that space. With object-based reproduction or with wave field synthesis you can approximate actual source position, but in the end it all gets injected into your ear canal after processing by your own HRTF. Therefore, a binaural source over headphones should be capable of producing the ultimate immersive experience.
However, everyone has their own personal HRTF. Our aural perception filter is as personal as a fingerprint. A generic binaural signal such as might be recorded with a ‘dummy head’ microphone will be a good approximation, but to a certain extent it will always be like looking through someone else’s spectacles.
What if you could easily measure and define your own HRTF? That could then be used by rendering engines to produce a personalized binaural feed from any source – including the most extreme object- and scene-based immersive formats. Set-top boxes, sound cards, games, mixing console monitoring sections, and DAWs could all incorporate rendering engines based on personalized HRTFs.
The SOFA file, or ‘Spatially Oriented Format for Acoustics’, is a general-purpose file format for storing spatial acoustic data, standardized by the AES as ‘AES69’. The data does not only have to be a HRTF but could be applied to a specific listening position in a room or for modelling a full acoustic response of a concert hall at various positions, for example.
The data is made up of multiple impulse responses –a representation of how a given input is changed at an output. In the case of measuring HRTF, each impulse response represents a measurement for each ear, from a particular direction that is defined with elevation and azimuth. Therefore, to measure an HRTF with microphones you need to take enough responses to adequately represent the full source sphere around a test subject.
How many responses is enough? Well, this method of modelling and quantifying HRTFs is not new and the University of California, Davis’ CPIC Interface Laboratory’s HRTF Database has been in existence for some time with a compiled library of HRTFs where each one is made up of 1250 directional readings for each ear of the subject. However, numbers of readings in the 200 region are more common, such as for the Listen Library, which was a joint project between microphone and headphone manufacturer AKG, and IRCAM (Institute for Research and Coordination in Acoustics/Music).
Thankfully, an alternative to sitting in an anechoic chamber for several hours is here… Genelec recently announced its new Aural ID process for modelling an individual’s HRTF and compiling that into a SOFA file that does not involve sticking microphones in your ears.
The idea is to create each model from a 360-degree video of the head and shoulders of each customer that can be acquired simply on a high-quality mobile phone.
Simplified HRTF: a couple of HRTF aspects that help determine source direction. HRTF is more complicated than this though, as it uses the entire upper torso and acts in three dimensions where both angle and azimuth are relevant.
That video is uploaded to the Genelec web-based calculation service, which builds a virtual 3D model, including especially detailed modelling of the pinna. This model is put into a full wave analysis of the HRTF using lots of virtual sources from many angles, which in turn generates the full HRTF data and the SOFA file.
Once you have your own personal HRTF data, a rendering engine can personalize any sound reproduction specifically for your headphones, bringing stereo and immersive content straight to your ear canals, and missing out those pesky monitors.
Of course, the monitors themselves, the room they are in, head movements, and other people listening with you have such a significant effect on a social listening experience that Aural ID is unlikely to spell the end of monitors just yet (something Genelec is no doubt pleased about), but this technology does have some significant practical applications and advantages in both consumer and professional worlds.
Immersive games should get a big reality boost for a start, and if mixing on headphones is necessary, it won’t be such a hit-and-miss affair if your DAW or console headphone output can model stereo, surround, and immersive experiences comparable to loudspeaker reproduction at the touch of a button.
The Aural ID service should be available from Genelec very soon.
The SOFA file format is already in use in game development and is specified as the format of choice for Steam Audio from Valve Corporation, for example - a solution for developers that integrates environment and listener simulation.
Personalized HRTFs can be loaded into the Unity, FMOD, Unreal, and C environments, so expect to be able to load you Aural ID into your favorite VR game in the not-too-distant future...
A Head Related Future
In the creative space, you could argue that awareness of HRTF and its effects could inform mixers and engineers to an extent, particularly in narrative audio and effects for film and TV, for example. But because of the issues around headphones versus monitors, and the complications in generating content for every eventuality, history has generally settled for ignoring the HRTF principals, choosing to mix on monitors and leave everything else to take care of itself. Binaural productions have tended to be niche products because translation has been best assured using in-room monitoring.
However, listening habits are changing and more people are putting on headsets and consuming content as a personal experience. Real time rendering of a binaural experience from immersive source material is already happening and will be completely relevant to how we approach broadcast audio production in the future.
You might also like...
NDI (Network Device Interface) is a free protocol for Video over IP, developed by NewTek. The key word is “free.”
NAB have announced the show scheduled for October 2021 has been cancelled.
Violent weather storms are wreaking havoc on the East Coast of the U.S. and radio and TV stations there are struggling to get the life-saving news out. In the past two months alone storms have knocked out TV antenna…
Timing accuracy has been a fundamental component of broadcast infrastructures for as long as we’ve transmitted television pictures and sound. The time invariant nature of frame sampling still requires us to provide timing references with sub microsecond accuracy.
The need for synchronization rears its head in so many different endeavors that it has to be accepted as one of the great enabling technologies.