The Sponsors Perspective: The Personal HRTF - An Aural Fingerprint

HRTF stands for Head-Related Transfer Function and, simply put, is a catch-all term for the characteristics a human head imparts on sound before it enters the ear canal. Everything from the level and tonal changes caused by our head, shoulders, and pinnae (the external parts of the ears) to the arrival-time differences between the two ears (the Interaural Time Difference, or ITD) affects our perception of the direction and distance of sound sources.
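As a rough illustration of the ITD component, the classic Woodworth spherical-head approximation estimates the arrival-time difference from head radius and source azimuth. This is a simplified textbook model, not what a full HRTF measurement captures:

```python
import math

def itd_woodworth(azimuth_deg: float, head_radius_m: float = 0.0875,
                  speed_of_sound: float = 343.0) -> float:
    """Estimate the interaural time difference (in seconds) for a source
    at the given azimuth (0 = straight ahead, 90 = directly to one side),
    using the Woodworth approximation: ITD = r/c * (theta + sin(theta))."""
    theta = math.radians(azimuth_deg)
    return (head_radius_m / speed_of_sound) * (theta + math.sin(theta))

# A source directly ahead produces no ITD; one directly to the side
# produces roughly two-thirds of a millisecond.
print(itd_woodworth(0))                    # 0.0
print(round(itd_woodworth(90) * 1000, 2))  # 0.66 (ms)
```

The default head radius of 8.75 cm is a commonly used average; a personal HRTF effectively replaces such averages with the listener's own geometry.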


This article was first published as part of Essential Guide: Immersive Audio Pt 1 - An Immersive Audio Primer

It’s a concept that explains, for example, why binaural sound needs headphones. If you record a source by placing two microphones in your ears, that recording incorporates your HRTF, for both the direct sound and the room reflections. Play it back through speakers and the HRTF effect is imparted a second time, adding pronounced coloring and new room reflections that clash with those in the recording. The recording has to be replayed through headphones to avoid applying the HRTF twice.

A Place of Your Own

Part of the issue with immersive audio reproduced through speakers in a space is the effective localization of sources within that space. With object-based reproduction or with wave field synthesis you can approximate actual source position, but in the end it all gets injected into your ear canal after processing by your own HRTF. Therefore, a binaural source over headphones should be capable of producing the ultimate immersive experience.

However, everyone has their own personal HRTF. Our aural perception filter is as personal as a fingerprint. A generic binaural signal such as might be recorded with a ‘dummy head’ microphone will be a good approximation, but to a certain extent it will always be like looking through someone else’s spectacles.

What if you could easily measure and define your own HRTF? That could then be used by rendering engines to produce a personalized binaural feed from any source – including the most extreme object- and scene-based immersive formats. Set-top boxes, sound cards, games, mixing console monitoring sections, and DAWs could all incorporate rendering engines based on personalized HRTFs.

Enter SOFA

The SOFA file, or ‘Spatially Oriented Format for Acoustics’, is a general-purpose file format for storing spatial acoustic data, standardized by the AES as ‘AES69’. The data does not have to be an HRTF: the format could equally describe a specific listening position in a room, or model the full acoustic response of a concert hall at various positions, for example.

The data is made up of multiple impulse responses – a representation of how a given input is changed at an output. In the case of an HRTF measurement, each impulse response captures one ear’s response to a source from a particular direction, defined by elevation and azimuth. Therefore, to measure an HRTF with microphones you need to take enough responses to adequately sample the full sphere of source positions around the test subject.
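To see how a pair of measured impulse responses turns a mono signal into a binaural one, here is a minimal sketch using hypothetical toy HRIR values (real measured responses are much longer): each ear's response is convolved with the source, and the differing delays and levels between the two outputs encode the direction.

```python
def convolve(signal, ir):
    """Direct-form FIR convolution of a signal with an impulse response."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(ir):
            out[i + j] += s * h
    return out

# Toy impulse-response pair for a source on the listener's left: the left
# ear hears the sound immediately and at full level; the right ear hears
# it later and quieter (head shadow).
hrir_left  = [1.0, 0.0, 0.0, 0.0]
hrir_right = [0.0, 0.0, 0.5, 0.2]

source = [1.0, -0.5, 0.25]          # short mono burst
left_ear  = convolve(source, hrir_left)
right_ear = convolve(source, hrir_right)
```

A renderer does exactly this, per direction, with the measured responses stored in the SOFA file – which is why the file must hold one response per ear for every measured direction.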

How many responses is enough? This method of modelling and quantifying HRTFs is not new: the CIPIC Interface Laboratory at the University of California, Davis has long maintained an HRTF database in which each subject’s HRTF comprises 1250 directional readings for each ear. Counts in the region of 200 are more common, however – as in the Listen library, a joint project between microphone and headphone manufacturer AKG and IRCAM (Institute for Research and Coordination in Acoustics/Music).

Aural ID

Thankfully, an alternative to sitting in an anechoic chamber for several hours is here… Genelec recently announced its new Aural ID process, which models an individual’s HRTF and compiles it into a SOFA file – without sticking microphones in your ears.

The idea is to create each model from a 360-degree video of the head and shoulders of each customer that can be acquired simply on a high-quality mobile phone.

Simplified HRTF: a couple of the HRTF aspects that help determine source direction. A real HRTF is more complicated than this, though: it involves the entire upper torso and acts in three dimensions, where both elevation and azimuth are relevant.


That video is uploaded to the Genelec web-based calculation service, which builds a virtual 3D model, including especially detailed modelling of the pinna. This model is then put through a full-wave acoustic analysis using many virtual sources from many angles, which in turn generates the full HRTF data and the SOFA file.

Once you have your own personal HRTF data, a rendering engine can personalize any sound reproduction specifically for your headphones, bringing stereo and immersive content straight to your ear canals, and missing out those pesky monitors.
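Since the SOFA file holds responses only for the directions that were measured (or simulated), a rendering engine has to map each sound object to stored data. Real renderers interpolate between neighbouring measurements; as a crude sketch, one can simply pick the nearest measured direction by angular distance. The measurement grid below is hypothetical:

```python
import math

def angular_distance(az1, el1, az2, el2):
    """Great-circle angle in degrees between two directions, each given
    as (azimuth, elevation) in degrees."""
    a1, e1, a2, e2 = map(math.radians, (az1, el1, az2, el2))
    cos_angle = (math.sin(e1) * math.sin(e2)
                 + math.cos(e1) * math.cos(e2) * math.cos(a1 - a2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))

def nearest_direction(target, measured):
    """Return the measured (azimuth, elevation) closest to the target."""
    return min(measured, key=lambda d: angular_distance(*target, *d))

# Hypothetical sparse measurement grid: azimuths every 30 degrees,
# three elevation rings.
grid = [(az, el) for az in range(0, 360, 30) for el in (-30, 0, 30)]
print(nearest_direction((40, 10), grid))  # (30, 0)
```

The denser the measurement grid in the SOFA file, the less work this lookup (or a proper interpolation) has to do – which is why databases range from around 200 directions up to 1250 and beyond.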

Of course, the monitors themselves, the room they are in, head movements, and other people listening with you have such a significant effect on a social listening experience that Aural ID is unlikely to spell the end of monitors just yet (something Genelec is no doubt pleased about), but this technology does have some significant practical applications and advantages in both consumer and professional worlds.

Immersive games should get a big reality boost for a start, and if mixing on headphones is necessary, it won’t be such a hit-and-miss affair if your DAW or console headphone output can model stereo, surround, and immersive experiences comparable to loudspeaker reproduction at the touch of a button.

The Aural ID service should be available from Genelec very soon.

The SOFA file format is already in use in game development and is specified as the format of choice for Steam Audio from Valve Corporation, for example - a solution for developers that integrates environment and listener simulation.

Personalized HRTFs can be loaded into the Unity, FMOD, Unreal, and C environments, so expect to be able to load your Aural ID into your favorite VR game in the not-too-distant future...

A Head Related Future

In the creative space, you could argue that awareness of HRTF and its effects could inform mixers and engineers to an extent, particularly in narrative audio and effects for film and TV. But because of the issues around headphones versus monitors, and the complications of generating content for every eventuality, history has generally settled on ignoring HRTF principles: mixing on monitors and leaving everything else to take care of itself. Binaural productions have tended to be niche products because translation has been best assured using in-room monitoring.

However, listening habits are changing, and more people are putting on headsets and consuming content as a personal experience. Real-time rendering of a binaural experience from immersive source material is already happening, and it will be completely relevant to how we approach broadcast audio production in the future.

