The Sponsors Perspective: Immersive Audio Monitoring

Genelec Senior Technologist Thomas Lund moves the monitoring discussion on to the practical considerations for immersive audio, wherever you are.

This article was first published as part of Essential Guide: Immersive Audio Pt 4 - Options And Tools For Production Of Live Immersive Content

The previous article concluded that an in-room immersive monitoring system is generally the best choice for ensuring good translation, not only to other loudspeakers and rooms, but also to soundbars and, in the future, to personalised immersive headphones. This article takes a more practical look at what it takes to design, set up, and control an immersive monitoring system.

Immersive production in 2019 generally falls into three categories: 

  1. Sound for picture and a large audience.
  2. Sound for picture and a small audience.
  3. Sound for pleasure and a small audience.
Two of the most commonly used ITU-R BS.2159 immersive system configurations: 7.1.4 (or 11.1) on the left, and 22.2 on the right.

Everybody knows how engaging immersive cinema can be, and ambitious sports and drama productions from HBO, Netflix and others have started to envelop us at home, sometimes even better than in the cinema. The third type is also becoming more popular: stunning immersive recordings of music alone. Such production is now happening in a number of diverse locations: Austria, China, Germany, Japan, Korea, Norway, the UK, and counting.

Morten Lindberg from 2L puts it this way: “There is no method available today to reproduce the exact perception of attending a live performance. That leaves us with the art of illusion when it comes to recording music. We should create the sonic experience that emotionally moves the listener to a better place. Immersive audio is a completely new conception of the musical experience. Recorded music is no longer a matter of a fixed one- or two-dimensional setting, but rather a three-dimensional enveloping situation.”

System Design

Before designing an immersive listening room, consider which of the three above scenarios you wish to cover. The biggest difference is between targeting a large or a small audience. The former is generally based on “shower type” top layer loudspeakers and irregular listening distances, while the other two situations are better served using an ITU-R BS.2159 configuration.

In BS.2159, the ideal is an equidistant setup of all monitors, though where equidistant placement is impossible, delay alignment may compensate instead. At the design stage, listening level and system headroom need consideration. EBU R128 includes a simple and elegant requirement: each monitor should generate 73 dB SPL at the listening location for a test signal input of -23 LUFS. With 20 dB of headroom, each monitor should therefore be able to deliver a minimum of 93 dB SPL.

ATSC A/85 specifies listening level based on room size, from 76 dB SPL to 82 dB SPL in rooms from below 40 m³ to 560 m³. With the same 20 dB of headroom, that requires between 96 dB and 102 dB SPL per monitor.
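The headroom arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration: the function and dictionary names are made up for the example, and the 20 dB headroom figure is simply the one used in the text.

```python
# Headroom assumed to be 20 dB in every case, as in the text.
HEADROOM_DB = 20.0

# Reference listening levels per monitor in dB SPL, as quoted in the text.
reference_levels = {
    "EBU R128": 73.0,
    "ATSC A/85, low end of range": 76.0,
    "ATSC A/85, high end of range": 82.0,
}


def required_peak_spl(reference_db, headroom_db=HEADROOM_DB):
    """Return the SPL a monitor must be able to deliver at the listening position."""
    return reference_db + headroom_db


# Required capability per monitor for each reference level.
required = {name: required_peak_spl(level) for name, level in reference_levels.items()}

for name, spl in required.items():
    print(f"{name}: {spl:.0f} dB SPL")
```

The result is the level needed at the listening position, so the loss over the actual listening distance still has to be considered when comparing with a monitor's data sheet.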

For a theatrical monitoring system, SPL requirements as per SMPTE RP 200 are even higher, with standard operating level of front channels at 85dB SPL. Dolby has created a useful DARDT tool for design of Atmos theatrical rooms, and a DARDT HE tool for design of Atmos home entertainment rooms. Both tools inform whether a certain monitor is of sufficient capacity, taking room and distances into account. The list of monitors includes many Genelec types, and models from other vendors, too.

Low frequency (LF) reproduction depends on application. Sound for picture makes use of a dedicated sub channel with 10dB of extra SPL capacity, called LFE, which is rarely used in pristine music production. In all applications, however, bass management may be applied, thus relieving the main monitors of LF duty to increase system headroom.

Graph showing range of exposure levels with standard 7.1.4 system levels indicated.

For pristine music, bass management is based on at least two subwoofers at different locations in order not to compromise imaging and envelopment. The same concept should ideally be applied in immersive sound for picture, in addition to a subwoofer used for LFE reproduction.

Finally, when planning the listening level, the risk of hearing loss (HL) also needs to be taken into account. Sound pressure level (SPL) describes how intense the sound is at a given moment, much like power, while the more relevant metric in prevention of HL is sound energy, i.e. SPL accumulated over time. A-weighted sound pressure integrated over time is called sound exposure, and health requirements based on contemporary research recommend a sound exposure of not much more than 80 dB per day (8 hours), which is the same as 83 dB for 4 hours, 86 dB for 2 hours, and so on.
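The trade-off between level and time can be written as a small calculation. The sketch below assumes the rounded 3 dB exchange rate implied by the figures above (a strict doubling of energy is about 3.01 dB), and the function name is purely illustrative.

```python
REFERENCE_LEVEL_DB = 80.0  # daily exposure reference quoted in the text
REFERENCE_HOURS = 8.0      # duration that reference applies to
EXCHANGE_RATE_DB = 3.0     # level increase that halves the permitted time (rounded)


def permitted_hours(level_db):
    """Hours at level_db that give the same sound exposure as 80 dB for 8 hours."""
    halvings = (level_db - REFERENCE_LEVEL_DB) / EXCHANGE_RATE_DB
    return REFERENCE_HOURS / 2 ** halvings


for level in (80, 83, 86, 89):
    print(f"{level} dB: {permitted_hours(level):.1f} h")
```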

That equal energy principle also applies when installing several loudspeakers in a room: for a given calibration level, each doubling of the number of loudspeakers tends to increase exposure by 3 dB. Working for a day with 7.1.4 content may therefore easily give 10 dB more exposure than the calibration level suggests, so check your daily sound exposure once in a while to remain on the safe side.
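As a rough check of the 10 dB figure, the sketch below sums the energy of several loudspeakers. It assumes a simplified upper-bound case where every loudspeaker plays uncorrelated material at the calibration level at the same time, which real content rarely does for long. The function name is illustrative.

```python
import math


def exposure_increase_db(num_loudspeakers):
    """Level increase over a single loudspeaker when equal, uncorrelated sources add up."""
    return 10.0 * math.log10(num_loudspeakers)


# A 7.1.4 layout has 7 + 1 + 4 = 12 loudspeakers.
for count in (1, 2, 4, 8, 12):
    print(f"{count:2d} loudspeakers: +{exposure_increase_db(count):.1f} dB")
```

Under that assumption, twelve loudspeakers add a little under 11 dB, which is consistent with the figure given for 7.1.4 content.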

System Calibration

There are six steps to credible immersive production monitoring:

  1. Select and optimize the room.
  2. Optimize placement of the monitors.
  3. In-situ frequency response calibration.
  4. Time of flight (delay) compensation.
  5. Trimming of spectral balance.
  6. Calibration of listening level.
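Step 4 in the list can be illustrated with a short calculation: each monitor is delayed so that its sound arrives together with that of the farthest monitor. This is a minimal sketch, not the method of any particular product. The speed of sound is an assumed value, and the distances are hypothetical.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed value for air at about 20 degrees C


def alignment_delays_ms(distances_m):
    """Delay per monitor (ms) so all arrivals coincide with the farthest monitor."""
    farthest = max(distances_m.values())
    return {
        name: (farthest - distance) / SPEED_OF_SOUND_M_PER_S * 1000.0
        for name, distance in distances_m.items()
    }


# Hypothetical distances from each monitor to the listening position, in metres.
distances = {"L": 2.0, "R": 2.0, "C": 1.8, "Top front left": 2.343}
delays = alignment_delays_ms(distances)

for name, delay in delays.items():
    print(f"{name}: {delay:.2f} ms")
```

The farthest monitor gets no delay, and closer monitors get roughly 2.9 ms for every metre they are nearer to the listener.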

High quality monitors should have been individually calibrated at the factory for a flat on-axis frequency response in an anechoic chamber, but the response is not the same once a real room is used. Furthermore, the response varies dramatically depending on placement in that room. Considering typical positioning of immersive monitors, there may often be frequency response differences of 18dB or more between devices of the same type. Such differences are reduced in step 3 above.

The goal is a flat direct-sound frequency response from each monitor, which can be measured objectively using a microphone at the listening position. If the room is not well treated acoustically, it may instead be advisable to take measurements at three or four locations, 10-25 cm around the main listening position. More information can be found in [1].
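The article does not say how measurements from several microphone positions should be combined. One common approach, shown here only as an assumption, is to average them on a power basis for each frequency. The readings below are invented for illustration.

```python
import math


def power_average_db(levels_db):
    """Average several dB readings on a power (energy) basis."""
    mean_power = sum(10 ** (level / 10.0) for level in levels_db) / len(levels_db)
    return 10.0 * math.log10(mean_power)


# Hypothetical magnitude readings in dB at three positions, one list per position,
# each covering the same set of frequencies (for example 50, 100 and 200 Hz).
measurements = [
    [72.0, 70.0, 68.0],  # main listening position
    [70.0, 70.0, 71.0],  # 15 cm to the left
    [74.0, 70.0, 66.0],  # 15 cm to the right
]

# Combine the positions frequency by frequency.
averaged = [power_average_db(values) for values in zip(*measurements)]
print([round(value, 1) for value in averaged])
```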

After having achieved a flat in-room frequency response, it is time for a final trim, based on actual listening. Reflections matter here: monitors with highly controlled directivity, such as Genelec’s, generate more uniform and neutral reflections than monitors of less refined design. It takes a human listener to judge how direct sound combines with reflections, and frequency response trimming should be performed at a controlled listening level to reduce sensory variation. Depending on listening level, the difference in human sensitivity between 1 kHz and 100 Hz can vary by 12 dB or more (ISO 226). For that reason, subjective trimming should be performed at a listening level no lower than 80 dB(C), or the listener may not be able to sense the lowest frequencies at all.

Expert software like Genelec’s GLM application can assist with all steps in the list above, and can take care of steps 3-6 by itself. Once setup and calibration have been done, GLM doubles as the monitor controller needed in immersive production, with solo and mute, calibrated level, bass management, and switching between loudspeaker setups of up to 72 channels.


[1] A. Mäkivirta & T. Lund, “Is single microphone position enough for immersive system equalization and level calibration in production monitoring?” in proceedings of Tonmeistertagung, Cologne (2018).
