The Sponsors Perspective: Immersive Audio Monitoring

Genelec Senior Technologist Thomas Lund moves the monitoring discussion on to the practical considerations for immersive audio, wherever you are.

This article was first published as part of Essential Guide: Immersive Audio Pt 4 - Options And Tools For Production Of Live Immersive Content

The previous article concluded that an in-room immersive monitoring system is generally the best choice for ensuring good translation, not only to other loudspeakers and rooms, but also to soundbars and, in the future, to personalised immersive headphones. This article examines more practically what it takes to design, set up, and control an immersive monitoring system.

Immersive production in 2019 generally falls into three categories: 

  1. Sound for picture and a large audience.
  2. Sound for picture and a small audience.
  3. Sound for pleasure and a small audience.

Two of the most commonly used ITU-R BS.2159 immersive system configurations: 7.1.4 (or 11.1) on the left, and 22.2 on the right.

While most people know how engaging immersive cinema can be, and ambitious sports and drama productions from HBO, Netflix and others have started to envelop us at home, sometimes even more convincingly, the third type is also becoming more popular: stunning immersive recordings of music alone. Such productions are now being made in a growing number of countries, including Austria, China, Germany, Japan, Korea, Norway and the UK.

Morten Lindberg from 2L puts it this way: “There is no method available today to reproduce the exact perception of attending a live performance. That leaves us with the art of illusion when it comes to recording music. We should create the sonic experience that emotionally moves the listener to a better place. Immersive audio is a completely new conception of the musical experience. Recorded music is no longer a matter of a fixed one- or two-dimensional setting, but rather a three-dimensional enveloping situation.”

System Design

Before designing an immersive listening room, consider which of the three scenarios above you wish to cover. The biggest difference is between targeting a large and a small audience. The former is generally based on “shower-type” top-layer loudspeakers and irregular listening distances, while the other two scenarios are better served by an ITU-R BS.2159 configuration.

In BS.2159, the ideal is an equidistant setup of all monitors; where such placement is impossible, the differences in distance may be compensated by delay alignment instead. At the design stage, listening level and system headroom need consideration. EBU R128 includes a simple and elegant requirement: each monitor should generate 73 dB SPL at the listening position for a test signal input of -23 LUFS. With 20 dB of headroom, each monitor should therefore be able to deliver a minimum of 93 dB SPL.
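Delay alignment amounts to holding back the nearer monitors so that every direct sound arrives at the listening position together with that of the farthest one. A minimal sketch, assuming a speed of sound of 343 m/s (air at about 20 °C); the function name and the distances are hypothetical.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumption: approx. speed of sound in air at 20 °C


def alignment_delays_ms(distances_m):
    """Delay (ms) for each monitor so that its sound arrives at the listening
    position at the same time as the sound of the farthest monitor."""
    farthest = max(distances_m.values())
    return {name: (farthest - d) / SPEED_OF_SOUND_M_S * 1000.0
            for name, d in distances_m.items()}


# Hypothetical distances (m) from each monitor to the listening position
delays = alignment_delays_ms({"L": 2.0, "C": 1.8, "TFL": 1.6})
for name, ms in delays.items():
    print(f"{name}: {ms:.2f} ms")
```

The farthest monitor gets no delay, and every 10 cm of shortfall adds roughly 0.29 ms.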

ATSC A/85 specifies the listening level based on room size, from 76 dB SPL to 82 dB SPL for rooms ranging from below 40 m³ to 560 m³. With the same 20 dB of headroom, this requires between 96 dB and 102 dB SPL per monitor.
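The headroom arithmetic in the last two paragraphs is simply reference level plus headroom. A small sketch tabulating it for the reference levels quoted above, assuming the same 20 dB of headroom throughout; the function name is hypothetical.

```python
HEADROOM_DB = 20.0  # headroom above the reference level, as assumed in the text


def required_monitor_spl(reference_spl_db, headroom_db=HEADROOM_DB):
    """Maximum SPL (dB) a monitor must deliver at the listening position."""
    return reference_spl_db + headroom_db


reference_levels_db = {
    "EBU R128": 73.0,
    "ATSC A/85, smallest rooms": 76.0,
    "ATSC A/85, largest rooms": 82.0,
}
for name, ref in reference_levels_db.items():
    print(f"{name}: {ref:.0f} dB SPL -> at least {required_monitor_spl(ref):.0f} dB SPL")
```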

For a theatrical monitoring system, SPL requirements per SMPTE RP 200 are even higher, with the standard operating level of the front channels at 85 dB SPL. Dolby has created a useful tool, DARDT, for the design of Atmos theatrical rooms, and DARDT HE for the design of Atmos home entertainment rooms. Both tools indicate whether a given monitor has sufficient capacity, taking the room and distances into account. Their lists of monitors include many Genelec types, as well as models from other vendors.
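A first-order version of such a capacity check can be done with the inverse-square law: a monitor's maximum SPL at 1 m drops by about 6 dB per doubling of distance. This is a minimal free-field sketch, not how DARDT works; it ignores room reflections, boundary gain and bass management, all of which a real design tool takes into account. The monitor figures are hypothetical.

```python
import math


def spl_at_distance(spl_at_1m_db, distance_m):
    """Free-field (inverse-square) estimate: about -6 dB per doubling of distance."""
    return spl_at_1m_db - 20 * math.log10(distance_m)


def has_sufficient_capacity(max_spl_at_1m_db, distance_m, required_spl_db):
    """True if the estimated maximum SPL at the listener meets the requirement."""
    return spl_at_distance(max_spl_at_1m_db, distance_m) >= required_spl_db


# Hypothetical monitor: 110 dB SPL maximum at 1 m, placed 3 m from the listener
print(has_sufficient_capacity(110.0, 3.0, 93.0))   # True  (about 100.5 dB available)
print(has_sufficient_capacity(110.0, 3.0, 102.0))  # False
```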

Low-frequency (LF) reproduction depends on the application. Sound for picture uses a dedicated subwoofer channel with 10 dB of extra SPL capacity, called LFE (low-frequency effects), which is rarely used in pristine music production. In all applications, however, bass management may be applied, relieving the main monitors of LF duty and thereby increasing system headroom.
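A minimal sketch of bass management, assuming an 80 Hz crossover, 4th-order Linkwitz-Riley filters (a common choice; the article specifies neither) and a 48 kHz sample rate. The lows of every main channel are summed into one subwoofer feed together with the LFE channel, whose 10 dB of extra capacity is modelled here as +10 dB of gain. Distributing that feed to two or more subwoofers, and setting their levels and delays, is left to calibration. All function names are hypothetical.

```python
import math


def butterworth_biquad(kind, fc, fs):
    """Normalized (b0, b1, b2, a1, a2) for a 2nd-order Butterworth section,
    using the RBJ Audio EQ Cookbook formulas with Q = 1/sqrt(2)."""
    w0 = 2 * math.pi * fc / fs
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2 * (1 / math.sqrt(2)))
    if kind == "lowpass":
        b0 = b2 = (1 - cos_w0) / 2
        b1 = 1 - cos_w0
    elif kind == "highpass":
        b0 = b2 = (1 + cos_w0) / 2
        b1 = -(1 + cos_w0)
    else:
        raise ValueError("kind must be 'lowpass' or 'highpass'")
    a0 = 1 + alpha
    return b0 / a0, b1 / a0, b2 / a0, -2 * cos_w0 / a0, (1 - alpha) / a0


def biquad(coeffs, x):
    """Run a list of samples through one biquad section (direct form I)."""
    b0, b1, b2, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        out.append(yn)
    return out


def linkwitz_riley4(kind, fc, fs, x):
    """4th-order Linkwitz-Riley filter: two identical Butterworth biquads in cascade."""
    coeffs = butterworth_biquad(kind, fc, fs)
    return biquad(coeffs, biquad(coeffs, x))


def bass_manage(mains, lfe=None, fc=80.0, fs=48000.0, lfe_gain_db=10.0):
    """Split each main channel at fc and sum all the lows, plus the LFE channel
    (if any) at +lfe_gain_db, into one subwoofer feed.
    Returns (dict of high-passed mains, subwoofer feed)."""
    n = len(next(iter(mains.values())))
    if lfe is None:
        sub = [0.0] * n
    else:
        gain = 10 ** (lfe_gain_db / 20)
        sub = [gain * s for s in lfe]
    highs = {}
    for name, x in mains.items():
        highs[name] = linkwitz_riley4("highpass", fc, fs, x)
        lows = linkwitz_riley4("lowpass", fc, fs, x)
        sub = [s + low for s, low in zip(sub, lows)]
    return highs, sub
```

Linkwitz-Riley low-pass and high-pass outputs are in phase and sum to a flat magnitude, which is why they are a popular crossover choice for bass management.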

Graph showing range of exposure levels with standard 7.1.4 system levels indicated.

For pristine music, bass management should be based on at least two subwoofers at different locations, so as not to compromise imaging and envelopment. The same concept should ideally be applied in immersive sound for picture, in addition to a subwoofer used for LFE reproduction.

Finally, when planning the listening level, the risk of hearing loss (HL) also needs to be taken into account. Sound pressure level (SPL) describes the level at a given moment, while the more relevant metric in preventing HL is sound energy, i.e. SPL accumulated over time. A-weighted sound pressure integrated over time is called sound exposure, and health recommendations based on contemporary research suggest a sound exposure of not much more than 80 dB per day (8 hours), which is equivalent to 83 dB for 4 hours or 86 dB for 2 hours, and so on.
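The equal-energy rule above can be turned into a quick daily-dose check: every 3 dB increase halves the allowed listening time. A minimal sketch, assuming the 80 dB / 8 hour reference and the 3 dB exchange rate used in the text, with levels given as A-weighted equivalent levels; the function names are hypothetical.

```python
REFERENCE_LEVEL_DB = 80.0   # daily reference quoted in the text: 80 dB(A) for 8 hours
REFERENCE_HOURS = 8.0
EXCHANGE_RATE_DB = 3.0      # equal energy: +3 dB halves the allowed time


def allowed_hours(level_db):
    """Hours at a constant level that use up the whole daily allowance."""
    return REFERENCE_HOURS / 2 ** ((level_db - REFERENCE_LEVEL_DB) / EXCHANGE_RATE_DB)


def daily_dose(sessions):
    """Fraction of the daily allowance used by (level_db, hours) sessions; 1.0 = limit."""
    return sum(hours / allowed_hours(level_db) for level_db, hours in sessions)


print(allowed_hours(83.0))                       # 4.0
print(allowed_hours(86.0))                       # 2.0
print(daily_dose([(83.0, 2.0), (86.0, 1.0)]))    # 1.0
```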

The same equal-energy principle applies when installing several loudspeakers in a room: for a given calibration level, each doubling of the number of loudspeakers tends to increase exposure by 3 dB. Working for a day with 7.1.4 content may therefore easily result in 10 dB more exposure than the calibration level suggests, so check your daily sound exposure once in a while to stay on the safe side.
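The roughly 10 dB figure for 7.1.4 follows from adding the energy of the individual loudspeakers. A minimal sketch, assuming all channels play uncorrelated content at the same calibration level and counting the LFE as one loudspeaker; real programme material will vary. The function name is hypothetical.

```python
import math


def exposure_increase_db(n_loudspeakers):
    """Extra exposure (dB) when n loudspeakers each play uncorrelated content at
    the calibration level: energies add, so each doubling adds about 3 dB."""
    return 10 * math.log10(n_loudspeakers)


calibration_level_db = 73.0  # per-monitor level, e.g. EBU R128
n = 12  # 7.1.4: seven ear-level, one LFE and four height channels
print(f"{exposure_increase_db(n):.1f} dB")                          # 10.8 dB
print(f"{calibration_level_db + exposure_increase_db(n):.1f} dB")   # 83.8 dB
```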

System Calibration

There are six steps to credible immersive production monitoring:

  1. Select and optimize the room.
  2. Optimize placement of the monitors.
  3. In-situ frequency response calibration.
  4. Time of flight (delay) compensation.
  5. Trimming of spectral balance.
  6. Calibration of listening level.
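Once each monitor's level has been measured at the listening position, step 6 reduces to simple arithmetic. A minimal sketch, assuming the EBU R128 target of 73 dB SPL per monitor for a -23 LUFS test signal; the function name and the measurements are hypothetical.

```python
TARGET_SPL_DB = 73.0  # per-monitor target for a -23 LUFS test signal (EBU R128)


def level_trims_db(measured_spl_db, target_db=TARGET_SPL_DB):
    """Gain change (dB) per monitor that brings each one to the target SPL."""
    return {name: target_db - spl for name, spl in measured_spl_db.items()}


# Hypothetical measurements at the listening position, one monitor at a time
trims = level_trims_db({"L": 74.5, "R": 73.0, "TFL": 71.2})
for name, db in trims.items():
    print(f"{name}: {db:+.1f} dB")
```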

High-quality monitors should have been individually calibrated at the factory for a flat on-axis frequency response in an anechoic chamber, but the response changes once they are placed in a real room. Furthermore, the response varies dramatically depending on placement within that room. With typical positioning of immersive monitors, frequency response differences of 18 dB or more between monitors of the same type are not uncommon. Such differences are reduced in step 3 above.
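Step 3 can be pictured as computing, per frequency band, the gain that moves each monitor's measured in-room response toward a flat target. A minimal sketch, assuming responses are given in dB per band relative to the target. The 6 dB boost cap is a hypothetical parameter reflecting the common practice of avoiding large boosts, which cost headroom and cannot fill room-induced cancellations. This is not a description of how GLM designs its filters.

```python
def correction_db(measured_db, target_db=0.0, max_boost_db=6.0):
    """Per-band EQ gain (dB) toward a flat target. Cuts are unrestricted,
    boosts are capped at max_boost_db (hypothetical limit)."""
    return [min(target_db - m, max_boost_db) for m in measured_db]


# Hypothetical response of one monitor in four bands, dB relative to target
print(correction_db([9.0, 2.0, -1.0, -12.0]))
```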

The goal is a flat direct-sound frequency response from each monitor, which can be measured objectively using a microphone at the listening position. If the room is not well treated acoustically, it may be advisable instead to take measurements at three or four locations, 10-25 cm around the main listening position. More information can be found in [1].
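When measuring at several positions, the responses have to be combined into one before correction. A minimal sketch, assuming each measurement is a list of magnitudes in dB per frequency band, that averages them in the power domain. This is one common choice, not necessarily GLM's method; the question of single versus multiple positions is the subject of [1]. The function name is hypothetical.

```python
import math


def power_average_db(responses_db):
    """Average several magnitude responses (dB per band) in the power domain."""
    n = len(responses_db)
    bands = len(responses_db[0])
    return [10 * math.log10(sum(10 ** (r[k] / 10) for r in responses_db) / n)
            for k in range(bands)]


# Hypothetical responses at three positions around the listening spot
print(power_average_db([[0.0, 2.0, -4.0], [1.0, 0.0, -9.0], [-1.0, 1.0, -6.0]]))
```

Averaging in power rather than in dB keeps a deep dip at one position from dominating the result.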

Once a flat in-room frequency response has been achieved, it is time for a final trim based on actual listening. Reflections play a part here: monitors with highly controlled directivity, such as Genelec’s, generate more uniform and neutral reflections than monitors of less refined design, and it takes a human listener to really integrate direct sound with reflections. Frequency response trimming should be performed at a controlled listening level to reduce sensory variation, because depending on listening level, the difference in human sensitivity between 1 kHz and 100 Hz can vary by 12 dB or more (ISO 226). For that reason, subjective trimming should also be performed at a listening level no lower than 80 dB(C), or the listener may not be able to sense the lowest frequencies at all.

Expert software such as Genelec’s GLM application can assist with all the steps in the list above, and can handle steps 3-6 by itself. Once setup and calibration are done, GLM doubles as the monitor controller needed in immersive production, with solo and mute, calibrated level, bass management, and switching between loudspeaker setups of up to 72 channels.


[1] A. Mäkivirta and T. Lund, “Is single microphone position enough for immersive system equalization and level calibration in production monitoring?”, in Proceedings of the Tonmeistertagung, Cologne, 2018.
