Genelec Senior Technologist Thomas Lund starts down the road to ideal monitoring for immersive audio by looking at what is real, and how that could or should be translated for the listener.
Natural listening is immersive. Whether in a room or outdoors, sound is all around us. Just two months old, we automatically recognize the direction of a sound, turning eyes towards a source, and half a year later we start using movement to change the perspective as an integral part of hearing and seeing when interpreting the world.
Actively reaching out using physical movement is a main element of human sensing, where the ground rules are laid before the age of two. Early in life we also already make use of individual and unique outer ears, to understand, for instance, what is direct sound and what is reflected sound. Like the imprint of a mother tongue, early sensory experience becomes a reference we’re unable to ever fully escape, naturally rooted in our anatomy and the conditions found on planet Earth around the time of our childhood.
Such fundamentals are important to keep in mind when discussing multichannel delivery and reproduction, from personal binaural to immersive in-room systems. The essential question to ask is how well a given system is able to satisfy those basic human rules of engagement.
Early surround sound formats such as 4.2.4 and 5.1 had a limited ability to influence playback localisation and envelopment, while NHK’s multilayer 22.2 sound system now incorporates both horizontal azimuth and elevation for a potentially much more natural listener experience.
Before we get to the practicalities of immersive monitoring in part 4 of this series, let me provide an overview from multichannel audio research over the past decades.
Genelec 8331 SAM loudspeaker.
From Surround To Immersive
When we were developing one of the first multichannel processors, the TC System 6000, the limitations of 5.1 quickly became obvious, even with just one person seated at the ideal location. I was working at TC Electronic at the time, and we went on to develop a host of 5.1 and 7.1 algorithms, but the magic didn’t really start to happen until channels with elevation were added.
By 2010, the company studio had been retrofitted for 26-channel reproduction using active monitors, but I still remember the agony of adapting them to the room. Not only was it physically impossible to get the same distance to all monitors, but each and every one sounded different, though they were the same type. An engaged engineer actually broke his arm climbing around to adjust switches on the back of those monitors to get their in-situ frequency response somewhat similar.
The shift from surround to more compelling immersive formats is reflected in ITU-R standards BS.775 and BS.2159. The two documents also describe the basic requirement that channels should all sound the same and be time- and level-aligned. However, they largely fail to explain the consequences of such requirements, leading many to believe monitors just need to be of the same type.
ATSC A/85 and EBU R128 provide more useful and practical guidance: “In-situ measurements of loudspeakers in control rooms, however, show strong deviations from the anechoic response of the loudspeakers, in particular due to room boundary loading conditions at very low frequencies, with standing wave modal effects through the range typically from about 80 Hz to 500 Hz.
For this reason, room equalization is highly desirable to the point of necessity for higher quality spaces.” In other words, each monitor must be frequency response corrected after its placement in the room, or you can’t trust what you hear.
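The time and level alignment called for by the ITU-R documents can be illustrated with a little arithmetic. The following is a minimal sketch, not a calibration procedure: it assumes free-field 1/r level decay and a speed of sound of about 343 m/s, and the function name, monitor labels and distances are all hypothetical. In a real room the level trims would come from in-situ measurement rather than from distance alone.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value, roughly valid at 20 °C


def align_monitors(distances_m):
    """Return {name: (delay_ms, trim_db)} aligning every monitor to the farthest one.

    Closer monitors are delayed and attenuated so that direct sound from all
    monitors arrives at the listening position at the same time and level.
    Assumes free-field 1/r decay, which ignores room reflections.
    """
    d_max = max(distances_m.values())
    aligned = {}
    for name, d in distances_m.items():
        delay_ms = (d_max - d) / SPEED_OF_SOUND_M_S * 1000.0
        trim_db = 20.0 * math.log10(d / d_max)  # 0 dB or negative
        aligned[name] = (delay_ms, trim_db)
    return aligned


# Hypothetical distances from the listening position, in metres.
distances = {"L": 1.50, "R": 1.50, "C": 1.40, "Top front L": 1.80}

for name, (delay_ms, trim_db) in align_monitors(distances).items():
    print(f"{name}: delay {delay_ms:.2f} ms, trim {trim_db:+.2f} dB")
```

Even in this idealised case, a 30 cm difference in distance calls for a delay of nearly a millisecond and a trim of more than 1.5 dB, which hints at why simply using monitors of the same type is not enough.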
A layout for the surround sound component of Super Hi-Vision, developed by NHK Science & Technical Research Laboratories.
New Immersive Studios
Immersive production is no longer reserved for theatrical content, but spreading into ambitious broadcast like NHK’s 22.2 format, OTT drama, enveloping music, and gaming. In any case, the listener will likely not be seated with 300 others, and a more personal reproduction may be assumed.
Production optimised for such scenarios should provide the professional with a more accurate and dedicated sweet spot, inviting him or her to make use of active sensing with head movements, thereby promoting content credibility and engagement. Delivery specifications also recommend monitoring at lower levels than for theatrical work, at 75 to 79 dB SPL, to ensure speech intelligibility and to reduce overall sound exposure.
New production requirements in turn mean new production possibilities, with less reliance on washed-out monitoring and more on conveying credible spatial contrasts and directional detail. Recently, precision monitors have become available that include compensation for placement and can be used at close range - for instance, the Genelec 8331.
Based on such technology, excellent immersive productions in 7.1.4 or 22.2 format may suddenly be realised in small rooms, between sidewalls 2 m or even less apart.
Monitoring And Playback Using Headphones
The purpose of monitoring is to evaluate content in a neutral way, and to ensure good translation to other reproduction conditions. On-ear and in-ear headphones have not been ideal for this purpose, even just considering stereo production, because they exclude the influential external ear and movements from the equation. Headphones thereby break the link to natural listening described in the introduction.
Using generic headphones, important sources like human voice or tonal instruments are difficult to level, pan and equalise because mid-range frequencies translate randomly between people when using them. What you hear can be quite different from what the other person hears, even if you are passing the same set of headphones around.
However, more natural headphone playback is bound to become readily available soon, with several big consumer companies investing in egocentric “i”-solutions. The personalisation required for a landslide to happen has been demonstrated on the pro side, for instance with Genelec’s Aural ID method, so it’s only a question of time before large-scale personalisation for credible “iconsumption” becomes reality. Nevertheless, for a convincing experience there is no escaping the human rules of engagement: consumer devices not only need to be statically personalised for direct sound from a number of directions, they also need to render in-room reflections personally, and to do it all coherently with head movements and at low latency, like we experience in real rooms and everywhere else.
Though practical reproduction methods are still lacking, consumers will therefore shortly be able to enjoy immersive content better and more easily. Considering the production side, a standardised in-room monitoring system will often remain the option of choice because it translates perfectly to soundbars and other loudspeakers, as well as to ideally personalised headphone consumption.