MPEG-H immersive audio will put viewers in the middle of audiences like the one at the 2018 Eurovision Song Contest
In its 62-year history the Eurovision Song Contest (ESC) has gone through several major technological broadcast developments: colour TV, stereo and then 5.1 sound, with, most recently, the EBU (European Broadcasting Union) implementing its Flex IP-hybrid self-managed digital transmission system to distribute the event in 2016. The innovation for this year’s ESC was the first ever live production trial of MPEG-H immersive and interactive audio, which is designed to give viewers not only a more enveloping experience but also the ability to select languages, commentaries and speech-to-music/effects levels according to personal preference.
MPEG-H is a standard developed by the Moving Picture Experts Group (MPEG) of the ISO (International Organisation for Standardisation) and the IEC (International Electrotechnical Commission). It covers media transport, HEVC (High Efficiency Video Coding) and related compression and reference tools for both audio and video. Immersive - or 3D - and interactive sound is covered by MPEG-H Part 3, which was worked on by German research institute Fraunhofer IIS Audio and Media Technologies.
This group carried out the test at the ESC, installing equipment and setting up a recording area at the Altice Arena in Lisbon. Among the Fraunhofer IIS Audio and Media Technologies team involved were research engineer Adrian Murtaza and Andreas Turnwald. They explain that the ESC was selected as the first live production trial of MPEG-H Audio because it is "one of the largest music shows in the world" and offered "a unique opportunity to create truly immersive content".
The spatial element of the test mix was audience reaction and ambient noise of the Arena. Murtaza and Turnwald say that the music from the stage, mixed in 2.0 stereo, was kept as it was and added to the immersive elements: "The music and vocals mix is very important for each participant so our trial was focused on creating a realistic 5.0+4H [front left, centre, front right, rear left, rear right plus four height channels] audience ambience around it and offers the opportunity to experience this atmosphere at home. The arena reflections alone are sufficient in creating a natural acoustical 'upmix'."
The audience and room were recorded using four Schoeps MK8 bi-directional (figure-of-eight) capsules on an Ambient Recording A-RAY microphone support arranged in a Hamasaki Square configuration. Designed by Japanese music recording engineer and spatial sound researcher Kimio Hamasaki, the Square was intended to capture ambient/diffuse elements for a surround sound recording. It was used at the ESC for the main portion of the height signals that made up the immersive mix.
Feeds from the Hamasaki Square passed through an RME Micstasy pre-amplifier fitted into the Arena roof. Placing the pre-amp there kept the analogue mic signal path as short as possible. The pre-amplified line level signals were then passed through a 300-metre NetworkSound Mamba optical fibre snake to the TV compound for live mixing.
Murtaza and Turnwald explain that because current hardware-based mixing consoles do not as yet feature immersive busses and panners, the mix was performed on a digital audio workstation (DAW). In this case it was Nuendo 8, which was loaded with standard onboard tools. In addition to this, MPEG-H Audio authoring and monitoring plug-ins were used to generate the metadata to encode the material in the immersive format. Audio distribution was through an RME MADI router, with the mix recorded on to Video Devices PIX 270i rack mount units.
A number of preset mixes were created, including Default, Dialogue Enhancement (or Dialog+) and Venue. Murtaza and Turnwald say that Default was the "official" version, intended to reflect the aims of the sound engineers in creating the best representation of the programme. The Dialog+ preset features audio objects of the presenter and commentator feeds enhanced by between 6dB and 9dB to improve speech intelligibility. Venue consists only of the international feed and the presentation, with no commentary. This was created to simulate the experience of being in the Arena, listening to the music and hosts, surrounded by the audience. The metadata for these presets was authored during the recording and mixing process and is contained in the MPEG-H Audio stream.
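The idea of a preset as a set of per-object gains can be illustrated with a short Python sketch. This is not Fraunhofer's authoring tool or the MPEG-H metadata format: the object names, the table layout and the choice of 6dB (the lower end of the range quoted above) are assumptions made purely for illustration.

```python
def db_to_linear(gain_db: float) -> float:
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


# Hypothetical preset table: per-object gain offsets in dB.
# None means the object is switched off in that preset.
# Object names and values are illustrative assumptions.
PRESETS = {
    "Default": {"international": 0.0, "presenter": 0.0, "commentary": 0.0},
    "Dialog+": {"international": 0.0, "presenter": 6.0, "commentary": 6.0},
    "Venue": {"international": 0.0, "presenter": 0.0, "commentary": None},
}


def apply_preset(samples: dict, preset_name: str) -> float:
    """Mix one sample per audio object into a single output sample."""
    gains = PRESETS[preset_name]
    out = 0.0
    for name, value in samples.items():
        gain_db = gains[name]
        if gain_db is None:
            continue  # object is muted in this preset
        out += value * db_to_linear(gain_db)
    return out


if __name__ == "__main__":
    frame = {"international": 0.5, "presenter": 0.1, "commentary": 0.1}
    for preset in PRESETS:
        print(preset, round(apply_preset(frame, preset), 3))
```

A 6dB boost roughly doubles the amplitude of the speech objects, which is why even the lower end of the quoted range makes a clearly audible difference to intelligibility.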
Other functionality was included in the Dialog+ preset, including an interactive feature allowing viewers to set their own balance in the volume levels between speech and the background audience/ambient sounds. Another option was the ability to select commentary in their own language. "Personalisation and interactivity represent two major features of MPEG-H Audio, which allow the content creator to offer completely new experiences using the same audio stream," say Murtaza and Turnwald.
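As a rough illustration of such a balance control, the Python sketch below maps a single viewer setting to opposite gain offsets for speech and background. The function name, the slider range of -1 to 1 and the 6dB limit are hypothetical; the article does not describe how the actual control is parameterised.

```python
from typing import Tuple


def balance_gains(balance: float, max_offset_db: float = 6.0) -> Tuple[float, float]:
    """Map a balance setting in [-1, 1] to (speech_db, background_db).

    +1 favours speech, -1 favours the background audience/ambience.
    Values outside the range are clamped (an assumed design choice).
    """
    balance = max(-1.0, min(1.0, balance))
    offset = balance * max_offset_db
    return offset, -offset


if __name__ == "__main__":
    speech_db, background_db = balance_gains(0.5)
    print(speech_db, background_db)
```

Clamping the setting models the idea that the viewer adjusts the mix only within limits, so the same audio stream can serve every preference without the result straying too far from the intended mix.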
Metadata is used to describe all audio 'scenes', plus the personalisation and interactivity options available as well as accessibility services and different version presets. Fraunhofer's MPEG-H authoring software, loaded into the Nuendo 8, was used during the ESC recording session, with the output fed simultaneously with the audio into the MPEG-H Audio encoder.
Also included in the metadata were loudness and Dynamic Range Control (DRC) information. DRC was designed to adapt the loudness and dynamic range of material to suit the capabilities of the device being used for playback. In the case of something with low dynamic range, such as the loudspeakers on a tablet, the output will be compressed accordingly. "The zero point of this compression curve is adjusted according to the programme loudness, which is always being measured and transmitted in the MPEG-H Audio stream," explain Murtaza and Turnwald.
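The role of the "zero point" can be shown with a simple static compression curve in Python. This is a minimal sketch under stated assumptions, not the DRC processing defined for MPEG-H Audio: the function name, the fixed 2:1 ratio and the example loudness of -23 are chosen only for illustration.

```python
def drc_gain_db(level_db: float, programme_loudness_db: float, ratio: float = 2.0) -> float:
    """Gain in dB from a static compression curve centred on programme loudness.

    Material at the programme loudness receives 0 dB of gain (the zero point).
    Louder passages are attenuated and quieter passages are boosted, so the
    overall dynamic range shrinks by the given ratio.
    """
    deviation = level_db - programme_loudness_db
    compressed = deviation / ratio
    return compressed - deviation


if __name__ == "__main__":
    loudness = -23.0  # assumed programme loudness for the example
    for level in (-33.0, -23.0, -13.0):
        print(level, drc_gain_db(level, loudness))
```

Because the curve pivots on the measured programme loudness, the average level of the programme is left alone while only the excursions above and below it are reduced.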
MPEG-H Audio can be listened to in any of a variety of formats, including stereo and 5.1 as well as 5.1+4H and binaural. Murtaza and Turnwald comment that MPEG-H supports binauralisation capability so full 3D sound reproduction can be achieved on headphones from any portable playback device.
The MPEG-H Audio immersive mix of the 2018 Eurovision Song Contest will be used at EBU technology demonstrations "in the near future", featuring the soundtrack played through a specially equipped soundbar.
A companion article, "Extensive Networking, Wireless And Comms Required For Eurovision," provides an exhaustive look at the audio networking and technology required to produce the world's most-watched audio contest.