Immersive Soundtrack Created for Eurovision 2018

In its 62-year history the Eurovision Song Contest (ESC) has gone through several major developments in broadcast technology: colour TV, stereo and then 5.1 sound and, most recently, the EBU (European Broadcasting Union) implementing its Flex IP-hybrid self-managed digital transmission system to distribute the event in 2016. The innovation for this year’s ESC was the first ever live production trial of MPEG-H immersive and interactive audio, which is designed to give viewers not only a more enveloping experience but also the ability to select languages, commentaries and speech-to-music/effects levels according to personal preference.

MPEG-H is a standard developed by the Moving Picture Experts Group (MPEG) of the ISO (International Organisation for Standardisation) and the IEC (International Electrotechnical Commission). It covers media transport, HEVC (High Efficiency Video Coding) and related compression and reference tools for both audio and video. Immersive - or 3D - and interactive sound is covered by MPEG-H Part 3, which was worked on by German research institute Fraunhofer IIS Audio and Media Technologies.

This group carried out the test at the ESC, installing equipment and setting up a recording area at the Altice Arena in Lisbon. Among the Fraunhofer IIS Audio and Media Technologies team involved were research engineer Adrian Murtaza and Andreas Turnwald. They explain that the ESC was selected as the first live production trial of MPEG-H Audio because it is "one of the largest music shows in the world" and offered "a unique opportunity to create truly immersive content".

The spatial element of the test mix was audience reaction and ambient noise of the Arena. Murtaza and Turnwald say that the music from the stage, mixed in 2.0 stereo, was kept as it was and added to the immersive elements: "The music and vocals mix is very important for each participant so our trial was focused on creating a realistic 5.0+4H [front left, centre, front right, rear left, rear right plus four height channels] audience ambience around it and offers the opportunity to experience this atmosphere at home. The arena reflections alone are sufficient in creating a natural acoustical 'upmix'."

Schematic of the MPEG-H Audio production.

The audience and room were recorded using four Schoeps MK8 bi-directional (figure-of-eight) capsules on an Ambient Recording A-RAY microphone support arranged in a Hamasaki Square configuration. Designed by Japanese music recording engineer and spatial sound researcher Kimio Hamasaki, the Square was intended to capture ambient/diffuse elements for a surround sound recording. It was used at the ESC for the main portion of the height signals that made up the immersive mix.

Feeds from the Hamasaki Square passed through an RME Micstasy pre-amplifier fitted into the Arena roof. Placing the pre-amp there kept the analogue mic signal path as short as possible. The pre-amplified line level signals were then passed through a 300-metre NetworkSound Mamba optical fibre snake to the TV compound for live mixing.

Murtaza and Turnwald explain that because current hardware-based mixing consoles do not as yet feature immersive busses and panners, the mix was performed on a digital audio workstation (DAW). In this case it was Nuendo 8, which was loaded with standard onboard tools. In addition, MPEG-H Audio authoring and monitoring plug-ins were used to generate the metadata needed to encode the material in the immersive format. Audio distribution was through an RME MADI router, with the mix recorded on to Video Devices PIX 270i rack mount units.

A number of preset mixes were created, including Default, Dialogue Enhancement (or Dialog+) and Venue. Murtaza and Turnwald say that Default was the "official" version, intended to reflect the aims of the sound engineers in creating the best representation of the programme. The Dialog+ preset features audio objects of the presenter and commentator feeds boosted by between 6dB and 9dB to improve speech intelligibility. Venue consists only of the international feed and the presentation, with no commentary. This was created to simulate the experience of being in the Arena, listening to the music and hosts, surrounded by the audience. The metadata for these presets was authored during the recording and mixing process and is contained in the MPEG-H Audio stream.
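To put the 6dB to 9dB figure in perspective, a decibel boost can be converted to a linear amplitude factor. The following is a minimal sketch, not part of any MPEG-H tool: the preset names come from the trial, but the table layout, the field names and the choice of 6dB for Dialog+ are assumptions made purely for illustration.

```python
def db_to_linear(gain_db: float) -> float:
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


# Hypothetical preset table. Names follow the article; the structure and
# the exact gain values are illustrative assumptions, not MPEG-H metadata.
PRESETS = {
    "Default": {"dialogue_gain_db": 0.0, "commentary": True},
    "Dialog+": {"dialogue_gain_db": 6.0, "commentary": True},
    "Venue": {"dialogue_gain_db": 0.0, "commentary": False},
}


def dialogue_factor(preset_name: str) -> float:
    """Linear factor applied to the speech objects for a given preset."""
    return db_to_linear(PRESETS[preset_name]["dialogue_gain_db"])
```

On this basis a 6dB boost roughly doubles the amplitude of the speech objects, while 9dB multiplies it by about 2.8.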

Screen displaying personalised audio settings during the ESC.

The Dialog+ preset also included an interactive feature allowing viewers to set their own balance between the level of speech and that of the background audience/ambient sounds. Another option was the ability to select commentary in their own language. "Personalisation and interactivity represent two major features of MPEG-H Audio, which allow the content creator to offer completely new experiences using the same audio stream," say Murtaza and Turnwald.
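The speech/background balance described above can be pictured as a viewer-chosen gain applied to the speech objects before they are summed with the ambience. This is a rough sketch only: the function names, the sample lists and the -6dB to +9dB limits are hypothetical, and the limiting step simply assumes the content creator restricts how far the viewer may move the balance.

```python
def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed range [low, high]."""
    return max(low, min(high, value))


def mix_with_user_balance(speech, background, user_gain_db,
                          min_db=-6.0, max_db=9.0):
    """Sum speech and background samples after scaling the speech.

    user_gain_db is the viewer's requested speech offset in dB. The
    min_db/max_db limits are illustrative assumptions standing in for
    whatever range the content creator chooses to allow.
    """
    gain_db = clamp(user_gain_db, min_db, max_db)
    factor = 10.0 ** (gain_db / 20.0)
    return [s * factor + b for s, b in zip(speech, background)]
```

A request for a 20dB boost would, under these assumed limits, be held at 9dB, so the balance the viewer hears stays within the range the mix was authored for.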

Metadata is used to describe all audio 'scenes', along with the available personalisation and interactivity options, accessibility services and the different version presets. Fraunhofer's MPEG-H authoring software, loaded into Nuendo 8, was used during the ESC recording session, with its output fed simultaneously with the audio into the MPEG-H Audio encoder.

Also included in the metadata were loudness and Dynamic Range Control (DRC) information. DRC was designed to adapt the loudness and dynamic range of material to suit the capabilities of the device being used for playback. In the case of something with low dynamic range, such as the loudspeakers on a tablet, the output will be compressed accordingly. "The zero point of this compression curve is adjusted according to the programme loudness, which is always being measured and transmitted in the MPEG-H Audio stream," explain Murtaza and Turnwald.
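The idea of a compression curve whose zero point follows programme loudness can be illustrated with a simple static curve. This is a sketch under stated assumptions, not the DRC processing defined for MPEG-H: the function name, the 2:1 ratio and the -23 loudness value used in the tests are all chosen for illustration.

```python
def drc_gain_db(level_db: float, programme_loudness_db: float,
                ratio: float = 2.0) -> float:
    """Gain in dB from a simple static curve pivoting at programme loudness.

    Levels above the pivot are attenuated and levels below it are raised,
    so the dynamic range around the pivot shrinks by the given ratio.
    The ratio is an illustrative assumption.
    """
    offset = level_db - programme_loudness_db
    return offset / ratio - offset
```

With the pivot at the measured programme loudness, a passage 10dB louder than average is pulled down by 5dB and one 10dB quieter is lifted by 5dB, while material at the average level is left untouched.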

MPEG-H Audio can be listened to in any of a variety of formats, including stereo and 5.1 as well as 5.1+4H and binaural. Murtaza and Turnwald comment that MPEG-H supports binauralisation capability so full 3D sound reproduction can be achieved on headphones from any portable playback device.

The MPEG-H Audio immersive mix of the 2018 Eurovision Song Contest will be used at EBU technology demonstrations "in the near future", featuring the soundtrack played through a specially equipped soundbar.

Editor's note:

A companion article, "Extensive Networking, Wireless And Comms Required For Eurovision," provides an exhaustive look at the audio networking and technology required to produce the world's most-watched audio contest.
