Felix Krückels is a certified audio engineer who graduated from the Detmold University of Music and has been involved in immersive audio since 2012. He was there when NHK launched its Super Hi-Vision project with the help of Lawo.
In 2018, he became professor of Broadcast Production and System Design at the University of Applied Sciences in Darmstadt (Germany), where he is conducting research into new dimensions of immersive audio. A major highlight of his career was serving as A1 at the 2018 soccer World Cup in Russia, which was produced in Dolby Atmos.
Asked whether mixing immersive audio is more demanding than working in 5.1 or stereo, Krückels states that the most important challenges come with the ‘bonus features’ afforded by immersive audio broadcasts. Thanks to object-based audio, consumers can individualize the streams they receive, changing ambience and commentator levels and so on, so the engineer always has to bear that in mind.
On the other hand, he notes that with Next Generation Audio (NGA) object-based production, it’s no longer necessary to focus on several discrete listening scenarios: The engineer creates the 3D space from objects and the end user is at liberty to decide what to listen to. Whether it’s binaural headphones, two speakers, ten speakers, or a sound bar, the consumer decoder mixes in tandem with the A1.
Of course, an audio engineer always needs to check typical reference scenarios, and for Krückels those are usually 5.1.4, 5.1, stereo, and binaural, each of which should be checked with the different NGA ‘presentations’ afforded by the interactive side of NGA.
A good example of a useful presentation for sports is a creation by the Dolby team and Felix Krückels called the ‘Pub Presentation’. When a football match is shown at a public bar, the actual crowd at the sports event becomes barely audible because the cheers and boos of the audience in the bar are the dominant crowd noise. Therefore, field-of-play audio details, like ball kicks, tackles, whistle and moaning noises, need to be more prominent.
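A presentation of this kind can be thought of as a named set of gain offsets applied to the audio objects before rendering. The following Python sketch illustrates the idea only; the object names and dB values are hypothetical and are not figures from Dolby or Krückels.

```python
# Hypothetical presentations: per-object gain offsets in dB.
# All names and values are illustrative assumptions.
PRESENTATIONS = {
    "default": {"crowd": 0.0, "field_of_play": 0.0, "commentary": 0.0},
    # Pub scenario: the bar supplies its own crowd noise, so the venue
    # crowd is pulled down and field-of-play details are pushed up.
    "pub": {"crowd": -12.0, "field_of_play": 6.0, "commentary": 0.0},
}


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def apply_presentation(object_levels: dict, presentation: str) -> dict:
    """Scale each object's linear level by the chosen presentation's offset."""
    offsets = PRESENTATIONS[presentation]
    return {
        name: level * db_to_linear(offsets.get(name, 0.0))
        for name, level in object_levels.items()
    }


levels = {"crowd": 1.0, "field_of_play": 1.0, "commentary": 1.0}
pub_mix = apply_presentation(levels, "pub")
```

With these assumed offsets, the venue crowd ends up at roughly a quarter of its original amplitude while the field-of-play objects are roughly doubled, which is the kind of rebalancing the ‘Pub Presentation’ aims for.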
“The ultimate goal for an A1 in TV live coverage,” says Krückels, “is to exaggerate all relevant noises to such an extent that viewers at home have the illusion of being at the venue where the soccer game, boxing fight, etc., takes place.”
The interactive features of 3D/immersive audio can be a mixed blessing: Any added flexibility afforded to the consumer might attract tweaks and changes to the point where the audio content and the experience are ruined. This explains why sound supervisors usually favor a slightly conservative approach, with relatively few bells and whistles. They know that they are unable to control what viewers at home do with the presentations, and so limit the options.
The actual audio production as such is relatively straightforward and very similar to 5.1 production. The added height dimension, from an operator’s point of view, can be managed quite easily. The big new considerations include how the additional options are presented to end consumers, how to monitor the presentations, and which individualization options to make available to the general public in the first place.
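Limiting the options can be pictured as clamping whatever the viewer requests to a range chosen by the sound supervisor. The sketch below is a minimal Python illustration under that assumption; the object names and limits are hypothetical and do not reflect any particular NGA codec’s metadata format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InteractivityLimit:
    """Allowed user gain range for one audio object, in dB."""
    min_db: float
    max_db: float


# Hypothetical limits set by the sound supervisor. Objects that are
# not listed here are not user-adjustable at all.
LIMITS = {
    "commentary": InteractivityLimit(min_db=-6.0, max_db=6.0),
    "crowd": InteractivityLimit(min_db=-3.0, max_db=3.0),
}


def clamp_user_gain(object_name: str, requested_db: float) -> float:
    """Return the gain offset actually applied for a user request."""
    limit = LIMITS.get(object_name)
    if limit is None:
        return 0.0  # locked object: the user request is ignored
    return max(limit.min_db, min(limit.max_db, requested_db))
```

A conservative approach then simply means narrow ranges and few adjustable objects, so that even extreme user settings stay within a mix the engineer has been able to monitor.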
So, how does one approach a 3D/immersive audio mix for sports broadcasting?
The first step, says Krückels, is to look for a venue’s ‘sweet spot’: The position where one can hear everything. That is where a 3D microphone should be mounted. It turns out that this position is usually located close to camera 1.
The 3D microphone is suspended from the roof, at a suitable distance from the crowd to avoid too much interference from drums, vuvuzelas, offensive language, and so on. The 3D microphone thus serves the same purpose as a suspended microphone used to capture the overall sound of a symphonic orchestra.
For reasons of intelligibility and flexibility, spot microphones are positioned close to all important sound sources. The resulting signals are combined in such a way that they make acoustic sense coming out of nine speakers.
Felix Krückels likes to work with three ‘planes’ for his mixes: a mono, a stereo and a surround/3D plane, created from several aspects, or source types.
For him, the surround/3D information usually only concerns the ambience (crowd, city noises, etc.). He hardly ever uses novelty effects or dramatic pans, though some broadcasters use the occasional ‘whoosh’ to announce slo-mo instant replays and the like. To this he adds, usually in mono, typical field-of-play noises: the signals captured by or directly associated with microphones close to the cameras. And finally, there’s the narrator or commentator. Krückels takes great care to maintain the separation among these three aspects to leave sufficient room for artistic license and alternatives.
He considers it important to have a stationary position for the ambience mix, even though visual switching among cameras might suggest otherwise. Applying audio-follows-video to the ambience signal, he says, would quickly lead to listening fatigue and discomfort, because a rapid succession of audio perspectives triggers innate human reflexes of insecurity.
If the cameras do a proper job, the viewer will easily realize that the action is on one side of the field even though the audio information seems to suggest otherwise. This also explains why ball kick noises are always at the center (mono), irrespective of whether they occur on the left or right side of the field.
This might sound like a big compromise, but it is a good one, says Krückels - especially since the spill of background noises into the field-of-play noises is such that moving those between left and right would cause serious imbalances to the ambience.
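The fixed-position approach can be sketched for a simple stereo rendition. In this minimal Python illustration, the function name and the constant-power center gain are assumptions rather than Krückels’ actual signal chain; the point is that the on-field position of a ball kick is not an input at all.

```python
import math

# Constant-power center placement: about -3 dB into each channel.
CENTER_GAIN = math.sqrt(0.5)


def mix_stereo_frame(ambience_lr, fop_mono, commentary_mono):
    """Mix one sample frame to stereo.

    ambience_lr: (left, right) samples of the stationary ambience bed.
    fop_mono: field-of-play sample (ball kicks, whistle), always centered.
    commentary_mono: commentator sample, always centered.
    """
    amb_l, amb_r = ambience_lr
    center = (fop_mono + commentary_mono) * CENTER_GAIN
    # The ambience keeps its fixed perspective; nothing follows the picture.
    return (amb_l + center, amb_r + center)
```

Because the field-of-play signal is never panned, the crowd spill it carries lands equally in both channels and cannot tilt the ambience to one side.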
Are there different philosophies regarding how surround/immersive audio should be mixed? There do seem to be European and American preferences: Europeans pay more attention to a convincing crowd sound, while American productions often favor an ‘in your face’ aesthetic - placing players at the center of the audio image and using heavy compression ratios for ‘ultra-realism’ – complete with the audible artefacts of that compression.
Krückels himself subscribes to a compromise between these two—paying attention to details (FOP noises) while maintaining a truly immersive ambience where viewers have the impression of being at the venue. He does not want to be immersive at all costs, though. A silent crowd, he says, does not sound more animated when captured in 3D/immersive audio. For Mexican waves, on the other hand, he likes it when the sound travels from speaker to speaker.
Dynamics processing is extremely important in a surround/immersive audio scenario. In the broadcast world, no audio engineer can afford to do without it, says Krückels. A dynamic range of 30 to 40dB simply would not work. For key signals such as speech, music, and FOP noises, the human ear prefers to stay in a +7 to –10dB LUFS range. This is even more important when you consider that the level is usually much higher in an arena (110dBA, for instance) than the rendition consumed at home (maybe around 68dBA). That doesn’t leave much headroom for the audio engineer anyway. Add to that most people’s preference to stay within a dynamic range of around 15dB, and the necessity of dynamics processing becomes obvious.
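The arithmetic behind this can be made concrete with a static compressor curve. The Python sketch below takes the 40dB and 15dB figures from the text; the single fixed ratio, the threshold value, and the function names are simplifying assumptions, and a real broadcast chain would use several stages with attack and release behavior.

```python
def required_ratio(input_range_db: float, target_range_db: float) -> float:
    """Compression ratio needed to fit an input range into a target range."""
    return input_range_db / target_range_db


def compress_db(level_db: float, threshold_db: float, ratio: float) -> float:
    """Static compressor curve: levels above the threshold are scaled down."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio


# Fit a 40 dB source range into the roughly 15 dB that listeners prefer.
ratio = required_ratio(40.0, 15.0)  # about 2.7:1
# Assumed threshold of -40 dBFS: the quietest wanted signal passes unchanged.
loudest_out = compress_db(0.0, threshold_db=-40.0, ratio=ratio)
```

Under these assumptions, a source spanning -40 to 0 dBFS comes out spanning about -40 to -25 dBFS, a 15dB window, which shows why even moderate ratios are indispensable in live sports.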
One solution that puts this principle to clever use, says Krückels, is Lawo’s KICK software. According to him, it is the only solution that manages to keep the kick and spill noises at a constant level, thus avoiding sometimes brutal level jumps and artefacts.
Will it Take Off?
Most sound engineers are confident that 3D/immersive audio will establish itself much faster than 5.1, not least thanks to important side technologies like Virtual and Augmented Reality. Many people are already familiar with binaural listening and even head-tracking in VR, which is available on most gaming consoles.
Audio engineers can easily create binaural mixes that serve as immersive sound renditions—and most people will be hooked almost instantly and never want to return to a stereo mix. Krückels therefore believes that headphones will play an important part in establishing immersive audio.
However, he is not so fond of sound bars, which he considers a compromise that might just fall short of accurately rendering the added value.
In any case, consumers’ ready acceptance of all these technologies is playing a major part in the success of immersive audio broadcasts, particularly in sports, and with lots of standards development and interactive functionality to come, it looks like Next Gen Audio is headed for success. It’s important that every A1 is well prepared for it.