The main 5.1 audio mix room in ARET’s new OB trailer. Image: Genelec.
The Human Auditory System (HAS) evolved as a survival tool and one of the vital functions of hearing is to establish where a source of sound is located. The oldest aspects of human hearing, from an evolutionary standpoint, are those concerned with direction. As we determine direction in everyday sounds, it is not unreasonable to think that direction information in reproduced sound is important to realism.
Stereophony is an attempt to impart directional information into reproduced sound. It requires two loudspeakers driven by two audio signals and can in principle simultaneously create an unlimited number of virtual sound sources located anywhere between the speakers. Given that the two audio signals are creating a sound image, a stereo pair actually contains more information than two monophonic signals. Each signal in a stereo pair needs to be reproduced more precisely than a monophonic signal, or the image information may be impaired.
That goes for every link in the chain, including the loudspeakers. The use of stereophony puts additional constraints on the design of the loudspeaker and tightens some existing ones. Loudspeakers that sound like loudspeakers instead of the original sound can fail in mono, but the failure will be more spectacular in stereo because their limited information capacity does more harm.
It is not widely appreciated that there are actually two stereophonic sound formats that are not compatible. The first of these is shown in Figure 1, where attempts are made to capture the sound in the same way that the HAS does. A dummy head may be fitted with a pair of microphones acting much as ears do, hence this signal format is called “binaural”. If the signals from the microphones are enjoyed using headphones, the result is very satisfactory. Effectively the listener’s ears have simply been moved to the microphone location, so we expect that. Whatever differences exist between the sounds at the two microphones are conveyed to the listener.
Figure 1. When a dummy head is used, the resulting binaural signal must be auditioned on headphones. The microphones and headphones have moved the listener’s ears to the original sound fields. Binaural signals do not work on loudspeakers and mono compatibility is poor.
However, if the binaural signals from a dummy head are reproduced via a pair of loudspeakers, the result will be unsatisfactory. It should be clear that when listening via headphones, each ear is presented with one channel of the sound. When listening to a pair of loudspeakers, both ears are able to hear both speakers. This requires a different type of arrangement in which the two loudspeakers receive signals that are identical except for their relative level. It is the relative level that gives the illusion of position to the listener.
Such a pair of signals can be created from a mono signal using a panoramic potentiometer, or pan-pot. The position of the virtual source follows the position of the panning knob. Each source in an image then requires its own pan-pot. Of course multiple sources can be captured simultaneously using suitable microphone techniques that will be discussed.
When coincident or pan-potted signals intended for loudspeakers are reproduced using headphones, the result is equally unsatisfactory. The sound appears inside the listener’s head. Nevertheless, that is exactly what happens with most of today’s audio equipment, even equipment that is specifically designed for headphone listening. To convert between binaural and coincident stereo signals requires a standards conversion process in a device called a shuffler. There are many such shuffling processes but only one name.
The majority of commercially available recordings are intended to be heard on loudspeakers and this piece will concentrate on that. Figure 2 shows that in loudspeaker stereo, the listener and the two speakers form the corners of a roughly equilateral triangle. Each of the listener’s ears receives two signals: the direct signal from the speaker on the same side, and a delayed signal from the speaker on the other side. Clearly if only one speaker emitted sound, the HAS could locate it from the delay between the sound reaching the two ears.
Figure 2. Stereophonic listening requires the listener and the speakers to approximate an equilateral triangle.
However, if both loudspeakers emit the same sound, the system is completely symmetrical and the signals received by both ears must be the same. The HAS will conclude the sound has come from half-way between the speakers. This forms the basis of an important quality test. The system is set to mono such that the two speakers must receive identical signals. If a sharp central image half-way between the speakers is not obtained, there must be something wrong.
Figure 3 shows that as the sound at each ear is the sum of a direct and a delayed version of the same sound, the effect of changing the proportions of direct and delayed sound is to change the apparent delay, and so the apparent position of the virtual sound source between the speakers. That is how a pan-pot works: the same signal is fed to both speakers in varying proportion. The pair of signals from a pan-pot can vary in amplitude only; there is no variation in timing or phase.
Figure 3. The loudspeaker stereo illusion relies on both ears hearing both speakers. It won’t work on headphones. Geometry produces delays between the sounds reaching each ear. The relative level of the sound from each speaker serves to control the delay, which the HAS interprets to locate the sound. The sounds from the two speakers differ only in amplitude, which is why a pan-pot can create them.
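The way relative level turns into apparent delay can be sketched numerically. The following is a minimal illustration rather than a model of real hearing: it assumes a single sine wave, a constant-power pan law (the text does not say which law a given pan-pot uses), and an illustrative 250 microsecond extra path delay to the far ear. The function names and default values are hypothetical.

```python
import cmath
import math


def pan_gains(position):
    """Return (left, right) gains for a pan position in [-1, 1].

    -1 is fully left, 0 is centre, +1 is fully right.
    A constant-power (sine/cosine) law is assumed here.
    """
    angle = (position + 1.0) * math.pi / 4.0
    return math.cos(angle), math.sin(angle)


def apparent_interaural_delay(gain_left, gain_right,
                              freq_hz=300.0, cross_delay_s=0.00025):
    """Delay by which the left ear leads the right ear, in seconds.

    Each ear hears the near speaker directly and the far speaker
    delayed by cross_delay_s (an assumed, illustrative figure).
    """
    omega = 2.0 * math.pi * freq_hz
    delayed = cmath.exp(-1j * omega * cross_delay_s)
    # Phasor sum of direct and delayed sound at each ear.
    left_ear = gain_left + gain_right * delayed
    right_ear = gain_right + gain_left * delayed
    # The phase difference between the ears, expressed as a time.
    return (cmath.phase(left_ear) - cmath.phase(right_ear)) / omega


if __name__ == "__main__":
    for position in (-1.0, -0.5, 0.0, 0.5, 1.0):
        gl, gr = pan_gains(position)
        delay_us = apparent_interaural_delay(gl, gr) * 1e6
        print(f"pan {position:+.1f}: left ear leads by {delay_us:+.1f} us")
```

With equal gains the two ear signals are identical and the delay is zero, which is the central image. With only one speaker driven, the delay equals the full geometric delay. Intermediate pan positions give intermediate delays, even though the two speaker signals differ only in amplitude.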
Many successful pop recordings are made by panning various mono tracks to different places in the image and then adding artificial reverberation. However, for some purposes an existing sound image can be captured by microphones. Although there are many microphone techniques, there is only one that maps the existing sound image onto the virtual image by creating just the right level differences as a function of direction. In order to do this, the microphones must be:
a) coincident, so that any phase differences between the channels are minimised; b) directional, so that the level is a function of direction; c) pointed in different directions, so that different amplitudes are obtained; and d) identical, so that any differences in amplitude result only from direction changes.
Figure 4 shows a pair of crossed figure-of-eight microphones. A sound source at the front will be equally off-axis to both and generate equal signals. As the sound source moves to one side, it will be further off the axis of one microphone and nearer to the axis of the other. The necessary level differences will be created.
Figure 4. An example of a microphone configuration that can create accurate sound images. As a sound source moves to one side, it becomes closer to the axis of one microphone and further from the axis of the other, creating a level difference just as a pan-pot does. Unlike a pan-pot, the microphone can handle an indefinite number of sound sources simultaneously.
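The level differences produced by a crossed pair can be sketched with the ideal figure-of-eight response, which is the cosine of the angle between the source and the microphone axis. This is a hedged illustration: the 45 degree angle of each microphone from centre is an assumption (the text only says the microphones are crossed), and the function names are hypothetical.

```python
import math


def figure_of_eight_gain(source_deg, axis_deg):
    """Ideal figure-of-eight response: cosine of the off-axis angle."""
    return math.cos(math.radians(source_deg - axis_deg))


def crossed_pair(source_deg, half_angle_deg=45.0):
    """Return (left, right) outputs for a source at source_deg.

    Angles are measured from straight ahead, positive toward the left.
    The left microphone is assumed to point at +half_angle_deg and the
    right microphone at -half_angle_deg.
    """
    left = figure_of_eight_gain(source_deg, +half_angle_deg)
    right = figure_of_eight_gain(source_deg, -half_angle_deg)
    return left, right


if __name__ == "__main__":
    for source_deg in (-45, -20, 0, 20, 45):
        left, right = crossed_pair(source_deg)
        print(f"source {source_deg:+3d} deg: L={left:.3f} R={right:.3f}")
```

A source straight ahead gives equal outputs, and a source on the axis of one microphone gives essentially nothing in the other. With the 45 degree assumption the squares of the two outputs always sum to one, so the pair behaves like a constant-power pan-pot whose knob is the direction of the source.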
The audio vectorscope is an extremely useful visualisation tool. In its original form, it was a cathode ray tube type oscilloscope set to X-Y operation. The Left, Right stereo signals were added to produce a mid or mono signal that operated the vertical deflection of the beam and the signals were subtracted to drive the horizontal deflection. The audio vectorscope display draws a line on the screen that points to each sound source. Displayed vectors should follow the position of a pan-pot or the position of real sounds before a coincident microphone.
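The sum and difference arrangement can be sketched in a few lines. This is an illustration only: the text does not say in which order the signals are subtracted, so the choice of right minus left for the horizontal deflection (making a right-panned source lean to the right of the screen) is an assumption, as are the function names.

```python
import math


def vectorscope_point(left, right):
    """Map one stereo sample to (horizontal, vertical) deflection.

    Vertical is the sum (mid or mono) signal. Horizontal is the
    difference signal; right minus left is an assumed convention.
    """
    return right - left, left + right


def trace_angle_deg(left_gain, right_gain):
    """Angle of the displayed vector from vertical, in degrees.

    Negative angles lean left, positive angles lean right.
    """
    horizontal, vertical = vectorscope_point(left_gain, right_gain)
    return math.degrees(math.atan2(horizontal, vertical))


if __name__ == "__main__":
    # A mono source fed to both channels with fixed gains: every sample
    # falls on one straight line through the centre of the screen.
    left_gain, right_gain = 0.9, 0.4
    for sample in (1.0, 0.5, -0.5, -1.0):
        x, y = vectorscope_point(sample * left_gain, sample * right_gain)
        print(f"sample {sample:+.1f}: x={x:+.3f} y={y:+.3f}")
    print(f"trace angle: {trace_angle_deg(left_gain, right_gain):+.1f} deg")
```

Because the two channels of a panned or coincident source differ only in amplitude, all of its samples share one ratio of horizontal to vertical deflection, which is why the display draws a line. Equal gains give a vertical line, and a source in one channel only gives a line at 45 degrees.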
The fundamental work on stereophonic sound was done by Alan Blumlein, who was granted extensive patents on the subject. The large number of alternative stereo microphone techniques that have since emerged all have two things in common. The first is that they didn’t violate Blumlein’s patents. The second, which follows immediately, is that they aren’t capable of mapping a real sound image onto a virtual image, accurately or in some cases at all.
Signals from these royalty-free microphone techniques contain an element of spatiality rather than an acoustic image, best described as enhanced mono, because the solidity isn’t there. The result on an audio vectorscope is a display like a bowl of spaghetti. Finding something resembling a vector isn’t going to happen. Finding a vector that moves in sympathy with a real sound source doesn’t happen either. These problems are easily overcome by explaining that audio vectorscopes are no good.
The adoption of such poor imaging techniques was aided by the fact that the criteria for accurate imaging in loudspeakers were not understood. Auditioned through poor-imaging speakers, royalty-free microphone techniques didn’t sound much worse.
It should be clear from Figures 2 and 3 that the sharpness of the sound image, the equivalent of resolution in optics, is dependent on the effective size of the loudspeaker. The greatest resolution is obtained when the loudspeaker acts like a point source. If the speaker is non-ideal, it may act like a distributed source or the source may move with frequency. A lot of legacy speakers are made of flat pieces of wood and have sharp corners. Diffraction from the sharp corners of these monkey coffins means that the point source is smeared out to the width of the enclosure.
As the position of the virtual source depends on relative amplitude, it follows that the two loudspeakers in a stereo pair need to have exactly the same frequency response; otherwise there will be frequency dependent smear, where the image pulls towards the speaker that responds more.
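The pull toward the louder speaker can be put into rough numbers. The sketch below is an illustration under stated assumptions, not a psychoacoustic model: it treats a response mismatch at one frequency as a gain error on the right speaker and asks what pan position a constant-power pan-pot (an assumed law) would need to produce the same level ratio. All names are hypothetical.

```python
import math


def db_to_gain(db):
    """Convert a level in decibels to a linear amplitude ratio."""
    return 10.0 ** (db / 20.0)


def equivalent_pan_position(gain_left, gain_right):
    """Pan position in [-1, 1] giving this level ratio.

    Inverts an assumed constant-power law: -1 left, 0 centre, +1 right.
    """
    return math.atan2(gain_right, gain_left) * 4.0 / math.pi - 1.0


def image_shift(gain_left, gain_right, right_excess_db):
    """Change in equivalent pan position at one frequency when the
    right speaker responds right_excess_db more than the left."""
    intended = equivalent_pan_position(gain_left, gain_right)
    actual = equivalent_pan_position(
        gain_left, gain_right * db_to_gain(right_excess_db))
    return actual - intended


if __name__ == "__main__":
    centre = math.sqrt(0.5)  # equal gains for a central image
    for mismatch_db in (0.0, 0.5, 1.0, 2.0):
        shift = image_shift(centre, centre, mismatch_db)
        print(f"{mismatch_db:.1f} dB mismatch: shift {shift:+.3f}")
```

If the mismatch differs from one frequency to the next, each part of the spectrum lands at a different position, which is the frequency dependent smear described above.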
Loudspeakers that are highly directional both need to be pointed toward the listener, who must remain in a so-called sweet spot where the frequency response of both speakers is the same. In such speakers the frequency response off-axis may vary wildly and the chances of being equally off-axis to both speakers are slim. Generally, the smaller the sweet spot, the worse the speakers are.
The universal solution to legacy speakers having poor off-axis response is to apply extensive sound absorption to soak up the off-axis rubbish. The result is a dead listening experience that is nothing like what the consumer will hear. It’s hardly surprising under those conditions that people revert to using omni-directional microphones to get some life back into their audio. The life comes back but the image goes. Mediocre loudspeakers have a lot to answer for, but not as much as the people who go on using them without realising how they bias every decision.
Most loudspeakers have multiple drive units. Some designs minimise the effective size of the speaker using co-axially mounted drivers. To keep the horizontal source size small, the drive units in non-coaxial designs are often mounted in a vertical line. Yet how many photographs do we see in audio magazines where such loudspeakers have been laid on their sides? I don’t know which is the more horrifying: that the people who do it don’t know that they shouldn’t, or that they can’t hear the results.
Editor’s note: John Watkinson’s entire series on loudspeaker technology can be located from The Broadcast Bridge home page. Search for “John Watkinson”. His other articles will also be listed.
Mr. Watkinson is author of more than 20 books on audio and video technology and television transmission systems with a recent book on helicopters. His works are available from major booksellers.