Loudspeaker Technology Part 3: The Frequency Domain and Human Hearing

The Bang & Olufsen BeoLab 90 speaker celebrates the company’s 90th anniversary. The retail price for these innovative speakers is about $90,000.

In Part 2 of John Watkinson’s series of articles on loudspeakers, the critical time-domain operation of human hearing was considered. In Part 3, he explains how the frequency domain interacts with the time domain and why both are a crucial concern in any accurate loudspeaker design.

The bit rate of a CD is about 1.4 Mbit/s. The human nervous system simply isn’t capable of that sort of data rate, or anything like it, so one of the jobs of the physical ear is to perform some prior analysis of incoming sounds before nerve impulses are created. The human visual system must do something similar with images, of course. That topic was considered in my article, “How we see.”
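As a quick check on the order of magnitude, the raw audio bit rate of a CD can be worked out from the standard Red Book parameters. Those parameters (44.1 kHz sampling, 16 bits, two channels) are not stated in the article and are assumed here; the function name is purely illustrative.

```python
# Standard CD audio (Red Book) parameters; assumed here, not quoted in the article.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2


def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Raw PCM bit rate in bits per second (no error correction or subcode)."""
    return sample_rate_hz * bits_per_sample * channels


cd_bit_rate = pcm_bit_rate(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS)
print(f"{cd_bit_rate} bit/s = {cd_bit_rate / 1e6:.4f} Mbit/s")
```

The figure covers the audio samples only; the channel bit rate on the disc itself is higher because of error correction and modulation overheads.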

The basic transduction method of the human ear is that tiny hairs are deflected by the flow of fluid and the deflection is sensed by nerves. To increase sensitivity, some of the hairs are active: they amplify the fluid movement by moving in sympathy.

A transducer filled with fluid is not an obvious solution for a land-dwelling being, and may indicate that life began in water. A technical problem for a fluid-filled transducer in air is the mechanical impedance mismatch between sound travelling in air and sound travelling in fluid. Most of the sound energy in the air would simply reflect from that mismatch.
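The scale of that mismatch can be estimated with the textbook formula for sound intensity crossing a plane boundary at normal incidence. The sketch below is only a rough illustration: it treats the inner-ear fluid as water and uses approximate characteristic impedances, neither of which comes from the article.

```python
import math


def intensity_transmission(z1, z2):
    """Fraction of incident sound intensity transmitted across a plane
    boundary between media of characteristic impedance z1 and z2
    (normal incidence, lossless media)."""
    return 4.0 * z1 * z2 / (z1 + z2) ** 2


# Approximate textbook values in rayl (Pa*s/m); assumptions, not from the article.
Z_AIR = 415.0      # air at roughly room temperature
Z_WATER = 1.48e6   # water, used here as a stand-in for inner-ear fluid

transmitted = intensity_transmission(Z_AIR, Z_WATER)
loss_db = -10.0 * math.log10(transmitted)

print(f"transmitted: {100 * transmitted:.2f} %")
print(f"reflected:   {100 * (1 - transmitted):.2f} %")
print(f"transmission loss: {loss_db:.1f} dB")
```

With these assumed values only about a tenth of one percent of the incident energy would cross the boundary, which is why some form of impedance conversion is needed.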

Instead, sound arrives at, and vibrates, the eardrum, whose motion is geared down by a system of tiny bones, or ossicles, that act as an impedance converter. Small forces and high velocities at the eardrum are converted to higher forces and lower velocities at the output, which is a piston-like bone that excites the fluid-filled mechanism of the inner ear.

Figure 1. The ossicles are three bones in either middle ear that are among the smallest bones in the human body. They serve to transmit sounds from the air to the fluid-filled labyrinth.

If someone were to propose a microphone design working on that basis, everyone would die laughing because of the obvious shortcomings. The truth of the matter is that in some respects the human auditory system (HAS) is not very good. The story put about by hi-fi enthusiasts that the ear is some miraculous device that can hear problems that no instrumentation can detect is a huge joke. It does, however, justify the sale of products (generically known as snake oil) that claim to produce an improvement that no instrumentation can detect. The improvement to the vendor’s bank balance is beyond dispute.

The inner ear is a small tube hollowed out of the skull, divided lengthways by a flexible diaphragm known as the basilar membrane. The membrane and the surrounding fluid together create a mechanism that can respond to transient and stationary sounds. (Here, stationary is used in the statistical sense that the spectrum is not time-variant.)

The membrane and attached fluid have mass and associated compliance and damping. Together they are capable of both transmission line behavior and resonant behavior, but at different times.

A transient sound will be supplied as a time-domain fluid pressure waveform to the outer end of the basilar membrane. As the disturbance travels along the membrane at finite speed, nerve cells trigger in different places at different times. Thus a very sharp transient, having maximal bandwidth, can be handled by nerves having a low firing rate because the transmission line spreads the event out in time. When the HAS seeks to correlate two transient waveforms for location purposes or to identify a reflection, what it is actually doing is looking for a pair of similar patterns of nerve firings, which is a lot easier for a low-speed biological process.

Figure 2. Human ear basilar membrane. Image: Kern A, Heid C, Steeb W-H, Stoop N, Stoop R, Biophysical Parameters Modification Could Overcome Essential Hearing Gaps.

Only after the sound source has been located, and its size estimated, will the HAS transfer over to operate the more evolutionarily recent frequency domain analysis mechanism.

The basilar membrane is far from uniform. Near the middle ear it is light and stiff; further away it becomes gradually heavier and looser, so that it has a range of resonant frequencies along its length, from 20 kHz near the middle ear (in the young) to 20 Hz at the pointed end.
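One widely quoted empirical description of this place-to-frequency mapping is Greenwood's function for the human cochlea. It is not mentioned in the article and is brought in here only as an illustration; the constants are Greenwood's published human fit, and the function name is hypothetical.

```python
def greenwood_frequency(x):
    """Greenwood's empirical fit for the human cochlea.

    x is the fractional distance along the basilar membrane, measured from
    the apex (0.0, the far 'pointed' end) to the base (1.0, next to the
    middle ear). Returns the characteristic frequency in Hz.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie between 0 and 1")
    return 165.4 * (10.0 ** (2.1 * x) - 0.88)


for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"x = {x:.2f}  ->  {greenwood_frequency(x):8.0f} Hz")
```

The fit runs from roughly 20 Hz at the apex to roughly 20 kHz at the base, and the spacing is close to logarithmic over most of the length, so each octave occupies a broadly similar stretch of membrane.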

There is simply no evidence of any adult HAS response to sounds above 20 kHz, and clearly engineering audio systems with a response much above that makes no sense. On the other hand there is no law against people buying hopelessly over-specified products on the basis of unsubstantiated beliefs.

When acting as a frequency analyzer, the basilar membrane only provides amplitude information for each frequency it detects. There is a well-known demonstration in which a stationary waveform is synthesised and, whilst the audience listens, the phase relationship between its frequency components is varied. This is linear distortion: it changes the waveform but not the harmonic content. No-one listening is any the wiser.
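The numerical side of that demonstration is easy to reproduce. The sketch below builds two versions of a three-harmonic waveform that differ only in the phases of the harmonics, then measures each harmonic's amplitude with a single-bin DFT. The harmonic amplitudes and phase offsets are arbitrary values chosen for illustration.

```python
import cmath
import math

N = 256                                   # samples in exactly one period
HARMONICS = {1: 1.0, 2: 0.5, 3: 0.33}     # harmonic number -> amplitude (arbitrary)


def synthesise(phases):
    """One period of a sum of cosines, with a phase offset per harmonic."""
    return [
        sum(a * math.cos(2 * math.pi * h * n / N + phases[h])
            for h, a in HARMONICS.items())
        for n in range(N)
    ]


def harmonic_amplitude(signal, h):
    """Amplitude of harmonic h, taken from one DFT bin."""
    bin_value = sum(s * cmath.exp(-2j * math.pi * h * n / N)
                    for n, s in enumerate(signal))
    return 2 * abs(bin_value) / N


in_phase = synthesise({1: 0.0, 2: 0.0, 3: 0.0})
shifted = synthesise({1: 0.0, 2: math.pi / 2, 3: math.pi})

for h in HARMONICS:
    print(h,
          round(harmonic_amplitude(in_phase, h), 6),
          round(harmonic_amplitude(shifted, h), 6))
print("peak, in phase:", round(max(in_phase), 3))
print("peak, shifted: ", round(max(shifted), 3))
```

The amplitude of every harmonic is identical in the two versions, yet the peak values differ, so the waveforms are visibly not the same shape. A pure amplitude analyzer cannot tell them apart; a time-domain one can.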

From this test many people conclude that the ear is phase-deaf at all times and that the time response of loudspeakers doesn’t matter. That conclusion is totally erroneous. Whilst the ear may be phase-deaf on stationary sounds like tones, as we have seen these convey little information. More importantly, when the ear is working in the time domain it is highly sensitive to linear distortion and if this is too great it will impair the ability of the HAS to process time-domain information.

Figure 3. Relationship between time and frequency domains.

It is easy to see that if a loudspeaker has time constants of its own, that will impair the ability of the HAS to estimate the size of sound sources using time constants in the audio signal. This may also explain why certain designs of loudspeaker appear to work better (or at least show fewer deficiencies) on certain types of music. If one considers musical genres in which all of the instruments are electric or electronic, the signals concerned will contain no information about the size of an acoustic sound source because there is no such source. It follows that a loudspeaker that superimposes time constants of its own will do no damage to such recordings.

There is no shortage of speakers that sound great on rock music yet are incapable of reproducing female speech with any realism. The unfortunate lady sounds like she is inside a tea chest. Smaller speakers are considered better for speech.

The corollary, of course, is that an accurate loudspeaker that does not superimpose its own views on what the sound waveform should look like can be used for all types of sound. Equally, all accurate loudspeakers sound surprisingly similar.

Required speaker performance

It may be that we have come far enough through the working of the human ear to attempt some sort of a specification for a realistic or accurate loudspeaker. An adequate frequency response is obvious, as is freedom from harmonic distortion on stationary signals, so I won’t dwell on that. However, if we believe all that stuff about how the HAS works in the time domain, and we should, it immediately follows that linear distortion is not acceptable in a loudspeaker. In other words all frequencies should take the same time to pass through the speaker, such that the input waveform is preserved.

One of our criteria has to be that the loudspeaker must be able to reproduce a (band-limited) square wave, because that is the simplest test we have for linear distortion. However painful it may be to break with tradition, that is a fundamental requirement and anything that prevents it has to be abandoned and an alternative found.
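To see why a band-limited square wave exposes linear distortion, it can be synthesised from its odd harmonics and passed through two hypothetical delay profiles: one that delays every harmonic by the same time, and one in which the delay depends on frequency. All numbers below (1 kHz fundamental, 20 kHz band limit, the delay values) are illustrative assumptions and do not model any real loudspeaker.

```python
import math

F0 = 1_000.0       # fundamental in Hz (illustrative)
F_MAX = 20_000.0   # band limit in Hz
FS = 192_000.0     # evaluation rate, chosen so one period is a whole number of samples
N = int(FS / F0)   # samples per period

ODD_HARMONICS = list(range(1, int(F_MAX / F0) + 1, 2))


def square_wave(t, delay_for):
    """Band-limited square wave in which harmonic k is delayed by delay_for(k) seconds."""
    return (4 / math.pi) * sum(
        math.sin(2 * math.pi * k * F0 * (t - delay_for(k))) / k
        for k in ODD_HARMONICS
    )


def one_period(delay_for):
    return [square_wave(n / FS, delay_for) for n in range(N)]


def max_shape_error(a, b, shift):
    """Largest sample difference between a and b after undoing a circular shift."""
    return max(abs(a[n] - b[(n + shift) % N]) for n in range(N))


DELAY = 48 / FS    # 0.25 ms, exactly 48 samples

reference = one_period(lambda k: 0.0)
uniform = one_period(lambda k: DELAY)         # same delay at every frequency
dispersive = one_period(lambda k: DELAY / k)  # hypothetical: delay falls with frequency

print("shape error, uniform delay:", max_shape_error(reference, uniform, 48))
print("peak, reference: ", round(max(reference), 3))
print("peak, uniform:   ", round(max(uniform), 3))
print("peak, dispersive:", round(max(dispersive), 3))
```

With a uniform delay the output is simply the input arriving later, so the shape error is at rounding level. With the frequency-dependent delay the harmonic amplitudes are untouched, but the waveform no longer resembles a square wave and its peak value rises markedly. A frequency response measurement would show no difference between the two cases.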

Because all loudspeakers radiate into a more or less reverberant environment, it is vital that they should radiate more like real sound sources do. This means that it is no longer acceptable that a loudspeaker only meets some performance criterion on axis whilst ignoring what happens off axis. It is perhaps pertinent to ask why loudspeakers are deemed to have an axis when people, instruments and natural sound sources don’t.

Perhaps the concept of an axis is undesirable in loudspeakers. This implies that the sound quality radiated in any direction should be as good as in any other direction.

Figure 4. The three important domains in which a realistic loudspeaker must meet performance criteria: time, space and frequency. Neglect of any one will nullify excellence in the other two. Legacy speakers concentrate on the frequency domain and so they always sound like loudspeakers.

Figure 4 shows the three key domains in which a loudspeaker must meet performance criteria. These are time, space and frequency. The time domain sets criteria for linear distortion, the space domain sets criteria for directivity and the frequency domain needs no comment. Typically, legacy speakers address the frequency domain only, and such speakers all sound different because they fail to address the other two domains to different degrees.

In summary, a flat frequency response is needed so that the timbre or tonality of the original sound is unimpaired. Lack of linear distortion allows transient sounds such as percussion to be reproduced properly and allows the ear to determine the size of sources. Good directivity means that the quality of reverberant sound is sufficiently close to the direct sound that the ear can recognise reflections for what they are and the Haas effect can operate.

Figure 5. Graph a) shows measured speaker performance across a set of tests. The results are uniform. In graph b), the test C results are lower than the other results. This means that any manufacturing costs expended in obtaining the high values in tests B and D are squandered.

In any product design, it is the weaknesses that irritate the user and cause resentment. Thus for any product cost, the best performance/price ratio will be where the product performs equally well across the range of tests. In other words do nothing badly. This can be seen in Figure 5, graph a). In Figure 5, graph b) the product performance is dragged down by results C, which means the money spent on obtaining the high B and D performance is wasted.

No realism will be obtained until all three domains are addressed. The first serious attempt at addressing the time and directivity domains was the Quad ELS-63 designed over half a century ago. It still gives a good account of itself.

The best value will be obtained when failings in the three domains are balanced. Graph a) of Figure 5 illustrates the hallmark of good industrial design, where all relevant factors perform about the same so the product is not let down by a weakness. This is almost anathema to the world of hi-fi, in which vast sums are spent on minutiae and glaring deficiencies are totally neglected.

John Watkinson, Consultant, publisher, London (UK)

Editor’s note:

John Watkinson has a new book readers may wish to view.

In The Art of Flight, John Watkinson chronicles the disciplines and major technologies that allow heavier-than-air machines to take flight. The book is available from Waterstones Book Store.

Other related articles posted on The Broadcast Bridge.
