Here we look at some practical results of transform theory that show up in a large number of audio and visual applications.
About the simplest shape we can draw is a rectangle, and about the simplest waveform we can create is a square wave, because we can do it with little more than a switch. Transform theory tells us that if on one side of a transform we have a rectangle, on the other side we will have a sinx/x function. These crop up all over the place.
One of the differences between theory and practice is that some things that are theoretically possible cannot be done for practical reasons. Whilst sampling theory is applicable to a wide range of problems, within and outside broadcasting, the practical restrictions vary tremendously with the application.
Sampling theory assumes that the sample is instantaneous. In other words, in the time domain the waveform is measured over an infinitely short time and in the spatial domain over an infinitely small area. The sampling process results in a pulse amplitude modulated signal.
In order to recover the original signal, a low-pass filter is needed. The well known brick wall filter, having unity gain in the pass band, an infinitely steep cut-off slope and zero gain in the stop band, essentially has a rectangular frequency response, meaning that the impulse response will be a sinx/x function.
Fig. 1 shows that if the filter response is correctly matched to the sampling rate, the zero crossings of the sinx/x curve coincide with the locations of adjacent samples. In other words, the samples do not interfere with each other and the output waveform must join the tops of the samples. This is the theory of perfect reconstruction.
Fig. 1 - The impulse response of a reconstruction filter has periodic zero crossings. These coincide with the sites of adjacent samples, so samples remain independent.
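As a rough numerical check of what Fig. 1 describes, the following Python sketch sums one sinx/x pulse per sample and evaluates the result at the sample sites. The 48kHz sampling rate, the 1kHz test tone and the function name `reconstruct` are assumptions chosen purely for illustration.

```python
import numpy as np

def reconstruct(samples, fs, t):
    """Ideal (brick wall) reconstruction: the sum of one sinx/x pulse per sample."""
    n = np.arange(len(samples))
    # np.sinc(x) is sin(pi*x)/(pi*x), so its zeros fall on the non-zero integers,
    # which are the sites of all the other samples when x is in sample periods.
    return np.sum(samples[None, :] * np.sinc(fs * t[:, None] - n[None, :]), axis=1)

fs = 48_000.0                                      # assumed sampling rate, Hz
n = np.arange(64)
samples = np.sin(2 * np.pi * 1_000.0 * n / fs)     # assumed 1 kHz test tone

at_sites = reconstruct(samples, fs, n / fs)        # evaluated exactly at the sample instants
impulse_at_neighbours = np.sinc(np.arange(1, 6))   # one pulse seen from the next five sites
```

At each sample site only that sample's own pulse contributes, so `at_sites` reproduces `samples` and the curve is forced through the top of every sample.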
The ideal rectangular filter allows the sampling rate to be only twice the bandwidth of the signal. That is all very well, except that transform duality tells us that the result of an infinitely steep cut-off in a filter is an infinite filter window, leading to an infinite delay. The brick wall filter cannot be made and exists only in our imagination. In the real world, filters have finite windows and finite slopes, so sampling rates have to rise to avoid aliasing.
In broadcasting we have things easy because the exact time the viewer sees or hears the material is not critical. This means that phase linear filters having a symmetrical impulse response can be used even though they delay the signal. Pity the designer of the fly-by-wire airplane where the delay of such a filter could cause his feedback loops to become unstable.
In audio convertors it is possible to get quite close to the instantaneous sample, whereas in video it is fundamentally impossible. The video signal starts life in a sensor, which is formed of discrete elements known as photosites, loosely called pixels. If those were made according to ideal sampling theory, they would be infinitely small, and would therefore produce no energy.
In all practical image sensors, the photosites are as large as possible in order to maximize the sensitivity of the sensor. In other words, the ideal zero-width pulse of sampling theory has been replaced by a rectangle. Here we go again! Transform theory tells us that the frequency response of the sensor will not be flat, but will instead fall as a sinx/x function. This is a classic aperture effect and with photosites as wide as the spacing between them (100 percent aperture), the response will be about 4dB down at the Nyquist frequency (half the sampling rate).
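The 4dB figure can be checked with a few lines of Python. This is only a sketch: the function name `aperture_loss_db` and the `fill` parameter (aperture width as a fraction of the sample spacing) are illustrative assumptions, and the sensor is treated as a one-dimensional row of rectangular apertures.

```python
import math

def aperture_loss_db(freq_over_fs, fill=1.0):
    """Gain in dB of a rectangular aperture at a frequency given as a fraction
    of the sampling rate. fill is the aperture width divided by the sample
    spacing (an assumed parameter; 1.0 means a 100 percent aperture)."""
    x = math.pi * fill * freq_over_fs
    gain = 1.0 if x == 0 else math.sin(x) / x
    return 20 * math.log10(gain)

nyquist_loss = aperture_loss_db(0.5)   # 100 percent aperture at half the sampling rate
```

At the Nyquist frequency the gain is 2/π, which works out to roughly -3.9dB.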
All electronic cameras suffer from this aperture effect and the result is that the actual resolution obtained is not described by the pixel count.
Speaking of the Nyquist frequency, how is aliasing prevented in a camera sensor? In audio, an electrical filter can be used, whereas in a camera the filter must act in the optical domain. Whilst we can have more scope with monochromatic light, broadcast cameras must work over the whole visible spectrum and that makes the filtering harder.
The impulse response of an ideal low pass filter is a sinx/x curve, which has negative excursions that are readily implemented in an electronic filter but impossible in an optical filter where there is no such thing as negative light. It follows immediately that if impulse responses are restricted to those that are positive only, optical anti-aliasing filters cannot be as steep as electronic filters.
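One way to see the limitation numerically is sketched below in Python. It relies on a standard inequality rather than anything stated in the text: for a symmetric impulse response with no negative values and unity gain at zero frequency, the gain at twice a given frequency cannot be lower than twice the square of the gain at that frequency, minus one. The filter length, the cut-off, the Gaussian blur and all the names are assumptions for illustration.

```python
import numpy as np

def response(h, f):
    """Zero-phase frequency response of a symmetric kernel h at f cycles/sample."""
    n = np.arange(len(h)) - (len(h) - 1) / 2       # tap positions centred on zero
    return float(np.sum(h * np.cos(2 * np.pi * f * n)))

taps = 101
n = np.arange(taps) - (taps - 1) / 2

# Electronic-style filter: windowed sinx/x, cut-off 0.25 cycles/sample, negative taps allowed.
sinc_lp = 0.5 * np.sinc(0.5 * n) * np.hamming(taps)
sinc_lp /= sinc_lp.sum()

# Optical-style filter: a Gaussian blur, which can never go negative.
blur = np.exp(-0.5 * n ** 2)
blur /= blur.sum()

grid = np.linspace(0.0, 0.25, 26)
blur_gain = [response(blur, f) for f in grid]
blur_gain_2f = [response(blur, 2 * f) for f in grid]
blur_floor = [2 * g ** 2 - 1 for g in blur_gain]   # lowest gain allowed at 2f

sinc_pass = response(sinc_lp, 0.2)                 # just inside the pass band
sinc_stop = response(sinc_lp, 0.4)                 # one octave higher
```

The filter with negative excursions passes almost everything at 0.2 cycles per sample and almost nothing an octave above. A non-negative kernel that passed 90 percent at some frequency would, by the inequality, still pass at least 62 percent an octave higher, which is why its slope is so much gentler.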
These factors together make a strong case for the use of oversampling in cameras. In an oversampling camera, the pixel count at the sensor somewhat exceeds the pixel count of the output format. The finite slope of the anti-aliasing filter is outside the output passband, as is much of the aperture effect due to large photosites.
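The benefit to the aperture effect can be estimated with a short Python sketch. The oversampling factors here are assumed linear (per-axis) ratios of sensor sampling rate to output sampling rate, picked only for illustration, and the photosites are taken to be as wide as the pixel spacing.

```python
import numpy as np

def aperture_loss_db(freq_over_sensor_fs):
    # np.sinc(x) is sin(pi*x)/(pi*x): the response of a photosite as wide as the pixel pitch.
    return 20 * np.log10(np.sinc(freq_over_sensor_fs))

oversampling = np.array([1.0, 1.5, 2.0])    # assumed linear oversampling factors
output_nyquist = 0.5 / oversampling         # as a fraction of the sensor sampling rate
loss_db = aperture_loss_db(output_nyquist)
```

Under these assumptions the loss at the edge of the output passband falls from roughly 3.9dB with no oversampling to about 1.65dB at a factor of 1.5 and about 0.9dB at a factor of 2.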
Aperture effects raise their heads in other types of transducer. The loudspeaker is no exception. Like the photosite in an electronic camera, the loudspeaker diaphragm must have finite area in order to move the air. Fig. 2a) shows such a diaphragm working on-axis. At a distant point, all parts of the diaphragm appear to move together and all is well. However, Fig. 2b) shows that if the distant point is off-axis, the impulse response is spread out because the distance to the diaphragm is no longer constant. Transform duality says that broadening the impulse response narrows the frequency response.
The result is that as frequency rises, all drive units suffer from beaming, where only the on-axis sound is accurate. This, of course, is why speakers often have several drive units, getting smaller in size as frequency goes up. Unfortunately it is not possible to oppose something continuous with a series of steps, so at each crossover there will be a sudden change in the degree of beaming and the off-axis frequency response is like a dog's hind leg. This is one of the main reasons why traditional loudspeakers sound like loudspeakers instead of like the sound they should be reproducing.
The direct sound may be OK, but the reverberation in the room is, by definition, excited by the off-axis sound and in most loudspeakers the off-axis sound is terrible. There is thus a conflict between the two sounds. In many listening rooms the solution is to install sound-absorbing material on the walls, at considerable expense, in order to soak up the poor quality off-axis sound. Loudspeakers having adequate directivity performance need no such acoustic treatment.
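The beaming described above can be put into rough numbers with a deliberately simplified Python model. It assumes a uniformly driven strip of a given width rather than a real circular diaphragm, a far-field listener and a speed of sound of 343m/s. Off axis, the arrival times are then spread evenly over a short interval, so the impulse response is a rectangle and the frequency response is once again sinx/x. All names and dimensions are illustrative.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def arrival_spread_s(width_m, angle_deg):
    """Spread of arrival times across a source of the given width, seen off axis."""
    return width_m * np.sin(np.radians(angle_deg)) / SPEED_OF_SOUND

def off_axis_response(freq_hz, width_m, angle_deg):
    """Magnitude response relative to on axis: the transform of a rectangular pulse."""
    return np.abs(np.sinc(np.asarray(freq_hz) * arrival_spread_s(width_m, angle_deg)))

def first_null_hz(width_m, angle_deg):
    """Frequency at which the off-axis response first falls to zero."""
    return 1.0 / arrival_spread_s(width_m, angle_deg)

freqs = np.array([500.0, 1_000.0, 2_000.0, 3_000.0])
on_axis = off_axis_response(freqs, 0.2, 0.0)      # assumed 200 mm source, on axis
off_axis = off_axis_response(freqs, 0.2, 30.0)    # same source, 30 degrees off axis
null_200mm = first_null_hz(0.2, 30.0)
null_25mm = first_null_hz(0.025, 30.0)            # assumed 25 mm source
```

With these assumed figures the 200mm source is flat on axis but has its first null near 3.4kHz at 30 degrees, whereas the 25mm source has no null until well beyond 20kHz.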
Transform duality can be seen in action in the column loudspeaker. Because the speaker is tall, the sound comes out in a flat beam, narrow in the vertical plane, that is directed towards the audience. Because the column is slim, transform duality suggests the sound will emerge in a wide pattern in the horizontal plane. This is much better suited to a large listening audience.
Fig. 2 - a) All parts of diaphragm deliver sound at the same time to a point on axis. b) When off axis, the impulse response becomes broader and the frequency response narrower.
Microphones suffer from aperture effects too. The larger the diaphragm, the worse becomes the directivity. Since real sounds generally result from some vibrating object, the size of that object will also determine the directivity.
One often reads that existing audio sampling rates are not high enough and that much higher rates would sound better. It may be interesting to examine that in the context of beaming. Let us suppose the sampling rate allows an audio bandwidth of 40kHz. The radiation beam width of a sound source would be half what it is at 20kHz and the beam area one quarter. The same would happen in the microphone and in the loudspeaker.
In order that the listener could be presented with 40kHz sound, the beaming of the source would have to be aligned with the beaming of the microphone and the listener would have to be precisely in the loudspeaker beam. The probability of beaming sound at 40kHz reaching the listener compared to sound at 20kHz is one quarter cubed, or 1/64. Even if the human auditory system (HAS) could detect such frequencies, trying to reproduce them would be pointless.
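The arithmetic behind the 1/64 figure can be written out as a small Python sketch. The function name is illustrative, and the model simply takes the text's assumption that beam width is inversely proportional to frequency at each of the three stages.

```python
from fractions import Fraction

def relative_beam_area(freq_ratio):
    # Beam width scales as 1/frequency, so the area it covers scales as 1/frequency squared.
    return Fraction(1, freq_ratio ** 2)

stages = 3                                  # source, microphone and loudspeaker
chance = relative_beam_area(2) ** stages    # 40 kHz relative to 20 kHz
```

Doubling the frequency quarters the beam area at each stage, and three quarters multiplied together give `Fraction(1, 64)`.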
Sadly for the audiophile, air doesn't meet their requirements of hi-fi equipment, so they can't go to live performances where the highest audible frequencies will never reach them.
Transform theory can tell us many things, but it is unable to explain why audiophiles hold the views that they do. That may be because transform theory is logical. The explanation is that if one's knowledge is sufficiently small, one cannot know why one is in error. The corollary is that if one did know enough to recognize error, the error would not occur. The work of Dunning and Kruger has been very helpful in understanding these phenomena.
Given the number of hearing tests that must be conducted every day, if an audiologist found a patient who could hear 40kHz, it would cause a sensation. It has yet to happen. As usual the blame can be placed at Darwin's door. We have evolved to produce and hear wavelengths that are relevant to living things of our size. Babies evolved to squall at the frequencies to which the HAS is the most sensitive. Smaller living things, such as bats, find shorter wavelengths more relevant.