Digital audio relies completely on the accuracy of quantization and it is important to see how it works.
To get from a continuous waveform to a set of numbers requires two processes, sampling and quantization. Sampling works in the time domain and quantization works in the Voltage domain, so the two are orthogonal: what happens in one doesn't affect the other, a bit like Cr and Cb in video.
The orthogonality means that sampling and quantizing can be performed in either order and the outcome will be the same provided the processes are well engineered. The choice of the order is an engineering decision.
Fig.1 shows the idea. At a) the input waveform is first sampled into discrete time pulses or needles in a pulse-amplitude modulated (PAM) waveform. Each pulse is then quantized to create a binary number. At b) the quantization takes place continuously with the signal being allocated to the nearest step whenever there is an appropriate change of Voltage. The state of the steps is then sampled in discrete time.
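The equivalence of the two orders can be checked numerically. The following Python sketch is illustrative only: the 1 kHz sine, the 48 kHz sampling rate, the size of the quantizing interval and all of the function names are assumptions made for the example, and an ideal sampler and quantizer stand in for real hardware.

```python
import math

Q = 1.0 / 128   # assumed quantizing interval, in Volts
FS = 48_000     # assumed sampling rate, in Hz


def waveform(t):
    """Continuous-time input: a 1 kHz sine of 0.5 V peak (assumed)."""
    return 0.5 * math.sin(2 * math.pi * 1000 * t)


def quantize(v):
    """Ideal quantizer: number of the nearest step."""
    return math.floor(v / Q + 0.5)


def sample_then_quantize(n_samples):
    # a) sample into PAM pulses first, then quantize each pulse
    pulses = [waveform(n / FS) for n in range(n_samples)]
    return [quantize(p) for p in pulses]


def quantize_then_sample(n_samples):
    # b) quantize continuously, then sample the state of the steps
    def stepped(t):
        return quantize(waveform(t))
    return [stepped(n / FS) for n in range(n_samples)]


same = sample_then_quantize(480) == quantize_then_sample(480)
```

With ideal stages the two lists of numbers are identical, which is the sense in which the order is purely an engineering decision.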
In some cases the conversion process is more complicated because there may be multiple sampling and quantizing stages that are interleaved. Those ideas will have to wait for a future piece.
The entire purpose of quantizing is so that the audio waveform can be described by discrete values. Once the information is discrete, techniques such as error correction and time base correction can be used so that the sound quality becomes independent of the medium used to store or carry the discrete information.
Fig.1 - Sampling and quantizing can be performed in either order. At a) the sampling goes first. At b) the quantizing goes first.
To put it another way, if the audio signal is converted to a data file, that file is a series of numbers. If those numbers arrive somewhere else or are reproduced later but without any change, there has been no loss of quality due to the storage or transmission. If the quality of digital audio is independent of the medium, what determines it? Very simply, the bandwidth available follows from the sampling rate used, and the signal to noise ratio available follows from the wordlength of the samples.
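Those two relationships can be put into numbers. The sketch below uses standard textbook results that are not derived in this article: the audio bandwidth cannot exceed half the sampling rate, and an ideal undithered quantizer carrying a full-scale sine wave has a signal to noise ratio of about 6.02 dB per bit plus 1.76 dB. The function names are hypothetical and real convertors will fall short of the ideal figures.

```python
import math


def max_bandwidth_hz(sampling_rate_hz):
    """Upper limit of the audio bandwidth: half the sampling rate."""
    return sampling_rate_hz / 2


def ideal_snr_db(wordlength_bits):
    """SNR of an ideal quantizer with a full-scale sine wave input.

    Signal RMS is 2^(N-1) * Q / sqrt(2); quantizing error RMS is Q / sqrt(12)
    (error assumed uniformly distributed). The ratio is 2^N * sqrt(1.5).
    """
    return 20 * math.log10(2 ** wordlength_bits) + 10 * math.log10(1.5)


bandwidth_48k = max_bandwidth_hz(48_000)
snr_16 = ideal_snr_db(16)
snr_24 = ideal_snr_db(24)
```

Each extra bit of wordlength is worth about 6 dB, so the step from 16 to 24 bits adds roughly 48 dB to the theoretical figure.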
Quality in digital audio is determined at the first conversion stage, which is why ADCs have to be of adequate quality. If a good ADC is used to make a recording that is monitored with a poor DAC, the later substitution of a good DAC will reveal how good the recording actually was. If a poor ADC was used, the recording is permanently poor.
Fig.2 shows that a continuously variable signal is analogous to a ramp. One can stand in an infinite number of places on the ramp and be at an infinite number of heights. In contrast, standing on a ladder is discrete; effectively a ladder is a quantized ramp.
Another goal of digital audio is that it should be possible to perform signal processing on the audio data. That requires that the binary value of a sample should be proportional to the original audio voltage. The requirement is met using a mid-tread quantizer transfer function shown in Fig.3.
Fig.2 - A continuously variable system can be compared to a ramp, where any height is possible. A quantized system is more like a ladder, where only certain heights are available.
The staircase-like characteristic replaces the linear transfer function and the difference between the staircase and the straight slope is a sawtooth as shown in Fig.4. The steps are separated by quantizing intervals, Q. The slope and the staircase coincide at the center of each tread. Moving away from the coincidence point, the quantizing error steadily increases until the next quantizing interval is entered. At that point the quantizing error changes polarity and starts to fall towards the next coincidence.
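The staircase and its sawtooth error are easy to model. This is a minimal sketch of an ideal mid-tread quantizer, assuming a quantizing interval of 1 mV; the function names are made up for the example. It sweeps a slow ramp through a few intervals and records the largest error seen.

```python
import math


def quantize_mid_tread(v, q):
    """Return the step number nearest to v; zero Volts gives code zero."""
    return math.floor(v / q + 0.5)


def quantizing_error(v, q):
    """Difference between the staircase and the straight slope, in Volts."""
    return quantize_mid_tread(v, q) * q - v


Q = 0.001  # assumed quantizing interval: 1 mV

# Sweep a ramp from -3Q to +3Q in hundredths of an interval.
ramp = [i * Q / 100 for i in range(-300, 301)]
errors = [quantizing_error(v, Q) for v in ramp]
worst_error = max(abs(e) for e in errors)
```

Because the tread for code zero is centered on zero Volts, the codes stay proportional to the input in both polarities, and `worst_error` never exceeds half a quantizing interval.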
A change of one quantizing interval results in the binary code changing by one, which means the least significant bit changes state. For that reason some writers call a quantizing interval an LSB, and then proceed to talk about fractions of an LSB. This is nonsense: a bit is 1 or 0 and there is no such thing as a fraction of a bit. In contrast, a quantizing interval is a Voltage range, and can be subdivided.
Within the range of the quantizer, the quantizing error cannot exceed 1/2 Q. Clearly if the range is exceeded, the quantizing error becomes unbounded, which is another way of saying the signal has been clipped. Unlike earlier audio devices or media like vacuum tubes or magnetic tape, which had a relatively gentle onset of clipping, in digital audio the onset of clipping is abrupt.
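The abruptness can be seen by giving the quantizer a finite code range. The sketch below assumes a 16 bit two's complement code range and a quantizing interval of one unit; both are choices made for illustration, as is the function naming.

```python
import math


def convert(v, q, bits):
    """Mid-tread quantizer whose output is limited to a two's complement range."""
    lowest = -(2 ** (bits - 1))
    highest = 2 ** (bits - 1) - 1
    code = math.floor(v / q + 0.5)
    return max(lowest, min(highest, code))  # codes outside the range are clipped


def error_in_q(v, q, bits):
    """Quantizing error expressed in quantizing intervals."""
    return (convert(v, q, bits) * q - v) / q


Q = 1.0
BITS = 16

inside = error_in_q(1000.25, Q, BITS)     # input within the range
just_over = error_in_q(32768.0, Q, BITS)  # one interval above the top code
far_over = error_in_q(40000.0, Q, BITS)   # well beyond the range
```

Inside the range the error stays within half an interval. Once the top code is reached, the error simply grows with the input, with no soft transition between the two regimes.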
Fig.3 - The mid-tread quantizer makes the numbers proportional to the input voltage because zero digital corresponds to zero Volts.
In practice the difference between a gentle and an abrupt onset of clipping is relatively unimportant. The characteristic of human hearing is such that harmonic distortion has to be sustained in order to be audible. Distortion could not be heard on transients recorded in the non-linear headroom region of traditional magnetic tape recorders, nor can it be heard if the same transients are clipped by over-modulating a digital system. On the other hand, digital systems offer so much dynamic range that clipping is avoidable.
As with any medium, the best results are obtained when the largest signal is recorded, subject to it not clipping. Such a signal is defined in digital audio as having a level of zero dBFS. Quantizing error can be thought of as an unwanted signal added to the wanted signal. With a real complex audio waveform at high level, the quantizing errors from one sample to the next simply don't correlate and the unwanted signal is random and noise-like.
Unfortunately the same is not true when the audio waveform is at low level, especially if it is a simple waveform such as a sine wave. Fig.5 shows that under those conditions the quantizing error can be worked out from the waveform. The error has become a function of the audio signal; in other words it is distortion.
Imagine the worst case where the input waveform is so small that it only manages to switch between two quantizing intervals. Whatever the waveform, the output is a square wave, rich in harmonics. The problem is that those harmonics may be above the Nyquist frequency and fold back into the baseband as anharmonics.
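The worst case can be reproduced with an ideal quantizer. In this sketch the signal is assumed to sit on the boundary between two quantizing intervals with a peak amplitude of 0.3 Q; the offset, the amplitude, the 32 samples per cycle and the function names are all assumptions made for the example. A sine wave and a triangle wave are quantized in turn.

```python
import math

Q = 1.0
OFFSET = 0.5 * Q     # assumed: signal straddles the boundary between two intervals
AMPLITUDE = 0.3 * Q  # assumed: too small to reach a third interval
N = 32               # samples per cycle


def quantize(v):
    return math.floor(v / Q + 0.5)


def sine(x):
    """One cycle of a sine wave for x in [0, 1)."""
    return math.sin(2 * math.pi * x)


def triangle(x):
    """One cycle of a triangle wave for x in [0, 1), same zero crossings."""
    if x < 0.25:
        return 4 * x
    if x < 0.75:
        return 2 - 4 * x
    return 4 * x - 4


def codes_for(shape):
    # Sample half a step away from the zero crossings to avoid ties.
    return [quantize(OFFSET + AMPLITUDE * shape((k + 0.5) / N)) for k in range(N)]


sine_codes = codes_for(sine)
triangle_codes = codes_for(triangle)
```

Both lists come out as sixteen ones followed by sixteen zeros. The shape of the input has been lost completely and only its zero crossings survive, which is why the result has to be treated as distortion rather than noise.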
Fig.5 - With small signals, the quantizing error is a function of the audio waveform and must be classed as a distortion.
But fear not, because that only happens with an ideal quantizer in a textbook, and digital audio cannot and does not use ideal quantizers. Instead all practical audio convertors use a technique called dithering, which has a number of beneficial effects. The first of these is that the digital audio system remains perfectly linear at all signal levels, however small. The second is that a digital audio system is given a noise floor whose level corresponds to the information capacity of the samples.
Fig.6 shows how dither is a noise-like signal that is added to the audio waveform prior to conversion. Purists object that adding noise to a signal cannot be right, until they are reminded that in a decent digital system the noise added is probably below the inherent noise present on the input signal and so makes no difference.
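The linearizing effect can be demonstrated with a steady input that is smaller than half a quantizing interval. The sketch below is one possible illustration, not a description of any particular convertor: it assumes triangular probability dither with a peak value of one quantizing interval, which is a common choice but is not specified in this article, and it uses a seeded software random number generator in place of an analog noise source.

```python
import math
import random


def quantize(v, q=1.0):
    return math.floor(v / q + 0.5)


def tpdf_dither(rng, q=1.0):
    """Triangular-PDF dither, peak +/- one quantizing interval (assumed type)."""
    return q * (rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5))


def average_output(v, trials, rng, dithered):
    """Mean code produced for a steady input v over many conversions."""
    total = 0
    for _ in range(trials):
        d = tpdf_dither(rng) if dithered else 0.0
        total += quantize(v + d)
    return total / trials


rng = random.Random(0)  # fixed seed so the example is repeatable
level = 0.3             # steady input of 0.3 of a quantizing interval

undithered = average_output(level, 100_000, rng, dithered=False)
dithered = average_output(level, 100_000, rng, dithered=True)
```

Without dither the input vanishes and every conversion gives zero. With dither the individual codes are noisy, but their average settles close to 0.3, so information below one quantizing interval is preserved.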
Linearity is paramount in audio. In digital audio in addition to the correct use of dither it is also important that all of the quantizing steps are of exactly the same size. That's a tough specification to meet, because, for example, in a 16 bit system there are 65,536 steps and the precision required to get them to be the same is non-trivial.
Conventional techniques can be used for testing convertors, but the fact that an ADC produces data means that numerical methods can also be used. If a sine wave is input to an ADC, the rate of change of Voltage can be precisely calculated. The higher the rate of change, the less time the signal spends in a particular quantizing interval, and the lower the probability of the code for that interval appearing in the output data.
Fig.6 - Practical audio convertors use dither to linearize the system by making the quantizing error random.
This linearity test relies on a probability distribution being built up (by simply counting) showing the probability of each code appearing at the output. If the probability function corresponds to the theoretical function predicted, the convertor is blameless. If, however, one code occurs more often than it should, that quantizing interval is too big.
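A software model shows how the counting works. The sketch below is a simplified illustration under stated assumptions: a 4 bit convertor is simulated, its sine wave input has a peak of 7.4 quantizing intervals, and one decision threshold is deliberately moved to make the interval for code 2 too big. The function names and all of the numbers are invented for the example.

```python
import math
from bisect import bisect_right
from collections import Counter


def make_convertor(bits, q=1.0):
    """Codes and ideal decision thresholds, halfway between adjacent treads."""
    codes = list(range(-(2 ** (bits - 1)), 2 ** (bits - 1)))
    thresholds = [(c + 0.5) * q for c in codes[:-1]]
    return codes, thresholds


def convert(v, codes, thresholds):
    return codes[bisect_right(thresholds, v)]


def sine_cdf(x, amplitude):
    """Probability that a sine wave of the given peak value is below x."""
    r = max(-1.0, min(1.0, x / amplitude))
    return 0.5 + math.asin(r) / math.pi


def expected_probabilities(codes, ideal_thresholds, amplitude):
    edges = [-math.inf] + ideal_thresholds + [math.inf]
    return {c: sine_cdf(edges[i + 1], amplitude) - sine_cdf(edges[i], amplitude)
            for i, c in enumerate(codes)}


def measured_probabilities(codes, thresholds, amplitude, n=200_000):
    """Build the distribution by simply counting codes over one cycle."""
    counts = Counter()
    for k in range(n):
        v = amplitude * math.sin(2 * math.pi * (k + 0.5) / n)
        counts[convert(v, codes, thresholds)] += 1
    return {c: counts[c] / n for c in codes}


def ratios(measured, expected):
    """Measured over predicted probability; codes never reached are skipped."""
    return {c: measured[c] / expected[c] for c in expected if expected[c] > 0}


AMPLITUDE = 7.4
codes, ideal = make_convertor(bits=4)
faulty = list(ideal)
faulty[codes.index(2)] = 2.8  # threshold above code 2 moved up from 2.5

expected = expected_probabilities(codes, ideal, AMPLITUDE)
good_ratios = ratios(measured_probabilities(codes, ideal, AMPLITUDE), expected)
bad_ratios = ratios(measured_probabilities(codes, faulty, AMPLITUDE), expected)
suspect_code = max(bad_ratios, key=bad_ratios.get)
```

For the ideal convertor every ratio is close to one. For the faulty one, code 2 appears more often than predicted and its neighbor, code 3, less often, which points directly at the oversized quantizing interval.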
The first time I heard about this technique was in a proposal made by Malcolm Hawksford, when we were both young.
Even an ideal convertor can underperform if it does not receive an accurate clock. If a rapidly rising audio signal is being sampled and the clock is too early, the sampled value will be too small. Too late and it will be too large. On the other hand, if the audio waveform is relatively constant, as at the top of a cycle, the timing error doesn't matter.
The error introduced by clock jitter is thus program modulated noise, where the noise level is a function of the slope of the input waveform. To avoid this problem, convertor clocks need to be built to very tight specifications. Heavily damped phase locked loops are needed, especially if the data have arrived on some sort of interface where noise could have de-stabilized the timing without affecting the sample values.
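The dependence on slope can be illustrated with a few lines of arithmetic. The sketch below assumes a full-scale 10 kHz sine wave and a timing error of one nanosecond; both figures and the function name are chosen purely for the example. It compares the sampling error at a zero crossing, where the slope is greatest, with the error at the top of the cycle.

```python
import math

F = 10_000.0  # assumed signal frequency, in Hz
A = 1.0       # assumed peak value (full scale)
DT = 1e-9     # assumed timing error: one nanosecond


def sampling_error(t_ideal, dt):
    """Value actually sampled minus the value at the intended instant."""
    ideal = A * math.sin(2 * math.pi * F * t_ideal)
    actual = A * math.sin(2 * math.pi * F * (t_ideal + dt))
    return actual - ideal


error_early = sampling_error(0.0, -DT)         # rising zero crossing, clock early
error_late = sampling_error(0.0, +DT)          # rising zero crossing, clock late
error_at_peak = sampling_error(1 / (4 * F), DT)  # top of the cycle

# First-order prediction: error = slope x timing error.
predicted = 2 * math.pi * F * A * DT

# The same error expressed in 16 bit quantizing intervals (full scale = 2 * A).
error_in_q16 = error_late / (2 * A / 2 ** 16)
```

At the zero crossing the error follows the slope times the timing error and takes the sign described above, whereas at the top of the cycle it is several orders of magnitude smaller. Since the slope rises with both level and frequency, so does the sensitivity to jitter.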