It should constantly be borne in mind that although digital audio is a form of data, those data represent an audio waveform and there are therefore some constraints on what can and cannot be done to the data without causing audible impairment.
In pre-digital days, the effects of generation loss on audio equipment were severe, and production equipment was always over-specified so that the signal eventually delivered would still be adequate at the end of the production process. Distribution media such as vinyl disks and cassette tapes had an inferior signal-to-noise ratio to that of good production equipment.
Inevitably, when recordings were transferred from a production medium to a distribution medium, there was a loss of quality, but that loss was relatively benign: it showed up primarily as a rise in the level of the noise floor, along with the small increase in distortion inherent in the workings of a vinyl disk.
With the advent of digital audio, the distinction between production and delivery quality remained, not because of generation loss, which in digital recording is absent and in digital processing is slight, but because in all recording one is never certain what the input level is going to be and a greater dynamic range in the recording equipment allows more freedom in gain adjustment without risking clipping or noise.
On the other hand the level on a distribution medium can be chosen in production. In digital audio the best quality will be obtained where the signal just fails to clip. Effectively the production process has mapped the dynamic range of the delivery medium to the best part of the production dynamic range as shown in Fig.1. In addition, the production process might also have compressed the dynamic range. As with the previous technology, there must be a potential quality loss associated with the reduced dynamic range of the distribution medium.
There are, however, a couple of significant differences between traditional audio production and the use of digital equipment today. When copying an audio waveform from a traditional production tape deck to a Compact Cassette, the benign increase in the level of the noise floor is automatic, whereas with digital equipment it is not.
The second difference is that any process that alters the audio samples in any way, however slight, always results in word length extension. This will be true for level changes, however small, the use of equalisers, filters, reverberators, mixers, sampling rate convertors and so on. About the only thing that doesn't change the sample values is a delay used for time alignment.
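As a tiny illustration of word length extension, consider a level change: multiplying a 16-bit sample by a fixed-point gain coefficient produces a product with many more bits. The Q15 gain format and the values below are assumptions chosen for illustration, not taken from any particular piece of equipment:

```python
sample = 0x1234          # a 16-bit sample value
gain = 0x6000            # a gain of 0.75 in Q15 fixed point (assumed format)

product = sample * gain  # the product needs far more than 16 bits
assert product.bit_length() > 16   # here, 27 bits

# Storing the result back in 16 bits means the low-order bits of the
# product must somehow be discarded - the word length reduction this
# article goes on to discuss.
```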
Fig.1 - On the left a wide dynamic range system is used for capture allowing headroom for unpredictable level. After capture there is nothing unpredictable and headroom is not needed on the delivery medium at right. It follows that the production process must somewhere reduce the word length of samples.
Whilst word length extension will be accommodated inside the equipment that is manipulating the samples, that extended word length may need to be dealt with if the audio is to be recorded afterwards, because the new word length will have exceeded the word length of the original signal. In the digital domain the level of the noise floor follows from the word length of the samples. The rule of thumb of -6dB per bit is near enough. The noise floor may, of course, be higher in an actual recording, because of noise on the input signal.
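The rule of thumb can be put into one line of code. This is merely the article's approximation expressed as a function; the more precise figure is about -6.02 dB per bit:

```python
def noise_floor_db(bits):
    """Theoretical quantising noise floor relative to full scale,
    using the rule of thumb of roughly -6 dB per bit."""
    return -6.0 * bits

# 24-bit production format vs. 16-bit delivery format:
print(noise_floor_db(24))   # -144.0
print(noise_floor_db(16))   # -96.0
```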
When going from a digital production medium to a distribution medium, or when dealing with word length extension, the noise floor does not automatically increase when a shorter word length is imposed; instead, the process has to be arranged deliberately.
In a high quality audio ADC, dither is applied in order to linearize the conversion, which means that the lowest bits of the samples have a random content that gives the system a noise floor that sounds subjectively better than the distortion that results from an un-dithered convertor. If the delivery format level has been optimized as shown in Fig.1, low order bits must be eliminated to shorten the sample word length.
Simply omitting the unwanted bits, a process called truncation, removes the low order bits with the random content and so has the same effect as using an un-dithered ADC. In fact it could be worse because there would at least be some noise on the input signal to an un-dithered ADC that would have some dithering effect, whereas that is completely absent when simply truncating sample values.
In the early days of digital audio, before the need for dither was fully understood, some large condenser microphones established a reputation for sounding especially good when driving audio ADCs, and it was thought that there must be something magical about their design. There was something, but it wasn't magic: the microphones contained vacuum tube amplifiers and the noise level of the tubes was dithering the convertors very nicely.
Simple truncation causes distortion on low-level signals such as the decay of reflections and reverberation at the end of a performance. The distortion takes place in the digital domain after any anti-aliasing filter and the harmonics alias to become anharmonics, which sound subjectively much less pleasing than true harmonics, not least because they do not occur in nature. If a musical note bends up, an aliased harmonic may reduce in pitch, which sounds most peculiar.
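The signal-locked nature of truncation error can be demonstrated in a few lines. The sample rate, frequency and amplitude below are assumed values, chosen so the test signal repeats exactly every 100 samples:

```python
import numpy as np

# A low-level sine only a few (new, larger) quantising steps in amplitude.
# 48 kHz and 480 Hz are chosen so the signal repeats every 100 samples;
# the small phase offset avoids samples landing exactly on step boundaries.
fs, f = 48_000, 480
n = np.arange(fs)
x = 3.5 * np.sin(2 * np.pi * f * n / fs + 0.1)  # amplitude in units of one step

truncated = np.floor(x)       # truncation: just drop the low-order bits
error = truncated - x

# The error is not random noise: it repeats in lockstep with the signal,
# which is exactly what makes it distortion rather than a noise floor.
assert np.allclose(error[:100], error[100:200], atol=1e-9)
```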
Fig.2 - A simple random number generator has uniform probability within a finite field. Outside the field the probability shown at a) is zero and the function is rectangular. Two such functions combined have a triangular probability function b). Combining an infinite number of functions produces the Gaussian curve shown in c).
The problem can be avoided by introducing a new random component at the bottom of the new dynamic range using digital dither. Digital dither is a numerical version of noise. It follows that we need to know something about noise so that we can model it.
A Galois field produced by, for example, a maximum-length sequence generator is a pseudo-random series of numbers belonging to a finite field. Fig.2a) shows the probability function of the numbers, which is rectangular: the number is in the field or it isn't and those in the field have equal probability. Those outside the field have zero probability. That doesn't look much like the distribution of audio noise, so more has to be done.
Audio noise from something like a microphone is far more complex: it is the sum of countless processes. In the same way that the output of a filter is the convolution of the input waveform with the filter's impulse response, the overall effect of two distributions is obtained by convolving them. Convolution of two functions is performed by reversing one function and sliding it across the other; at each position, the value of the convolution is the area of overlap.
Where the functions are symmetrical, the reversal can be omitted. Fig 2b) shows what happens if two different rectangular probabilities are convolved. The result is a triangular probability function. If we keep on convolving more rectangular probabilities, we end up with the Gaussian distribution of noise shown in Fig.2c).
Theoretically the Gaussian curve is the result of an infinite number of rectangular probabilities. In practice five or six rectangular probabilities give a curve that is remarkably close to Gaussian, but we don't need to go that far.
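The progression of Fig.2 is easy to reproduce numerically. Here the rectangular density is sampled on a grid and repeatedly convolved with itself; the grid size is an arbitrary choice for illustration:

```python
import numpy as np

N = 1001
rect = np.full(N, 1.0 / N)     # rectangular PDF: equal probability, sums to 1

tri = np.convolve(rect, rect)  # two rectangles convolved -> triangular PDF
# the triangle peaks in the middle, as in Fig.2b)
assert tri.argmax() == N - 1

pdf = rect
for _ in range(5):             # six rectangles in total
    pdf = np.convolve(pdf, rect)
# pdf is now visually indistinguishable from the Gaussian bell of Fig.2c)
```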
Fig.3 shows how we apply digital dither to a train of audio samples. A random number generator creates digital noise of the appropriate level that is added to the input samples. The random numbers must be in two's complement format, so that positive and negative values are possible and so that the average value can be zero. After the addition of random numbers, the samples are rounded, not truncated. If the bits to be removed add up to less than half a step, they are simply removed. If they add up to more than half a step, the remaining sample value must have one added.
Fig.3 - In digital dithering, random numbers with appropriate probability are added to the input samples and the sum is then rounded up or down to create an output having a controlled noise floor.
If the random number generator is a simple one with a rectangular probability, the audio is dithered and the distortion is prevented, but the noise level isn't constant: it varies with the audio signal, an effect known as noise modulation. If, however, the random number generator has a triangular probability, the noise modulation is eliminated. Adding more randomness to make the noise more Gaussian doesn't improve things; triangular probability is enough for digital dithering.
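The scheme of Fig.3 with triangular-probability dither can be sketched in a few lines. This is a simplified model, assuming integer input samples; the function name and the use of NumPy's generator are choices made for illustration, not drawn from any standard:

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded so runs are repeatable

def requantize(samples, drop_bits):
    """Shorten the word length by drop_bits using triangular-PDF dither.
    A sketch under simple assumptions, not mastering-grade code."""
    step = 1 << drop_bits
    # Two independent rectangular randoms, each spanning one step,
    # sum to a zero-mean triangular PDF spanning +/- one step.
    shape = np.shape(samples)
    dither = (rng.uniform(-step / 2, step / 2, shape)
              + rng.uniform(-step / 2, step / 2, shape))
    # Add the dither, then round to the nearest new step - never truncate.
    return np.round((samples + dither) / step).astype(np.int64) * step
```

With triangular dither the mean error is zero whatever the input level, which is why the noise modulation disappears: for a constant input a few steps below the new quantising interval, the output still averages to the input value.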
The shortening of word length can also be called re-quantizing. Any time the size of the quantizing step is increased, there is the potential for distortion. If there is a real audio signal of appreciable size, the error due to re-quantizing is more likely to be random and the noise is likely to be masked by the signal. Such an approach is used in audio bit rate reduction. In such systems the signal is always companded to the highest possible level so there is always a signal to mask the effects of re-quantizing.
After all production steps have been completed, there is another possibility when it comes to creating the distribution version. Instead of digital dither, which is a broad band process applying a noise floor that is the same at all frequencies, it is possible instead to use noise shaping in conjunction with word length reduction.
Noise shaping measures the error caused by shortening the word length of an individual sample and feeds it back to be added to subsequent samples in such a sense as to minimise the error. However, the feedback loop contains a filter that causes the noise floor to be low at frequencies where hearing is most sensitive, by pushing the noise to the ends of the audio range. It works a treat and allows around a two-bit improvement in the apparent noise performance. But it can only be used once, at the end of the production process.
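The error-feedback idea can be sketched in its simplest first-order form. This is only a minimal illustration of the principle: it feeds the whole error back one sample later, which merely pushes the noise towards high frequencies, whereas a real mastering-grade shaper uses a psychoacoustically weighted filter and would normally also apply dither inside the loop:

```python
import numpy as np

def requantize_noise_shaped(samples, drop_bits):
    """First-order error feedback: a minimal sketch, not a
    production noise shaper."""
    step = 1 << drop_bits
    out = np.empty(len(samples), dtype=np.int64)
    err = 0.0
    for i, s in enumerate(samples):
        a = s - err                          # correct input by the previous error
        v = int(np.round(a / step)) * step   # requantize to the larger step
        err = v - a                          # error committed on this sample
        out[i] = v
    return out
```

Because each sample's error is cancelled on the next, the accumulated low-frequency error never exceeds half a step however long the signal runs; the noise has been pushed out of the region where the ear is most sensitive.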