Digital Audio: Part 8 - Sampling Rates

The best sampling rate for digital audio is easily established by considering the requirements of the human auditory system (HAS), which is the only meaningful arbiter. Provided that the bandwidth of a digital audio system somewhat exceeds the bandwidth of the HAS, that should be enough.

Human hearing has been studied and measured extensively over the years by audiologists and its characteristics are not in dispute, at least not by qualified audiologists. Repeatable experiments show that the frequency range of human hearing does not exceed 20kHz.

For broadcast purposes the upper frequency could be reduced to 15kHz and most listeners wouldn't notice. The reason is that only the very young can hear 20kHz and the maximum frequency that people can detect falls with age. A sampling rate of 32kHz was adopted for many broadcast purposes.

All of the sampling rates adopted in the defining era of digital audio were based on those figures. A bandwidth of 20kHz requires a sampling rate of a little more than twice that to allow the use of filters having a finite slope. Something in the region of 44-45kHz would do nicely according to sampling theory. In the end it was the need to synchronize the sampling rate with video-based production equipment that led to the adoption of 44.1kHz for the Compact Disc.
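As a rough check of that arithmetic, the sketch below computes the gap between the top of the audio band and the bottom of the first sampling image, which is the space in which the filter slope has to fit. The function name and the 20kHz default are illustrative choices, not taken from any standard.

```python
def guard_band_hz(sampling_rate_hz, audio_bandwidth_hz=20_000):
    """Gap between the top of the baseband and the lower edge of the first image."""
    # The first image is the baseband mirrored about the sampling rate,
    # so its lowest frequency is the sampling rate minus the audio bandwidth.
    lowest_image_hz = sampling_rate_hz - audio_bandwidth_hz
    return lowest_image_hz - audio_bandwidth_hz


for rate_hz in (40_000, 44_100, 48_000):
    print(rate_hz, guard_band_hz(rate_hz))
```

At exactly 40kHz the gap is zero, which would demand a filter of infinite slope. At 44.1kHz the filter has 4.1kHz in which to go from passband to stopband.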

It had long been the practice in professional audio to change the pitch of a recording by altering the speed of a tape recorder, and some digital audio recorders could also be made to play at variable speed. Obviously the sampling rate would change in proportion to the speed. A difficulty existed if the speed was reduced, because the fixed filters in the DAC would not reject the lower sampling sideband if the sampling rate were to be reduced too much.

This led to the adoption of a higher sampling rate for production purposes. Fig.1 shows that by keeping the same filtering but using a slightly higher sampling rate, fixed filters would still work if the speed is reduced. Initially 50.4kHz was suggested, as this had a fractional relationship (8/7) with 44.1kHz as it was thought that conversion between the two rates would be easier. However, once it was found that conversion between arbitrary rates was possible 48kHz was adopted as the standard for production.
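The effect of variable speed on a fixed filter can be sketched numerically. The example below assumes, purely for illustration, a reconstruction filter whose stopband begins at 24.1kHz (the bottom of the first image at 44.1kHz), and asks how far playback can be slowed before the image falls below that frequency. It also checks the ratios between the rates using exact fractions. The constant and function names are hypothetical.

```python
from fractions import Fraction

AUDIO_BANDWIDTH_HZ = 20_000
# Assumed stopband edge of a fixed filter designed around 44.1kHz (illustrative only).
FILTER_STOPBAND_HZ = 44_100 - AUDIO_BANDWIDTH_HZ  # 24,100 Hz


def lowest_image_hz(sampling_rate_hz, speed):
    """Lower edge of the first image when playback runs at `speed` (1.0 = normal)."""
    # Slowing playback scales the sampling rate and the baseband together.
    return speed * (sampling_rate_hz - AUDIO_BANDWIDTH_HZ)


def minimum_speed(sampling_rate_hz):
    """Slowest speed at which the image stays at or above the assumed stopband edge."""
    return Fraction(FILTER_STOPBAND_HZ, sampling_rate_hz - AUDIO_BANDWIDTH_HZ)


print(float(minimum_speed(44_100)))  # 1.0: no room to slow down at all
print(float(minimum_speed(48_000)))  # below 1.0: some speed reduction is tolerated
print(Fraction(50_400, 44_100))      # ratio of the suggested rate to the CD rate
print(Fraction(48_000, 44_100))      # ratio of the adopted rate to the CD rate
```

Under these assumptions 48kHz tolerates a speed reduction of roughly 14 percent before the image reaches the filter, whereas 44.1kHz tolerates none. The fractions show why 50.4kHz looked attractive at first: it is exactly 8/7 of 44.1kHz, whereas 48kHz reduces only to 160/147.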

Subsequently a couple of advances changed the rules. Digital tape died out and digital processors performed pitch changing. Oversampling in DACs was widely adopted. In an oversampling convertor, the filtering is in the interpolator, whose frequency response is proportional to the clock rate. In a pitch changing application, the response of the interpolator would remain proportional to the sampling rate. If all DACs adopted oversampling, 48kHz would not be necessary, but it worked, it was practical and it was retained.

48kHz worked very nicely with 50Hz television standards, offering an integer number of samples per field. With 60Hz systems it was not so easy, because the field rate was only nominally 60Hz. Owing to the peculiarities of NTSC it was in fact 59.94Hz, which is 0.1 percent low. The result is that 60 fields take slightly longer than one second and contain 48,048 samples, not 48,000, and this figure will not divide by 60 without a remainder. Where 48kHz audio is recorded along with 59.94Hz video, it is necessary to change the number of samples per field in a pattern.

Over five fields, the first four contain 801 samples and the last contains 800 samples making 4004 samples per five fields, which corresponds to 48,048 samples per 60 fields.
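The pattern can be reproduced with a few lines of arithmetic. The sketch below is one possible way to generate it, not a description of how any particular recorder does so. It uses 60000/1001 as the exact form of the nominal 59.94Hz field rate, and at each field boundary rounds the running sample total up so that the longer fields come first. The function name is hypothetical.

```python
import math
from fractions import Fraction

NTSC_FIELD_RATE_HZ = Fraction(60_000, 1_001)  # exact form of the nominal 59.94Hz


def samples_per_field(sampling_rate_hz, field_rate_hz, fields):
    """Whole-sample counts per field whose running total tracks the exact ratio."""
    exact_per_field = Fraction(sampling_rate_hz) / Fraction(field_rate_hz)
    counts = []
    emitted = 0
    for field_number in range(1, fields + 1):
        # Total samples that should have been delivered by the end of this field.
        target = math.ceil(exact_per_field * field_number)
        counts.append(target - emitted)
        emitted = target
    return counts


print(samples_per_field(48_000, NTSC_FIELD_RATE_HZ, 5))
print(sum(samples_per_field(48_000, NTSC_FIELD_RATE_HZ, 60)))
print(samples_per_field(48_000, 50, 3))
```

Exact fractions are used rather than floating point so that the five-field sequence repeats indefinitely without drift. With a 50Hz field rate the same function simply returns 960 for every field.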

In the UK the BBC developed a digital audio system called NICAM that was used to distribute sound from their studios to the various FM radio transmitters scattered around the country. The system was introduced in the 1970s and adopted a sampling rate of 32kHz with 14-bit resolution and a very mild form of compression. It was highly successful as it eliminated the quality loss of analog landlines and ensured that the quality reaching all FM transmitters was the same, no matter where they were located. The transmitters, of course, continued operating as analog devices.

Fig.1 In a conventional DAC, shown at a), a fixed low-pass filter removes the sampling images and passes the baseband. At b), if the sampling rate is reduced in order to lower pitch, some of the image spectrum can pass the fixed filter. If the same filters are used with a higher sampling rate, as in c), the image does not pass the filter when the pitch is reduced.

The subsequent launch of the Compact Disc served dramatically to improve the quality of pre-recorded sound for the consumer, but it also served to highlight the combination of technical ignorance and gullibility of the hi-fi journalist. One famous British journalist alleged that a waveform made out of bricks could not possibly sound as good as a traditional analog signal. He said that if CD replaced vinyl disks he would have to go back to listening to FM radio.

This journalist preferred listening to FM radio, where the signal had reached the transmitter via a 32kHz 14-bit companded digital audio system, over a 44.1kHz 16-bit uncompressed Compact Disc. It is hard to give any credibility to remarks of that kind and harder to see how he could have come to such a conclusion. Unless there was something amiss with his hearing, no actual comparison had been made and he was most likely just airing a prejudice.

Subsequently modulation schemes were developed that allowed digital bit streams to be broadcast, and one result was Digital Audio Broadcasting (DAB). Oddly, a sampling rate of 48kHz was chosen, which served no purpose, as there was no requirement for variable speed playback. Instead the excessive sampling rate served to increase the compression factor required and in most implementations DAB delivered significantly worse sound quality than the long-serving FM system. Claims that DAB delivered CD quality had to be withdrawn.

DAB+ was subsequently developed and the sound quality was much better because a more appropriate bit-rate-reduction codec was used. Unfortunately it also made the earlier DAB receivers obsolete.

One of the tenets of hi-fi is that more is always better, and over-specification is the name of the game, where everything has to be gold-plated and snake oil abounds. Unsurprisingly we were told that the sampling rates of 44.1 and 48kHz were inadequate and that higher rates were necessary. We were also told that the difference was audible. Those who told us had no qualifications.

It is difficult to say how many people have their hearing tested in the course of a year, but if an audiologist came across someone who could hear significantly higher frequencies than the rest of mankind we should have learned about it. Yet no such discovery has been made and so there is no psychoacoustic basis for absurd sampling rates. Not only that, but also very few sound sources can produce significant sound energy beyond 20kHz. Escaping steam is a notable exception, but one can only listen to so much of that.

Tragically for the enthusiast, the atmosphere in which we live does not meet the requirements of high-end hi-fi. Frequency dependent absorption of sound means that air does not have a flat frequency response, but instead absorbs sound at a rate that rises roughly as the square of the frequency. It is quite probable that human hearing evolved to work over the most useful frequency range for the distances normally encountered and that sounds above 20kHz were not beneficial. Smaller animals would expect to listen over shorter distances, hence the extended frequency range of dogs, cats and bats.

In listening rooms, air absorption causes the reverberation time to fall dramatically at ultrasonic frequencies as the sound is absorbed instead of bouncing around.

Acousticians know that as frequency rises, all radiating sources become more directional. It's also true of microphones. When making a recording, the narrowing directivity of the source reduces the chances of ultrasonic sound being radiated towards the microphone. Equally the chances of the sound falling within the reduced acceptance angle of the microphone go down. When listening to that recording, the chances that the loudspeaker will be able to radiate in the direction of the listener will fall. The listener will not receive significant reverberant ultrasound because of absorption.

Even if the HAS could hear such frequencies, dramatically raising the sampling rate of the digital part of a sound reproduction system would achieve nothing, because of the characteristics of the sound source, the transducers and the air. The HAS has not evolved to hear such frequencies, because in real life they are not presented to the ear.

The definitive test for the audibility of ultrasound is to make some recordings at an absurd sampling rate and then selectively to pass them through a low pass filter that is designed to be perfectly phase-linear such that its only effect is to reduce the bandwidth to a normal amount. If the filter is switched in and out, no one is able to tell whether it is in circuit or not.
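Whether a listener's score in such a switched-filter test is distinguishable from guessing can be judged with a simple one-sided binomial calculation. The sketch below is a common way of doing that rather than a method prescribed by any standard, and the trial counts are hypothetical numbers chosen for illustration.

```python
from math import comb


def chance_probability(correct, trials):
    """Probability of at least `correct` right answers in `trials` by guessing alone."""
    # Each trial is a fair two-way guess, so every outcome sequence is equally likely.
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


# Hypothetical listener scores out of 16 trials.
for score in (9, 12, 16):
    print(score, chance_probability(score, 16))
```

A score only slightly above half is entirely consistent with guessing. Only when the probability falls below a threshold agreed in advance, commonly 0.05, is there any evidence that the filter was heard.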

Nevertheless over-specified sampling rates continue in use because enthusiasm is all that is necessary to choose them. Enthusiasm does not require any technical knowledge, or any ability to conduct a statistically significant experiment.

The most fascinating aspect of hi-fi has nothing whatsoever to do with electronics or acoustics. Instead it is instructive to understand the psychology of why people entertain beliefs that have no basis in fact and even cling to them more strongly when they are refuted. There seems to be a closed circle where journalists write anything they think that their readers will believe, irrespective of what the laws of physics say, and the readers do believe it.

Believers spend vast sums on equipment and accessories whose only benefit is to give them something to fiddle with on a rainy day. They have that right.
