In an increasingly digital world, it may be useful to look at the amount of data needed to represent various media. Note that the amount of data is being considered; the amount of information will always be somewhat less than that.
Arguably the first practical transmission system was semaphore, which required the sender to hold two flags. Each flag could be held in one of eight positions, 45 degrees apart. Two flags with eight positions each give 64 combinations, which would require six bits. However, at long range it was not possible to tell the two flags apart, so many combinations could not be used.
Twenty-six combinations allowed the alphabet to be sent, and there was another code for a space. A shift code and an un-shift code allowed some of the letter codes to be reused for numbers. A total of 29 codes meant that just under 5 bits were sent per symbol.
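The arithmetic above can be checked in a few lines of Python, taking the base-2 logarithm of the number of usable combinations:

```python
import math

# Two flags, each in one of 8 positions: 8 * 8 = 64 combinations.
combinations = 8 * 8
print(math.log2(combinations))   # 6.0 bits per symbol in theory

# In practice only 29 codes were usable (26 letters, space, shift, un-shift).
print(math.log2(29))             # about 4.86, i.e. just under 5 bits per symbol
```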
Semaphore needs no electricity and in suitable weather could work over considerable distances using intermediate stations. Security was not good, as anyone who could see the sender had access to the message.
Telegraphy arrived next. The transmission of on-off or binary symbols by the making and breaking of a circuit was much easier to implement than the linear systems that would be required by speech. The Morse code of the late 1830s was an early telegraphy code and used a kind of pulse width modulation, where a short pulse was known as a dot and a long pulse as a dash.
Despite its age, Morse Code is surprisingly sophisticated because it uses variable-length coding as a form of compression. Morse gathered statistics about the distribution of letters in typical English text and arranged things so that the more common letters, such as e and t, were given short codes and the uncommon letters were given longer codes. Morse Code ran at the approximate equivalent of four bits per character.
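The variable-length idea can be sketched with a toy calculation. The letter frequencies below are rough illustrative figures for English text, not Morse's actual statistics, and only a handful of letters are included:

```python
# Common letters get short codes; rare letters get long ones.
MORSE = {'e': '.', 't': '-', 'a': '.-', 'n': '-.', 'q': '--.-', 'z': '--..'}
FREQ  = {'e': 0.127, 't': 0.091, 'a': 0.082, 'n': 0.067, 'q': 0.001, 'z': 0.001}

# Expected code length in elements (dots and dashes), renormalised
# over just these letters:
total = sum(FREQ.values())
avg = sum(len(MORSE[c]) * FREQ[c] / total for c in MORSE)
print(round(avg, 2))  # about 1.4 elements, far below a fixed-length code
```

A fixed-length code for these six letters would need at least three elements per letter; weighting by frequency is what buys the saving.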
In addition to alphabetic symbols, Morse Code also acquired procedural codes. One of these, three dots, three dashes, three dots, became famous as an emergency code. Sent without the usual gaps between letters, it resembles the Morse representation of the letters SOS, which is how most people think of it.
Morse could also be used with light signaling. Powerful light sources tended not to like being flashed. Instead a mechanical shutter was used so the light source could be on continuously.
Morse Code was only efficient when used with English. Languages having different distributions of letter probability would take longer to send. Worse than that, before computers, decoding Morse Code was really only possible using a human operator. The printing telegraph was intended to allow directly readable text to be generated at the receiver.
In one successful printing telegraph, the rotation of what would today be called a daisy wheel was synchronized between transmitter and receiver. When the appropriate character was in place, a pulse would be sent to print it. Although it worked, and was used, for example in ticker tape machines, the system was slow because the daisy wheel had to turn through all possible positions in order to print one character.
The Baudot system used pure binary codes in which the two states of a bit could be sent electrically or recorded by the presence or absence of a hole in a paper tape. The 32 combinations of 5 bits were enough for the 26 alphabetic characters with a few left over. The rate at which these symbols were sent came to be called the Baud Rate. Baud Rate and bit rate are only the same if the system uses binary. Where symbols can have more than two states, as in QAM or VSB, for example, the symbol rate and the bit rate are different.
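The relationship between the two rates is simply the symbol rate multiplied by the number of bits each symbol can carry:

```python
import math

def bit_rate(symbol_rate_baud: float, states_per_symbol: int) -> float:
    """Bit rate = symbol rate x bits per symbol (log2 of the state count)."""
    return symbol_rate_baud * math.log2(states_per_symbol)

print(bit_rate(1000, 2))    # binary: 1000 baud carries 1000 bit/s
print(bit_rate(1000, 16))   # 16-QAM: the same 1000 baud carries 4000 bit/s
```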
Five-bit alphabetic coding is enough for messages where upper case only is acceptable, but to account for the use of lower case and to accommodate numbers and characters with accents, more bits would be needed. This led to the development of the ASCII code, which used seven bits per symbol, normally stored as one byte.
ASCII survived for a surprisingly long time but was eventually supplanted by UTF-8, which is backward compatible with ASCII and uses a variable number of bytes depending on how obscure the symbol is. As obscure symbols do not occur very often, a good rule of thumb is one byte per alphanumeric character.
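The variable byte count is easy to see in Python, where `str.encode` returns the raw UTF-8 bytes:

```python
# UTF-8 spends more bytes on rarer symbols; ASCII characters stay one byte.
for ch in ['A', 'é', '€', '𝄞']:
    print(ch, len(ch.encode('utf-8')))
# 'A' -> 1 byte, 'é' -> 2, '€' -> 3, the musical clef -> 4
```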
The average length of an English word is thought to be 5 letters. A typical novel might contain 100,000 words, so a very quick calculation reveals that the text alone requires about 500 kilobytes. By modern standards that's not a lot of data, and it explains why one of the first IT products was the word processor, especially as text didn't have to work in real time.
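The quick calculation in the text is just a multiplication, assuming the rule of thumb of one byte per character:

```python
words = 100_000          # a typical novel
letters_per_word = 5     # commonly quoted English average
bytes_per_char = 1       # plain English text in ASCII or UTF-8

text_bytes = words * letters_per_word * bytes_per_char
print(text_bytes)        # 500_000 bytes, about 500 kilobytes (before spaces)
```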
The word processor allowed text to be edited on a screen and only committed to paper when finished. It killed the typewriter.
The next medium to be digitized was audio. For most consumer purposes a sampling rate of 44.1 kHz gave adequate bandwidth, and a word length of 16 bits produced adequate dynamic range for post-produced material. That meant a stereo digital audio recording needed about 1.4 megabits per second. Unsurprisingly, video recorders were adapted for audio purposes as they offered the necessary bandwidth. The Compact Disc of 1982 was a medium that was essentially as good as the consumer's hearing and rather better than most loudspeakers. It killed the vinyl disk.
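The 1.4 megabit figure falls straight out of the CD parameters:

```python
sample_rate = 44_100     # samples per second per channel (CD standard)
bits_per_sample = 16
channels = 2             # stereo

bit_rate = sample_rate * bits_per_sample * channels
print(bit_rate)          # 1_411_200 bit/s, roughly 1.4 megabits per second
```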
The bit rate of digital audio was too high for many purposes, and the world and his dog set about developing compression codecs, many of which sounded as if the dog might have done a better job.
Analog sensors based on CCD technology were developed to replace tubes in TV cameras, and it proved possible to digitize the output of the CCD over one frame period to produce a still picture. One of the first digital still cameras was offered by Sony in about 1997 and digitized the NTSC line structure into 640 x 480 pixels. The luma signal alone required about a third of a megabyte and the color information needed about the same.
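The third-of-a-megabyte figure follows from the pixel grid, assuming one byte (8 bits) per luma sample:

```python
width, height = 640, 480       # NTSC-derived pixel grid of the early camera
luma_bytes = width * height    # one byte per luma sample assumed
print(luma_bytes)              # 307_200 bytes, about a third of a megabyte
```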
The Mavica of 1997 provided television resolution, which was grossly inferior to the performance of film of the day, but the image was instantly available. The development process of film had been bypassed and the photographer knew directly if a viable image had been obtained. The writing was on the wall and it was only a matter of time before the performance of digital still cameras would leave film behind, not just in resolution, but in dynamic range and the size of the color space.
The amount of data resulting from a still frame goes as the square of the linear resolution, so huge pixel counts were an inevitable consequence. Fortunately, digital storage media such as flash memory increased in performance to meet the demand. The technology killed the film-based still camera first, and ultimately movie cameras became electronic too. Digital movies came to be distributed to hard drives in cinemas using encrypted network transmission, which brought film piracy under control.
Ultimately the information bottleneck in movies became the use of projection. In due course projection will be replaced by large light emitting screens.
Standard definition television was the next medium to be digitized. A sampling rate of 13.5 MHz was adequate for luma, and in the 4:2:2 format each of the two color difference signals got by with half that rate, making the total sample rate 27 MHz. With a word length of 10 bits, the serial digital interface ran at 270 megabits per second. That was the bit rate on the wire, but the effective rate was much less because the vertical and horizontal blanking periods of the analog signal were still there, times when there was no data. Component SD video needed just over 200 megabits per second of actual data.
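Both figures can be reproduced from the sampling structure. The active-picture calculation below uses the 625-line system (720 x 576 at 25 frames per second) as an example; the 525-line system comes out almost the same:

```python
luma_rate = 13_500_000         # luma samples per second
chroma_rate = 2 * 6_750_000    # two color difference signals at half rate
word_bits = 10

sdi_bit_rate = (luma_rate + chroma_rate) * word_bits
print(sdi_bit_rate)            # 270_000_000 bit/s on the wire

# Active picture only, 4:2:2: each pixel carries one luma sample plus an
# alternating Cb or Cr sample, i.e. two 10-bit words per pixel.
active = 720 * 576 * 25 * 2 * word_bits
print(active)                  # 207_360_000 bit/s, just over 200 Mbit/s of data
```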
Preparing For Compression
That data rate became relatively straightforward to achieve in rotary-head tape recorders. They missed out the blanking on the tape and reinstated it on playback. The bit rate was far too great for broadcast purposes and digital video would have to wait for effective compression algorithms before it could escape from the production environment.
MPEG-2 was the compression algorithm that made digital television broadcasting and the digital video disk possible. The definition, or strictly speaking the pixel count of television then began to increase. In each case the resulting higher raw bit rate was handled by the introduction of an even more sophisticated compression algorithm to keep the final coded bit rate under control.
The codec designers were aided by the fundamentals of digital moving pictures. Doubling the resolution may quadruple the pixel count, but in real pictures it doesn't quadruple the amount of information. Very high pixel counts are essentially a form of oversampling where something else limits the resolution actually obtainable. With legacy frame rates such as 50 and 60 Hz the amount of information in the frame is limited by motion smear.
Further improvements in television and movie picture quality will require increased frame rates. Again, the information rate is not proportional to the frame rate as there is more redundancy between closer-spaced frames.