Information: Part 2 - Gaussian Distribution

Information can never be separated from its nemesis, which is uncertainty. The former is only possible by limiting the latter.

Life is filled with uncertainty and language is filled with words that illustrate the problem. Chances and risks are taken and we are given odds and probabilities to see what may be our lot.

Despite the randomness, the business of chance also obeys some rules and the science of statistics sets out what they are. Most people will always be unscientific, yet they get along fairly well by making decisions based on intuition or by following guidelines that have worked for others.

Unfortunately, much of what statistics can teach is counterintuitive, and the result is that the unscientific can get things spectacularly wrong remarkably often.

Stephen Jay Gould understated the problem when he wrote:

"Misunderstanding of probabilities may be the greatest of all impediments to scientific literacy".

In fact the misunderstanding of statistics is not just a problem for science; it is a problem at all levels of endeavor. It leads to exploitation, to miscarriages of justice and to squandering of resources against perceived threats whilst actual threats are neglected.

Whatever the words used, it all comes down to probability, which goes from zero to one. Zero probability means something never, ever happens. Unity probability means it happens every time, without fail.

The certainty that follows from being at the ends of the scale is a cause for concern. Seldom is there enough evidence for such certainty. Bertrand Russell, the philosopher, said that people who were certain were certainly wrong. There is a parallel with Gell-Mann's concept of complexity, where the ends of his scale are not complex. This is consistent with the approach of science that what is presently known might have to be adapted in the light of new findings.

Fig. 1 - Probability and odds are just different ways of expressing the same thing.

Although they come down to the same thing, bookmakers prefer odds. For example, a probability of 0.25 corresponds to odds of three to one against: three chances of losing for every one of winning. Fig.1 shows a conversion table between probability and odds.
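
The conversion can be sketched in a few lines of Python. This is only an illustration: the function names are my own, and the odds are expressed as "against" in lowest terms, which is an assumption about how a bookmaker would quote them.

```python
from fractions import Fraction

def odds_against(probability):
    """Return odds against as (against, in_favor) in lowest terms."""
    # limit_denominator tidies up floats such as 0.2 that are not exact in binary
    p = Fraction(probability).limit_denominator(1000)
    if not 0 < p < 1:
        raise ValueError("odds are only defined for probabilities between 0 and 1")
    ratio = (1 - p) / p
    return ratio.numerator, ratio.denominator

def probability_from_odds(against, in_favor):
    """Convert odds of 'against to in_favor' back to a probability."""
    return in_favor / (against + in_favor)

print(odds_against(0.25))           # (3, 1), i.e. three to one against
print(probability_from_odds(3, 1))  # 0.25
```

Probabilities of exactly zero or one are rejected because the corresponding odds would involve a division by zero.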

No discussion can be complete without introducing coin tossing. A fair coin lands heads up and tails up equally often in the long term. To describe the state in which it lands requires just one bit.

It is thought that the die (plural dice) derived from a bone in a sheep's leg that was naturally shaped such that it had four stable states and could be thrown as part of a game. It therefore stopped gamboling and started gambling.

The four states of the sheep bone could be described by two bits as opposed to the one bit for a fair coin. But what are these bits? Are they information? I would say they are not, because they do not reduce any uncertainty and they are not surprising. If a process is known to be random and to have equal probability of reaching any of its possible states, the fact that it arrives at one of those states is not surprising. The relevant bits are data, not information.
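
The number of bits needed to describe one of several equally likely states is the base-2 logarithm of the number of states. A minimal sketch follows; the function name is hypothetical and the six-sided die is added only for comparison.

```python
import math

def bits_for_states(n_states):
    """Bits needed to label one of n_states equally likely outcomes."""
    return math.log2(n_states)

print(bits_for_states(2))  # fair coin: 1.0
print(bits_for_states(4))  # sheep bone: 2.0
print(bits_for_states(6))  # six-sided die: a little over 2.5
```

A whole number of bits is only obtained when the number of states is a power of two, so a real encoder for a six-sided die would have to round up to three bits per throw or pack several throws together.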

The distinction is important, because real sources of information are always constrained by having a random component, without which their information capacity would be infinite. In the case of analog electrical signals, such as from a microphone, the random component manifests itself as a noise floor. 

Fig.2 - At a) a rectangular probability function is shown. Two such functions convolve to form a triangular function b). As more functions are added, the familiar Gaussian curve c) emerges.

Tossing coins and throwing dice are processes where there is only the random component and no information. The real world is more complicated than that. What we see up here in macroville is typically the sum total of many microscopic mechanisms. Fig.2a) shows a random process that is limited in size, but which has uniform probability. The function is rectangular. If there are two such processes that are independent but whose results add, we have to use convolution to see what happens.

The two probability functions are slid across one another and the area of overlap is calculated for each position. Fig.2b) shows that the result is a triangular function. If we carry on adding rectangular functions, the result approaches a familiar shape known as a Gaussian curve, which is the limit reached as the number of functions becomes very large.
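
The sliding and overlapping can be tried numerically. The sketch below is not taken from Fig.2; it assumes a discrete rectangular function with five equally likely levels and twelve added processes, both arbitrary choices, and compares the outcome with a Gaussian curve having the same mean and variance.

```python
import math

def convolve(a, b):
    """Discrete convolution: slide one function across the other and sum the products."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def uniform(n_levels):
    """Rectangular probability function over n_levels equally likely values."""
    return [1.0 / n_levels] * n_levels

def sum_of_processes(n_levels, n_processes):
    """Probability function of the sum of independent uniform processes."""
    result = uniform(n_levels)
    for _ in range(n_processes - 1):
        result = convolve(result, uniform(n_levels))
    return result

def gaussian(x, mean, variance):
    return math.exp(-(x - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)

two = sum_of_processes(5, 2)    # rises in equal steps to a peak, then falls: triangular
many = sum_of_processes(5, 12)  # already close to the bell shape

# Mean and variance add for independent processes.
mean = 12 * 2.0                  # each process takes values 0..4, mean 2
variance = 12 * (5 ** 2 - 1) / 12
worst = max(abs(p - gaussian(k, mean, variance)) for k, p in enumerate(many))

print([round(p, 2) for p in two])
print(f"largest departure from a Gaussian after 12 processes: {worst:.4f}")
```

Even a dozen rectangular functions give a curve that is hard to tell from the Gaussian by eye, which helps explain why the shape turns up so often.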

The Gaussian curve shows up in a surprising number of places and it is important to know something about it.

The Gaussian curve has no ends. The parameter it describes can take on large values with very low probability, when most of the individual processes just happen to have the same size and polarity. It is not possible to say when this will happen, only how often.

The noise floor of a microphone is a waveform that cannot be predicted and which has a Gaussian probability of voltage. Although the voltage varies according to the distribution, the average power is steady for a given combination of impedance and temperature. The power of the signal can be compared with the noise power to obtain a signal-to-noise ratio. This was first understood by Johnson and Nyquist at Bell Labs in the 1920s.
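
The result of Johnson and Nyquist is usually written as an RMS noise voltage equal to the square root of 4kTRB, where k is Boltzmann's constant, T the absolute temperature, R the resistance and B the bandwidth. The sketch below applies it; the 200 ohm source, 300 K temperature, 20 kHz bandwidth and 10 mV signal are assumed values chosen for illustration, not figures from any particular microphone.

```python
import math

BOLTZMANN = 1.380649e-23  # joules per kelvin

def thermal_noise_voltage(resistance_ohms, temperature_k, bandwidth_hz):
    """RMS thermal (Johnson) noise voltage across a resistance."""
    return math.sqrt(4 * BOLTZMANN * temperature_k * resistance_ohms * bandwidth_hz)

def snr_db(signal_rms_volts, noise_rms_volts):
    """Signal-to-noise ratio in decibels from two RMS voltages."""
    return 20 * math.log10(signal_rms_volts / noise_rms_volts)

# Assumed example values
noise = thermal_noise_voltage(resistance_ohms=200, temperature_k=300, bandwidth_hz=20_000)
print(f"noise floor: {noise * 1e6:.2f} microvolts RMS")
print(f"SNR for a 10 mV signal: {snr_db(10e-3, noise):.1f} dB")
```

Note that bandwidth appears in the formula as well as impedance and temperature: the wider the band that is observed, the more noise power is let in.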

Fig.3 At a) a noisy signal has a voltage distribution shown here. The proportions of signal and noise at any instant are unknown, so the precise value of the original signal is unknown. At b) a binary signal can suffer bit errors when noise causes the voltage to cross the threshold.

The noise waveform is the sum of countless contributions that are independent and can therefore add up or cancel. The same phenomenon is seen in large bodies of water such as the Pacific. At a given place, the water level is the sum of an unimaginable number of waves arriving from various directions in various phases. Most of the time they average out.

Once in a while these waves just happen to be coherent and the result is called a sneaker wave. The combination of low frequency and unpredictability means they are often fatal for people caught on a beach. Those signs along the California and Oregon coasts warning not to turn one's back on the ocean are not kidding.

All practical electronic equipment has finite temperature and finite impedance, from which it follows that all signals must have a noise floor. What this means is that all such signals contain a degree of uncertainty, because the ideal, or infinitely accurate, signal finds itself added to an unpredictable signal and we can only ever see the sum of the two.

Fig.3a) shows the problem, which is that the signal voltage has a probability distribution. We don't know the magnitude or the polarity of the noise, so the ideal signal could have had a value anywhere within the distribution but we can't say where. It follows that a very large signal is only affected slightly by the noise and can still carry information, whereas a very small signal is essentially randomized by the noise so any information it may have carried is impaired.

In other words, the information capacity of a signal is a function of its signal-to-noise ratio. The relationship was first set out by Claude Shannon. Shannon's work is vitally important, because the information capacity of an analog signal, which is set by its signal-to-noise ratio, determines how many bits are needed to quantize it without loss of information.
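
Shannon's relationship is commonly stated as C = B log2(1 + S/N), where B is the bandwidth and S/N the ratio of signal power to noise power. The sketch below evaluates it and also estimates a word length using the familiar rule of thumb that an ideal uniform quantizer driven by a full-scale sine wave has a signal-to-noise ratio of about 6.02n + 1.76 dB for n bits. That rule is an approximation I have brought in for illustration, and the 20 kHz and 96 dB figures are assumed.

```python
import math

def shannon_capacity(bandwidth_hz, snr_db):
    """Channel capacity in bits per second for a given bandwidth and SNR."""
    snr_power_ratio = 10 ** (snr_db / 10)
    return bandwidth_hz * math.log2(1 + snr_power_ratio)

def bits_to_match_snr(snr_db):
    """Smallest word length whose ideal quantizing noise is below the analog noise.

    Uses the rule of thumb SNR = 6.02 n + 1.76 dB (full-scale sine, uniform quantizer).
    """
    return math.ceil((snr_db - 1.76) / 6.02)

# Assumed example: a 20 kHz channel with a 96 dB signal-to-noise ratio
print(f"capacity: {shannon_capacity(20_000, 96) / 1000:.0f} kbit/s")
print(f"word length: {bits_to_match_snr(96)} bits")
```

Roughly speaking, each additional bit buys about 6 dB, so a quieter analog signal needs a longer word if the quantizer is not to become the dominant source of noise.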

Fig.4 At a) the distinct distributions due to the two conditions suggest a strong link between them and the outcome. At b) the broad distributions show that something else is affecting the outcome and our assumption about the effect of the applied conditions may not be correct.

Fig.3b) shows a binary signal that transmits ones and zeros. Ideally, the probability function of the signal voltage would show two narrow peaks, one for 0 and one for 1. In practice, that doesn't happen because of noise. In real systems, the voltages are subject to distributions that are unbounded. This means that noise can have four outcomes. It can drive a 0 or a 1 further from the threshold, so the binary outcome is unchanged, or it can drive a 0 or a 1 across the threshold so that a bit error is created.

It follows from the Gaussian nature of noise that bit errors can never be prevented; they can only be made less frequent by improving the signal-to-noise ratio. It may be less costly and more effective to use error correction.
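
The point can be illustrated by working out the area of the Gaussian tail that lies on the wrong side of the threshold. The sketch assumes two voltage levels with the threshold placed midway between them and the same noise on both, which is a simplification, and the function name is my own.

```python
import math

def bit_error_probability(level_spacing_volts, noise_rms_volts):
    """Probability that Gaussian noise carries a bit across a midway threshold."""
    distance_to_threshold = level_spacing_volts / 2
    # Area of the Gaussian tail beyond the threshold
    return 0.5 * math.erfc(distance_to_threshold / (noise_rms_volts * math.sqrt(2)))

# A 1 V level spacing with progressively less noise
for noise in (0.5, 0.25, 0.1, 0.05):
    print(f"noise {noise:4.2f} V RMS -> error probability {bit_error_probability(1.0, noise):.3e}")
```

The error probability falls very steeply as the noise is reduced, but because the curve has no ends it never reaches zero.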

One of the features of statistical behavior is that completely independent processes can occasionally produce the same result. This is known as coincidence. When I was at school there was another boy in my class who had the same birthday. I thought it was remarkable then. Now that I understand some statistics, I know that it's not remarkable at all.
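
The birthday example is easy to check numerically. The sketch assumes a class of 30, a 365-day year and birthdays spread evenly through it, none of which is stated above, and it separates two questions that are often confused: whether someone shares my birthday, and whether any two people in the room share one.

```python
def prob_someone_shares_mine(class_size, days=365):
    """Chance that at least one classmate has the same birthday as a given pupil."""
    return 1 - ((days - 1) / days) ** (class_size - 1)

def prob_any_shared(class_size, days=365):
    """Chance that at least two people in the class share a birthday."""
    p_all_different = 1.0
    for k in range(class_size):
        p_all_different *= (days - k) / days
    return 1 - p_all_different

# Assumed class size of 30
print(f"someone shares my birthday: {prob_someone_shares_mine(30):.3f}")
print(f"any two share a birthday:   {prob_any_shared(30):.3f}")
```

The second figure is much larger than the first because every possible pair in the room is a chance for a match, and the number of pairs grows roughly as the square of the class size.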

One of the great paradoxes in life is that people who are intellectually lazy seem to save their energy for jumping to conclusions. They see two things happening together and immediately believe that one causes the other. There are two things to know here. Firstly, correlation may be coincidental. Secondly, correlation does not prove causation. The majority of people involved in road accidents are wearing underwear. It is most unlikely that it contributed to the accidents.

When experiments are carried out to see if one thing causes another, we want to prevent unknown effects from biasing the results. Information theory helps here: because randomizing limits information content, it can also average out bias.

If our experiment presents two conditions and we see two plainly different results as shown by the distributions in Fig.4a), then our assumption that the condition causes the result is strengthened. But if we obtain the result of Fig.4b), we have to say the connection is weak. Something else is affecting the result and our assumption is not the whole story.
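
One simple way to put a number on how distinct two distributions are is to express the gap between their means in units of their spread. The sketch below does that for synthetic results generated from a seeded random number generator; the means, spreads and sample sizes are invented purely for illustration and are not read from Fig.4.

```python
import random
import statistics

def separation(sample_a, sample_b):
    """Gap between the two means, in units of the pooled standard deviation."""
    mean_gap = abs(statistics.mean(sample_a) - statistics.mean(sample_b))
    pooled_sd = ((statistics.variance(sample_a) + statistics.variance(sample_b)) / 2) ** 0.5
    return mean_gap / pooled_sd

rng = random.Random(0)  # fixed seed so the synthetic data is repeatable

def synthetic_results(mean, spread, count=200):
    """Made-up Gaussian results standing in for one experimental condition."""
    return [rng.gauss(mean, spread) for _ in range(count)]

# Same gap between the conditions, but very different spreads
narrow = separation(synthetic_results(10, 0.3), synthetic_results(12, 0.3))
broad = separation(synthetic_results(10, 3.0), synthetic_results(12, 3.0))

print(f"distinct distributions: separation {narrow:.1f}")
print(f"broad distributions:    separation {broad:.1f}")
```

A large separation corresponds to the distinct peaks of the first case, whereas a value below about one means the distributions overlap heavily and other influences are swamping the effect of the applied condition.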
