Machine Learning (ML) For Broadcasters: Part 3 - Neural Networks And The Human Brain

Machine learning is often compared to the human brain. But what do they really have in common?


Some of the terminology we use supports the notion of ML being like the human brain. For example, the brain, at a very simple level, consists of billions of interconnected neurons. Also, humans learn a large part of their behavior through supervised learning or learning with some sort of feedback – parents call it discipline.

As an example, we do not teach a child how to cross every road in the world as this would simply be impossible. Instead, we provide them with a strategy to cross a generic road, and with the appropriate training and expansion in learning, this knowledge can be used to cross all roads. One of the reasons this strategy is successful is that roads all follow a similar pattern, that is, vehicles approach from the left and right, or we have specific pedestrian crossings. We don’t expect a bus to drop from the sky, or a helicopter to land on the freeway.

In other words, a human only needs to learn a subset of all possible outcomes to achieve a function, as the natural world behaves in a predictable fashion. Well, at least usually. Admittedly, this is a very broad statement that has more holes in it than the proverbial sieve, especially if we take it to the extreme. However, we can say that as most people cross the road successfully, the training we provide for children on how to cross the road is largely, but not 100%, successful. And this is where some of the confusion about what ML and NNs can and cannot achieve lies.

Figure 1 - a simple single neuron in a neural network takes inputs (x1 … x3) and modifies them with the weights (w1 … w3), which perform a multiplication function, then adds the bias, which performs an addition or subtraction, to provide an output. The sigmoid function applies a non-linearity to the neuron to encourage the output to tend towards a larger or smaller value; this is similar to how a neuron in the human brain “fires” a signal.

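To make this concrete, here is a minimal Python sketch of the single neuron described in Figure 1. The input, weight, and bias values are purely illustrative, not taken from any trained model.

```python
import math

def sigmoid(z):
    # Squash any real number into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    # Weighted sum of the inputs (the multiplication function),
    # plus the bias (the addition/subtraction), then the non-linearity
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

# Hypothetical values for the three inputs and weights from Figure 1
output = neuron([0.5, -1.2, 3.0], [0.8, 0.1, -0.4], bias=0.2)
```

Because of the sigmoid, the output always lies between 0 and 1, however large the weighted sum becomes; a strongly positive sum pushes the neuron towards 1 ("firing"), a strongly negative sum towards 0.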

Machine learning is analogous to human learning in that we don’t teach a machine how to deal with every possible data point that can be applied to it, but instead use a subset of data to teach it a generic understanding of the relationships between the input data and required output. This is possibly the most important aspect of machine learning, especially when applied to neural networks (NNs).

An example of this is in object recognition. Convolutional Neural Networks (CNNs) can be used to detect objects, such as a bus. When training the CNN model, we do not train it to detect every possible bus, with every possible color, at every possible aspect, in every possible lighting condition. Instead, we train the model to detect a generic set of buses as they share similar attributes, that is, they are box-shaped, big, have wheels, etc. A baby will not be able to detect a bus, but a child going to school for the first time will be able to (assuming the parent has provided the appropriate training).
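The pattern matching at the heart of a CNN is the convolution operation: a small kernel is slid across the image, and each output value measures how strongly the local patch matches the kernel's pattern. The sketch below is a simplified, pure-Python illustration; real CNNs learn their kernels during training and stack many such layers. The tiny image and the vertical-edge kernel here are made-up values for demonstration.

```python
def convolve2d(image, kernel):
    # Slide the kernel over the image; each output value measures how
    # strongly the local patch matches the kernel's pattern
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(s)
        out.append(row)
    return out

# A tiny image containing a vertical edge (dark on the left, bright on
# the right), and a kernel that responds to exactly that pattern
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
kernel = [[-1, 1],
          [-1, 1]]
result = convolve2d(image, kernel)
```

The output peaks at the column where the edge sits, which is how early CNN layers detect simple features such as edges; deeper layers combine these responses into higher-level shapes like "box-shaped with wheels".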

It's important to note that the comparison of the human brain with an ML NN is a tentative one and only really meant for illustrative purposes. Nobody (at least those in the know) would ever try to compare their NN model to a human brain, but there are some interesting similarities. Which isn’t surprising, as the early ML NN pioneers were human.

NN models consist of weights, biases, and non-linear functions such as the sigmoid or tanh function. When we speak of training an NN, what we are really doing is changing the weights and biases of the interconnections within the model to detect patterns in the input data. When this has been sufficiently achieved, the model is said to be trained. We can then use the trained model with input data it hasn’t seen before to provide an output. For example, if we train a CNN model for object recognition with a dataset of 100,000 images of different buses, then we should be able to apply any image data to the model and it will be able to detect the buses in the image.

This data-led learning is analogous to the workings of the human brain. In the same way a parent will teach a child through repeated learning, the CNN model is trained by being continuously provided with labeled data. We say to the CNN model “this is an image of a bus”, and after repeated application of the tens of thousands of images showing buses, it will modify its weights and biases so that buses can be detected. Any parent who has sat down with a child and taught them to recognize objects, such as “this is an apple, and this is an orange”, will understand the basis of ML NN learning.

Figure 2 - when the single neurons from Figure 1 are combined, they form a complex network which is capable of learning patterns from a dataset. A typical ML neural network can consist of tens of thousands, and even millions, of neurons.

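Building on the single-neuron idea, the combination of neurons into layers described in Figure 2 can be sketched as follows; again, all weights and biases are illustrative values rather than the result of any training.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    # Each neuron in the layer computes its own weighted sum plus bias,
    # then passes the result through the sigmoid non-linearity
    return [sigmoid(sum(x * w for x, w in zip(inputs, ws)) + b)
            for ws, b in zip(weights, biases)]

# A toy network: 3 inputs feed a hidden layer of 2 neurons,
# whose outputs feed a single output neuron
hidden = layer([0.5, -1.2, 3.0],
               [[0.8, 0.1, -0.4], [-0.3, 0.6, 0.2]],
               [0.2, -0.1])
output = layer(hidden, [[1.5, -2.0]], [0.05])[0]
```

A production network follows exactly this structure, just with far more neurons per layer and far more layers, which is where the tens of thousands to millions of neurons come from.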

Strangely, the child, having recognized the apple for the first time, may not be able to recognize it when it is turned upside down. In effect, they need more data to train their neurons, and this is exactly what happens in machine learning. We must constantly provide more training data as it becomes available. Consequently, the learning of the model is never complete and can never achieve 100% accuracy, but there again, no system can. The Gaussian distribution demonstrates this: however well trained the model, there is always a small tail of cases it will get wrong.

This form of repetitive learning is intrinsic in everything we do. Parents, teachers, and influential people in our lives provide feedback when we’re learning, which in effect updates our neurons. In a similar fashion, ML uses a method called backpropagation to reduce a loss function. The role of the data scientist is to design a model that reduces the difference between the model’s predictions on the training dataset and the actual labels from the training dataset. Backpropagation updates the model’s weights and biases to reduce the loss value, and hence make the model more and more accurate.
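As a highly simplified sketch of this feedback loop, the following trains a single neuron on one labeled example using gradient descent. The input, label, and learning rate are made-up values; real backpropagation applies the same chain-rule update across many layers and many thousands of examples.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# One labeled training example: input x with desired output y
# (e.g. 1.0 meaning "this is a bus")
x, y = 0.7, 1.0
w, b = 0.0, 0.0   # untrained weight and bias
lr = 0.5          # learning rate

for _ in range(200):
    p = sigmoid(w * x + b)    # forward pass: the model's prediction
    loss = (p - y) ** 2       # squared-error loss
    # Backward pass: gradient of the loss with respect to w and b,
    # via the chain rule through the sigmoid
    dz = 2 * (p - y) * p * (1 - p)
    w -= lr * dz * x          # nudge the weight...
    b -= lr * dz              # ...and the bias to reduce the loss

prediction = sigmoid(w * x + b)
```

Each pass around the loop plays the role of one piece of feedback: the prediction starts at 0.5 (the untrained neuron has no opinion) and is pushed step by step towards the label as the loss shrinks.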

The way we train and use ML certainly has some similarities to how the human brain works (at a very basic level), and how it trains. However, just like humans, the training process of the ML model is never complete as there is always something new to learn. Consequently, vendors will be updating their ML engines on a regular basis (or at least should be) as new training data is made available.

Nothing in life is certain, but many events are highly probable. It is these highly probable events that allow both humans and ML NNs to learn, and then form highly accurate predictions based on their prior learning experience. Here we run the risk of disappearing down a philosophical rabbit hole, especially when we consider training bias; just like humans, ML models run a great risk of learning biases from their training data. But what happens with improbable events? These are called anomalies, and just like in life, they’re really difficult to deal with.

In the next article in this series, we will dig deep into the NN training process and understand exactly what is going on at an engineering level.
