Machine Learning (ML) For Broadcasters: Part 3 - Neural Networks And The Human Brain

Machine learning is often compared to the human brain. But what do they really have in common?



Some of the terminology we use supports the notion of ML being like the human brain. For example, the brain, at a very simple level, consists of billions of interconnected neurons. Also, humans learn a large part of their behavior through supervised learning or learning with some sort of feedback – parents call it discipline.

As an example, we do not teach a child how to cross every road in the world as this would simply be impossible. Instead, we provide them with a strategy to cross a generic road, and with the appropriate training and expansion in learning, this knowledge can be used to cross all roads. One of the reasons this strategy is successful is that roads all follow a similar pattern, that is, vehicles approach from the left and right, or we have specific pedestrian crossings. We don’t expect a bus to drop from the sky, or a helicopter to land on the freeway.

In other words, a human only needs to learn a subset of all possible outcomes to achieve a function as the natural world behaves in a predictable fashion. Well, at least usually. Admittedly, this is a very broad statement that has more holes in it than the proverbial sieve, especially if we take it to the extreme. However, we can say that as most people cross the road successfully, then the training we provide for children on how to cross the road is largely successful, but not 100% successful. And this is where some of the confusion of what ML and NNs can achieve, and what they cannot, lies.

Figure 1 - a simple single neuron in a neural network takes inputs (x1 … x3), multiplies each one by its weight (w1 … w3), sums the results, and then adds the bias to provide an output. The sigmoid function applies a non-linearity to the neuron to encourage the output to tend to a larger or smaller value, which is similar to how a neuron in the human brain “fires” a signal.
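The neuron in Figure 1 can be sketched in a few lines of Python. This is only an illustration of the weighted sum, bias, and sigmoid described in the caption; the input, weight, and bias values are made up for the example and do not come from any trained model.

```python
import math


def sigmoid(z):
    """Squash any real number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    """Multiply each input by its weight, sum the results, add the bias,
    then apply the sigmoid non-linearity."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(weighted_sum)


# Hypothetical values standing in for x1..x3, w1..w3 and the bias.
inputs = [0.5, -1.0, 2.0]
weights = [0.4, 0.3, -0.2]
bias = 0.1

output = neuron(inputs, weights, bias)
print(f"neuron output: {output:.3f}")
```

A large positive weighted sum pushes the output towards 1 and a large negative one pushes it towards 0, which is the "firing" behavior the caption refers to.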

Machine learning is analogous to human learning in that we don’t teach a machine how to deal with every possible data point that can be applied to it, but instead use a subset of data to teach it a generic understanding of the relationships between the input data and required output. This is possibly the most important aspect of machine learning, especially when applied to neural networks (NN).

An example of this is in object recognition. Convolutional Neural Networks (CNNs) can be used to detect objects, such as a bus. When training the CNN model, we do not train it to detect every possible bus, with every possible color, at every possible aspect, in every possible lighting condition. Instead, we train the model on how to detect a generic set of buses as they share similar attributes, that is, they are box shaped, big, have wheels, etc. A baby will not be able to detect a bus, but a child going to school for the first time will be (assuming the parent has provided the appropriate training).

It's important to note that the comparison of the human brain with ML NN is a tentative one and only really meant for illustrative purposes. Nobody (at least those in the know) would ever try and compare their NN model to a human brain, but there are some interesting similarities, which isn’t surprising as the early ML NN pioneers are human.

NN models consist of weights, biases, and non-linear functions such as the sigmoid or tanh function. When we speak of training an NN, what we are really doing is changing the weights and biases of the interconnections within the model to detect patterns in the input data. When this has been sufficiently achieved, the model is said to be trained. Then we can use the trained model with input data it hasn’t seen before to provide an output. For example, if we train a CNN model to detect buses using object recognition with a dataset of 100,000 images of different buses, then we should be able to apply new image data to the model and expect it to detect the buses in the image.

This data-led learning is analogous to the workings of the human brain. In the same way that a parent teaches a child through repetition, the CNN model is trained by continually providing it with labeled data. We say to the CNN model “this is an image of a bus”, and after a period of repeated application of the tens of thousands of images showing buses, it will modify its weights and biases so that buses can be detected. Any parent who has sat down with a child and taught them how to recognize objects such as “this is an apple, and this is an orange” will understand the basis of ML NN learning.

Figure 2 - when the single neurons from Figure 1 are combined, they form a complex network which is capable of learning patterns from a dataset. A typical ML neural network can consist of tens of thousands, and even millions, of neurons.
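To illustrate how the single neurons of Figure 1 combine into the network of Figure 2, the sketch below passes an input through two small layers. The layer sizes and all weight and bias values are invented for the example; a real network would be far larger and its values would come from training.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weight_rows, biases):
    """One layer: every neuron sees all the inputs but has its own
    weights (one row per neuron) and its own bias."""
    return [
        sigmoid(sum(x * w for x, w in zip(inputs, row)) + b)
        for row, b in zip(weight_rows, biases)
    ]


def forward(inputs, layers):
    """Feed the outputs of each layer into the next one."""
    activations = inputs
    for weight_rows, biases in layers:
        activations = layer(activations, weight_rows, biases)
    return activations


# Hypothetical network: 3 inputs -> 2 hidden neurons -> 1 output neuron.
network = [
    ([[0.2, -0.5, 0.1], [0.7, 0.3, -0.4]], [0.0, 0.1]),
    ([[1.5, -1.0]], [-0.2]),
]

prediction = forward([1.0, 0.5, -1.0], network)
print(prediction)
```

Scaling this up is a matter of adding more rows (neurons) and more layers, which is why practical implementations use matrix libraries rather than Python lists.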

Strangely, the child having recognized the apple for the first time may not be able to detect it when it is turned upside down. In effect, they need more data to train their neurons, and this is exactly what happens in machine learning. We must constantly provide more training as it becomes available. Consequently, the learning of the model is never complete and can never achieve 100% accuracy, but there again, no system can. Gaussian distribution models demonstrate this.

This form of repetitive learning is intrinsic in everything we do. Parents, teachers, and influential people in our lives provide a method of feedback when we’re learning, which in effect updates our neurons. In a similar fashion, ML uses a method called backpropagation to reduce a loss function. The role of the data scientist is to design a model that reduces the difference between the model’s predictions for the training dataset and the actual labeled data from the training dataset. Backpropagation updates the model’s weights and biases to reduce the loss value, and hence, make the model more and more accurate.
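The feedback loop described above can be sketched with a single neuron. This is a deliberately minimal example, assuming a mean squared error loss and a toy labeled dataset (the logical OR of two inputs); with only one neuron, backpropagation reduces to one application of the chain rule, whereas a full network repeats the same step layer by layer. The learning rate and epoch count are arbitrary choices for the illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, weights, bias):
    return sigmoid(sum(xi * wi for xi, wi in zip(x, weights)) + bias)


def mean_squared_loss(data, weights, bias):
    """Average squared difference between predictions and labels."""
    return sum((predict(x, weights, bias) - y) ** 2 for x, y in data) / len(data)


def train(data, weights, bias, learning_rate=0.5, epochs=5000):
    for _ in range(epochs):
        for x, y in data:
            pred = predict(x, weights, bias)
            # Chain rule: d(loss)/d(weighted sum) for squared error + sigmoid.
            grad = 2.0 * (pred - y) * pred * (1.0 - pred)
            # Nudge each weight and the bias in the direction that lowers the loss.
            weights = [w - learning_rate * grad * xi for w, xi in zip(weights, x)]
            bias = bias - learning_rate * grad
    return weights, bias


# Toy labeled dataset (assumed for illustration): output is 1 if either input is 1.
training_data = [
    ([0.0, 0.0], 0.0),
    ([0.0, 1.0], 1.0),
    ([1.0, 0.0], 1.0),
    ([1.0, 1.0], 1.0),
]

loss_before = mean_squared_loss(training_data, [0.0, 0.0], 0.0)
weights, bias = train(training_data, [0.0, 0.0], 0.0)
loss_after = mean_squared_loss(training_data, weights, bias)
print(f"loss before: {loss_before:.4f}, loss after: {loss_after:.4f}")
```

The loss shrinks as training proceeds but does not reach exactly zero, which mirrors the point that a model's learning is never fully complete.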

The way we train and use ML certainly has some similarities to how the human brain works (at a very basic level), and how it trains. However, just like humans, the training process of the ML model is never complete as there is always something new to learn. Consequently, vendors will be updating their ML engines on a regular basis (or at least should be) as new training data is made available.

Nothing in life is certain, but many events are highly probable. It is these highly probable events that allow both humans and ML NNs to learn, and then form highly accurate predictions based on their prior learning experience. Here we run the risk of disappearing down a philosophical rabbit hole, especially when we consider training bias, as ML, just like humans, is highly susceptible to it. But what happens with improbable events? These are called anomalies, and just like in life, they’re really difficult to deal with.

In the next article in this series, we will dig deep into the NN training process and understand exactly what is going on at an engineering level.
