Machine Learning is generating a great deal of interest in the broadcast industry, and in this short series we cut through the marketing hype and discover what ML is, and what it isn’t, with particular emphasis on neural networks (NNs).
It’s important to remember that ML is a vast field that is still the focus of intensive research. Even while these articles were being written, the LSTM (Long Short-Term Memory) NN used in time-series analysis applications was seeing a lot of competition from Transformers. Consequently, much of the terminology is still open to interpretation and change, especially among the general public.
ML is a generic term that encompasses a whole range of mathematical tools including Naïve Bayes, Random Forest, and K-Means, each providing solutions to different applications. Of particular interest to broadcasters are NNs, because prediction and classification are two of the areas where they excel. Applications of prediction and classification include video compression, QC, metadata tagging during library ingest, and IP network optimization.
Understanding ML requires an appreciation of tools such as regression analysis. Although this may sound like a grandiose title, it really just describes establishing the relationship between a dependent variable, or output (the variable we want to predict or classify), and the independent variables, or input data. A simple example is shown in Figure 1, where we can easily see a delineation between the two classes of data points. In this example, the “plus” datapoints may represent videos that have passed QC and the boxes videos that have failed QC.
Figure 1 – both diagrams represent a simple linear delineation for classification between the two types of datapoints.
The left diagram demonstrates a simple linear relationship of the type y = mx + c. The diagram on the right shows a parabolic relationship of the type y = ax² + bx + c, which is still treated as linear in regression because the equation is linear in its coefficients a, b and c. However, Figure 2 is anything but linear, and the delineation would be incredibly challenging to find as it doesn’t fit a standard linear equation model.
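To make the point about the two relationships above concrete, the following is a minimal sketch, assuming NumPy is available and using made-up, noise-free sample points rather than real QC data. It recovers the coefficients of a straight line and of a parabola with the same least-squares machinery, which is why both count as linear models. All values and variable names are illustrative assumptions.

```python
import numpy as np

# Hypothetical, noise-free sample points chosen for illustration only.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])

# Straight line y = mx + c with m = 2 and c = 1.
y_line = 2.0 * x + 1.0
# Parabola y = ax^2 + bx + c with a = 0.5, b = -1 and c = 3.
y_parabola = 0.5 * x**2 - 1.0 * x + 3.0

# A degree-1 least-squares fit returns the coefficients [m, c].
m, c = np.polyfit(x, y_line, deg=1)

# The parabola is solved by the same linear least-squares method:
# the design matrix has columns x^2, x and 1, and we solve for a, b, c.
design = np.column_stack([x**2, x, np.ones_like(x)])
coeffs, *_ = np.linalg.lstsq(design, y_parabola, rcond=None)
a, b, c_parabola = coeffs

print(f"line:     m={m:.3f}, c={c:.3f}")
print(f"parabola: a={a:.3f}, b={b:.3f}, c={c_parabola:.3f}")
```

The x² term is simply treated as another input column, so no non-linear solver is needed. The relationships in Figure 2 cannot be captured by adding a column like this, which is where NNs come in.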
In a classification example, where we want to find a pass or fail outcome, Figure 1 demonstrates the relationship would be relatively easy to find, although drawing the delineation line isn’t trivial. However, finding the relationships in Figure 2 is almost impossible using linear regression. So, we turn to ML, and specifically NNs to find the non-linear relationships demonstrated in Figure 2.
Figure 2 – two examples of non-linear relationships between the input data and the delineation for the classified output data.
Quite often, linear regression relies on statistical analysis to find the equation we’re looking for to predict the output based on the independent input data. The major difference between statistical analysis and ML is how we use the data. In statistics the data scientist will analyze the data to initially find equations that meet simple linear relationships between the input and the output. As the models become more complex and the patterns harder to find, the data scientist adopts a whole new array of mathematical tools. The point here is that the analysis of the data, and hence the design of the statistical model, relies almost entirely on the skill of the data scientist and their ability to find the relationships between the data and the predictions. In other words, they must be domain experts.
ML differs greatly from statistical analysis as the model “learns” patterns within the data set based on generalized models. These include MLPs (Multi-Layer Perceptrons), RNNs (Recurrent Neural Networks), LSTMs (Long Short-Term Memory networks) and GANs (Generative Adversarial Networks), to name but a few. The basic concept is that we apply “training” data to the models and, through some fairly involved mathematical processes, the parameters within the networks are optimized so that when the input is presented to the network, the output matches the desired prediction. More fundamentally, the data scientist does not need to be a domain expert in the field they are working in. It’s fair to say that we still need a domain expert to classify the data, but this would be somebody such as a QC engineer who can pass or fail the images. However, the data scientist building the complex ML models treats the video and its QC classification as data. They don’t need to understand where the boundary of pass and fail exists, only that it does, and that they can design and train a model to provide the desired classification or prediction.
When we talk about ML, we are covering two distinct processes: training and evaluation. Training is the process of teaching the model to find the patterns within the training dataset that match the desired output. After the model is trained, previously unseen data is presented to it during the evaluation phase, and the model in turn provides an output such as a pass or fail.
Training is based on presenting tens, or even hundreds, of thousands of datapoints so that the parameters within the model can be automatically adjusted. These same parameters are used during evaluation so that the model can now provide a classification prediction on previously unseen data. Later in this series we delve deeper into the fundamentals of training and evaluation to demonstrate how complex this process really is.
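As a rough illustration of parameters being optimized from training data, here is a minimal sketch of a very small MLP trained with gradient descent, assuming NumPy and using the XOR truth table as a stand-in for a non-linear classification problem. The network size, learning rate, step count and all names are arbitrary assumptions for illustration, not a recommended design.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR: four datapoints whose classes cannot be separated by a straight line.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])

# Parameters of a 2-8-1 network (the sizes are an arbitrary choice).
W1 = rng.normal(0.0, 1.0, size=(2, 8))
b1 = np.zeros((1, 8))
W2 = rng.normal(0.0, 1.0, size=(8, 1))
b2 = np.zeros((1, 1))


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(inputs):
    """Return the hidden activations and the output probabilities."""
    hidden = np.tanh(inputs @ W1 + b1)
    prob = sigmoid(hidden @ W2 + b2)
    return hidden, prob


def cross_entropy(prob, target):
    eps = 1e-12  # avoids log(0)
    return float(-np.mean(target * np.log(prob + eps)
                          + (1.0 - target) * np.log(1.0 - prob + eps)))


learning_rate = 0.5
_, prob = forward(X)
initial_loss = cross_entropy(prob, y)

for step in range(5000):
    hidden, prob = forward(X)
    # Backpropagation: gradients of the mean cross-entropy loss.
    d_out = (prob - y) / len(X)
    grad_W2 = hidden.T @ d_out
    grad_b2 = d_out.sum(axis=0, keepdims=True)
    d_hidden = (d_out @ W2.T) * (1.0 - hidden**2)
    grad_W1 = X.T @ d_hidden
    grad_b1 = d_hidden.sum(axis=0, keepdims=True)
    # Gradient descent: nudge every parameter to reduce the loss.
    W1 -= learning_rate * grad_W1
    b1 -= learning_rate * grad_b1
    W2 -= learning_rate * grad_W2
    b2 -= learning_rate * grad_b2

_, prob = forward(X)
final_loss = cross_entropy(prob, y)
print(f"loss before training: {initial_loss:.4f}")
print(f"loss after training:  {final_loss:.4f}")
```

Nothing in the code describes where the class boundary lies; the parameters are adjusted purely from the labelled examples.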
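The split between training and evaluation can be sketched as follows, assuming NumPy and a synthetic two-feature dataset in which label 1 stands for “pass” and 0 for “fail”. A single logistic neuron is used in place of a full NN to keep the example short, and the dataset, cluster positions, learning rate and names are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)


def make_data(n):
    """Synthetic datapoints: label 1 = pass, label 0 = fail (hypothetical)."""
    labels = rng.integers(0, 2, size=n)
    centres = np.where(labels[:, None] == 1, 2.0, -2.0)
    features = centres + rng.normal(0.0, 0.5, size=(n, 2))
    return features, labels.astype(float)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


X_train, y_train = make_data(1000)   # used only to adjust the parameters
X_unseen, y_unseen = make_data(200)  # never shown to the model in training

# Training: the parameters w and b are adjusted automatically.
w = np.zeros(2)
b = 0.0
for _ in range(500):
    error = sigmoid(X_train @ w + b) - y_train
    w -= 0.1 * (X_train.T @ error) / len(X_train)
    b -= 0.1 * error.mean()

# Evaluation: the same parameters, now fixed, classify unseen data.
predicted_pass = sigmoid(X_unseen @ w + b) > 0.5
accuracy = float(np.mean(predicted_pass == (y_unseen == 1.0)))
print(f"accuracy on unseen data: {accuracy:.3f}")
```

Keeping the unseen data out of the training loop is what makes the accuracy figure meaningful: it measures how well the learned parameters generalize, not how well they memorized.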
Although classification is used extensively by ML to tag images and sound, ML also comes into its own when we start considering prediction, specifically in compression and standards rate conversion. If an ML model can predict the next frame of video, or sequence of frames of video, then we have an incredibly powerful tool. We no longer need to be concerned with motion compensation as the model will be able to predict the next pixel values. And a similar argument applies to video and audio compression.
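One way to see why prediction matters for compression is the sketch below, which assumes NumPy and swaps the NN for a much simpler learned linear predictor. A sine wave stands in for a pixel value changing over time, and the idea that only the prediction error would need to be stored or transmitted is our assumption for illustration, not a description of any particular codec.

```python
import numpy as np

# Hypothetical one-dimensional signal standing in for a pixel value over time.
n = np.arange(200)
signal = np.sin(0.1 * n)

# Learn to predict each sample from the two samples that precede it.
history = np.column_stack([signal[1:-1], signal[:-2]])
target = signal[2:]
weights, *_ = np.linalg.lstsq(history, target, rcond=None)

# The residual is what remains after the prediction has been subtracted.
prediction = history @ weights
residual = target - prediction

signal_energy = float(np.sum(target**2))
residual_energy = float(np.sum(residual**2))
print(f"signal energy:   {signal_energy:.6f}")
print(f"residual energy: {residual_energy:.3e}")
```

The better the predictor, the less information is left in the residual. Real video is far less predictable than a sine wave, which is why a more capable model such as an NN is of interest.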
One of the interesting aspects of ML is that the training data implies the solution is based on past experiences. This is similar to the operation of the human brain, and this is one of the reasons pundits draw comparisons to the workings of the mind. ML doesn’t pretend to replace the human brain in any way, but it does replicate the methods with which we all learn. In later parts of this series, we look at the importance of training data and how confirmation bias can affect the quality of the ML solution. ML is all about the training data!