Machine Learning (ML) For Broadcasters: Part 1 - Overview

Machine Learning is generating a great deal of interest in the broadcast industry, and in this short series we cut through the marketing hype and discover what ML is, and what it isn’t, with particular emphasis on neural networks (NNs).

It’s important to remember that ML is a vast field that is still the subject of deep research. Even while writing these articles, the LSTM (Long Short-Term Memory) NN used in time-series analysis applications is seeing a lot of competition from Transformers. Consequently, much of the terminology is still open to interpretation and change, especially among the general population.

ML is a generic term that encompasses a whole range of mathematical tools including Naïve Bayes, Random Forest, and K-Means, each suited to different applications. Of particular interest to broadcasters are NNs, because prediction and classification are two of the areas where NNs excel. Applications of prediction and classification include video compression, QC, metadata tagging for library ingest, and IP network optimization.

Understanding ML requires an appreciation of tools such as regression analysis. Although this may be a grandiose title, it really describes establishing relationships between a dependent variable, or the output (the variable we want to predict or classify), and the independent variables, or input data. A simple example of this is shown in Figure 1, where we can easily see a delineation between the two classes of data points. In this example, the “plus” datapoints may represent videos that have passed QC and the boxes videos that have failed QC.

Figure 1 – both diagrams represent a simple linear delineation for classification between the two types of datapoints.

The left diagram demonstrates a simple linear relationship of the type y = mx + c. The diagram on the right represents a parabolic relationship of the type y = ax² + bx + c, which is still treated as linear in regression terms because the equation is linear in its coefficients a, b, and c. However, Figure 2 is anything but linear, and the delineation would be incredibly challenging to find as it doesn’t fit a standard linear equation model.
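To make the two equation types concrete, here is a minimal sketch of a least-squares fit in plain Python. The data points, the function names `solve` and `fit_polynomial`, and the coefficient values are all invented for illustration and are not taken from the figures. The same routine recovers both the straight line and the parabola, because in both cases the unknown coefficients enter the equation linearly.

```python
def solve(a, b):
    """Solve the small linear system a * x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_polynomial(xs, ys, degree):
    """Least-squares fit; returns coefficients from highest power to constant."""
    powers = list(range(degree, -1, -1))
    # Normal equations (X^T X) w = X^T y, where column p of X holds x**p.
    xtx = [[sum(x ** (p + q) for x in xs) for q in powers] for p in powers]
    xty = [sum((x ** p) * y for x, y in zip(xs, ys)) for p in powers]
    return solve(xtx, xty)


# Invented sample data that follows each equation type exactly.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
line_ys = [2.0 * x + 1.0 for x in xs]                # y = mx + c
parabola_ys = [0.5 * x * x - x + 2.0 for x in xs]    # y = ax^2 + bx + c

line_coeffs = fit_polynomial(xs, line_ys, 1)          # close to [m, c]
parabola_coeffs = fit_polynomial(xs, parabola_ys, 2)  # close to [a, b, c]
print(line_coeffs, parabola_coeffs)
```

With noisy real-world measurements the recovered coefficients would only approximate the underlying relationship, but the method would be unchanged.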

In a classification example, where we want to find a pass or fail outcome, Figure 1 demonstrates that the relationship would be relatively easy to find, although drawing the delineation line isn’t trivial. However, finding the relationships in Figure 2 is almost impossible using linear regression. So, we turn to ML, and specifically NNs, to find the non-linear relationships demonstrated in Figure 2.

Figure 2 – two examples of non-linear relationships between the input data and the delineation for the classified output data.
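A small illustration of why a straight line is not always enough is the classic XOR arrangement of four points, used here as a stand-in for the kind of non-linear data in Figure 2 rather than a reproduction of it. The sketch below, in plain Python, tries a coarse grid of straight-line boundaries and then a tiny network with two hidden units. The network weights are hand-picked for illustration, not trained, and all names are hypothetical.

```python
from itertools import product

# The four XOR points: the label is 1 when exactly one input is 1.
points = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def linear_classifier(x, w1, w2, b):
    """A single straight-line boundary: class 1 on one side, class 0 on the other."""
    return 1 if w1 * x[0] + w2 * x[1] + b > 0 else 0


def relu(v):
    """Rectified linear unit, a common NN non-linearity."""
    return max(0.0, v)


def tiny_network(x):
    """Two hidden ReLU units feeding one output; weights are hand-picked, not trained."""
    h1 = relu(x[0] + x[1])
    h2 = relu(x[0] + x[1] - 1.0)
    return 1 if h1 - 2.0 * h2 > 0.5 else 0


# Try a coarse grid of straight lines and keep the best score out of 4.
grid = [i * 0.5 for i in range(-4, 5)]
best_linear = max(
    sum(linear_classifier(x, w1, w2, b) == label for x, label in points)
    for w1, w2, b in product(grid, repeat=3)
)
network_correct = sum(tiny_network(x) == label for x, label in points)
print(best_linear, network_correct)  # prints "3 4"
```

The grid search is a demonstration rather than a proof, but the result holds in general: no single straight line separates the XOR points, whereas one hidden layer is enough to do so.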

Quite often, linear regression relies on statistical analysis to find the equation we’re looking for to predict the output based on the independent input data. The major difference between statistical analysis and ML is how we use the data. In statistics the data scientist will analyze the data to initially find equations that meet simple linear relationships between the input and the output. As the models become more complex and the patterns harder to find, the data scientist adopts a whole new array of mathematical tools. The point here is that the analysis of the data, and hence the design of the statistical model, relies almost entirely on the skill of the data scientist and their ability to find the relationships between the data and the predictions. In other words, they must be domain experts.

ML differs greatly from statistical analysis as the model “learns” patterns within the data set based on generalized models. These include MLPs (Multi-Layer Perceptrons), RNNs (Recurrent Neural Networks), LSTMs (Long Short-Term Memory networks) and GANs (Generative Adversarial Networks), to name but a few. The basic concept is that we apply “training” data to the models and, through some fairly involved mathematical processes, the parameters within the networks are optimized so that when the input is presented to the network, the output matches the desired prediction. More fundamentally, the data scientist does not need to be a domain expert in the field they are working in. It’s fair to say that we still need a domain expert to classify the data, but this would be somebody such as a QC engineer who can pass or fail the images. However, the data scientist building the complex ML models treats the video and its QC classification as data. They don’t need to understand where the boundary of pass and fail exists, only that it does, and that they design and train a model to provide the desired classification or prediction.

When we talk about ML it covers two distinct processes: training and evaluation. Training is the process of teaching the model to find the patterns within the training dataset that match the desired output. After the model is trained, previously unseen data is presented to the model during the evaluation phase, which in turn provides a pass or fail output.

Training is based on presenting tens or even hundreds of thousands of datapoints to the model so that its parameters can be automatically adjusted. These same parameters are then used, unchanged, during evaluation so that the model can provide a classification prediction on previously unseen data. Later in this series we delve deeper into the fundamentals of training and evaluation to demonstrate how complex this process really is.
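The split between training and evaluation can be sketched with the simplest possible model, a single neuron (logistic regression) trained by gradient descent in plain Python. Everything here is an assumption made for illustration: the two input “scores”, the labelling rule that stands in for a QC engineer’s verdict, the dataset sizes, and the learning rate. A real QC model would be far larger, but the two phases would look the same.

```python
import math
import random


def sigmoid(z):
    """Squash any value into the range 0..1 so it reads as a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def make_dataset(n, rng):
    """Hypothetical QC data: two scores in [0, 1]; label 1 means 'fail'."""
    data = []
    for _ in range(n):
        x1, x2 = rng.random(), rng.random()
        label = 1 if x1 + x2 > 1.0 else 0  # stand-in for the QC engineer's verdict
        data.append(((x1, x2), label))
    return data


def train(data, epochs=2000, lr=1.0):
    """Training: repeatedly nudge the parameters to reduce the prediction error."""
    w1 = w2 = b = 0.0
    n = len(data)
    for _ in range(epochs):
        g1 = g2 = gb = 0.0
        for (x1, x2), y in data:
            err = sigmoid(w1 * x1 + w2 * x2 + b) - y
            g1 += err * x1
            g2 += err * x2
            gb += err
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
        b -= lr * gb / n
    return w1, w2, b


def evaluate(params, data):
    """Evaluation: parameters are frozen; we only count correct predictions."""
    w1, w2, b = params
    correct = sum(
        (sigmoid(w1 * x1 + w2 * x2 + b) > 0.5) == (y == 1)
        for (x1, x2), y in data
    )
    return correct / len(data)


rng = random.Random(0)               # fixed seed so the run is repeatable
train_set = make_dataset(200, rng)   # data the model learns from
unseen_set = make_dataset(100, rng)  # data the model has never seen
params = train(train_set)
accuracy = evaluate(params, unseen_set)
print(f"accuracy on unseen data: {accuracy:.2f}")
```

Note that `evaluate` never modifies the parameters. Only `train` adjusts them, which mirrors the separation between the two phases.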

Although classification is used extensively by ML to tag images and sound, ML also comes into its own when we start considering prediction, specifically in compression and standards rate conversion. If an ML model can predict the next frame of video, or sequence of frames of video, then we have an incredibly powerful tool. We no longer need to be concerned with motion compensation as the model will be able to predict the next pixel values. And a similar argument applies to video and audio compression.
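The link between prediction and compression can be illustrated with a deliberately simple, one-dimensional stand-in for pixel values. In the sketch below a predictor learns, by least squares, to estimate each sample from the two before it; only the prediction error (the residual) would then need to be stored or sent. The sine-wave signals, function names, and parameter values are assumptions chosen for illustration, and a real video predictor would be a much larger model working on frames rather than single samples.

```python
import math


def sine_wave(n, freq, phase, amplitude):
    """Generate n samples of a sine wave (a stand-in for a predictable signal)."""
    return [amplitude * math.sin(freq * i + phase) for i in range(n)]


def fit_predictor(signal):
    """Least-squares fit of x[n] ~ a*x[n-1] + b*x[n-2] via 2x2 normal equations."""
    s11 = s12 = s22 = t1 = t2 = 0.0
    for n in range(2, len(signal)):
        p1, p2, target = signal[n - 1], signal[n - 2], signal[n]
        s11 += p1 * p1
        s12 += p1 * p2
        s22 += p2 * p2
        t1 += p1 * target
        t2 += p2 * target
    det = s11 * s22 - s12 * s12
    a = (t1 * s22 - t2 * s12) / det  # Cramer's rule
    b = (s11 * t2 - s12 * t1) / det
    return a, b


def residuals(signal, a, b):
    """The part of each sample that the predictor failed to anticipate."""
    return [
        signal[n] - (a * signal[n - 1] + b * signal[n - 2])
        for n in range(2, len(signal))
    ]


training = sine_wave(200, freq=0.3, phase=0.0, amplitude=1.0)
unseen = sine_wave(200, freq=0.3, phase=1.0, amplitude=5.0)

a, b = fit_predictor(training)            # learn from the training signal
errors = residuals(unseen, a, b)          # apply to a signal not seen before

raw_energy = sum(v * v for v in unseen[2:])
residual_energy = sum(e * e for e in errors)
print(residual_energy < raw_energy)
```

The residual is tiny here only because a sine wave is perfectly predictable from its two previous samples. Real pictures are far less regular, so the residual would be larger, but the principle of sending only what the model could not predict is the same.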

One of the interesting aspects of ML is that the training data implies the solution is based on past experiences. This is similar to the operation of the human brain, and this is one of the reasons pundits draw comparisons to the workings of the mind. ML doesn’t pretend to replace the human brain in any way, but it does replicate the methods with which we all learn. In later parts of this series, we look at the importance of training data and how confirmation bias can affect the quality of the ML solution. ML is all about the training data!
