Machine Learning (ML) For Broadcasters: Part 5 - Datasets And GPUs

In the final article in this series, we look at datasets, their importance and why GPUs are critical for machine learning.


In the previous article in this series, we learned that forward propagation is the process in machine learning that facilitates prediction and classification. In mathematical and computational terms this is a relatively straightforward process, albeit highly recursive and resource hungry. The learning process, however, requires backward propagation, which uses complex mathematical functions to find the global minimum of the model's loss function. It is this process that is computationally demanding and benefits greatly from GPU acceleration.

In machine learning, we do not use the GPU to render images; instead, we use its hardware-accelerated mathematical functions and high-speed memory to perform forward and backward propagation. Critically, these GPU processes rely on dividing an array of data into smaller sub-arrays to match the GPU's memory map, and then providing a processing thread for each sub-array. In effect, one processing unit is associated with each sub-array, allowing thousands of computations to take place simultaneously.

In an image, an array of 1920 x 1080 pixels may be split into 8 x 8 tiles to give 240 x 135 sub-arrays. Each of these would have a processing unit associated with it, allowing 32,400 simultaneous parallel processes. If we substitute the pixels in an image for neurons in a neural network, then thousands of neurons can be processed in parallel with their associated data.
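The tiling arithmetic above can be sketched in a few lines of Python with NumPy. This is an illustration of how a frame is partitioned into sub-arrays, not how a GPU driver actually lays out memory; the frame here is simply a zero-filled placeholder.

```python
import numpy as np

# A 1080p luma frame: 1080 rows x 1920 columns of pixel values.
frame = np.zeros((1080, 1920), dtype=np.float32)

# Split the frame into 8 x 8 tiles, mirroring how a GPU maps
# sub-arrays onto its processing units: 135 tiles down, 240 across.
tiles = frame.reshape(135, 8, 240, 8).swapaxes(1, 2)

# Each tile could then be handled by one processing unit in parallel.
num_parallel = tiles.shape[0] * tiles.shape[1]
print(tiles.shape)    # (135, 240, 8, 8)
print(num_parallel)   # 32400
```

The reshape-and-swap idiom is a common NumPy pattern for blocking a 2D array into tiles without copying data, which is exactly the kind of decomposition that lets each tile be dispatched to its own thread.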

Figure 1 – GPUs are used to accelerate machine learning, as their thousands of cores with associated memory allow for massive parallel processing.

The GPU functionality is abstracted away from the hardware using libraries such as NVIDIA's CUDA. NVIDIA provides both the hardware and the software, so the two can be highly tuned together, leading to massively efficient parallel processing. The CUDA library is a generic solution that facilitates all kinds of parallel processing, from the high-performance computing found in finance to the image processing found in medicine and broadcast.

A further software abstraction takes place using machine learning libraries to provide the necessary models. PyTorch and Keras are two such libraries, delivering convenient interfaces to many of the models needed for machine learning.

A data scientist working to build machine learning solutions spends most of their time preparing their dataset to meet the needs of the PyTorch and Keras models. This allows models such as LSTMs or CNNs to be standardized, enabling the data scientist to configure the model rather than design it from the ground up. Furthermore, the libraries provide convenient methods of transferring the data to the GPU and processing it there.
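A minimal PyTorch sketch illustrates both points: the library supplies the standard building blocks so the model is configured rather than designed from scratch, and a single `.to(device)` call moves data into GPU memory when CUDA hardware is present. The layer sizes and the pass/fail framing are illustrative assumptions, not a production design.

```python
import torch
import torch.nn as nn

# Choose the GPU if CUDA is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A minimal CNN classifier assembled from standard library layers:
# we configure the model rather than design it from the ground up.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 2),   # two classes, e.g. QC pass / fail
).to(device)

# A batch of four dummy RGB frames; .to(device) copies it into GPU memory.
batch = torch.randn(4, 3, 64, 64).to(device)

# Forward propagation runs in parallel across the GPU's cores.
logits = model(batch)
print(logits.shape)   # torch.Size([4, 2])
```

On a machine without a GPU the same code runs unchanged on the CPU, which is part of the convenience these abstractions provide.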

As alluded to in previous articles, datasets are incredibly important, especially when they are labelled by humans, as this presents another challenge: data bias. Humans making decisions in the present are really making decisions based on their previous experiences. This may sound controversial, but if we assume that we are a product of our experiences then this observation does make some sense. If two people witness an incident, they usually recall it with slightly different detail.

Our brain is constantly being bombarded with millions of bits of information from our senses every minute of every day and it cannot hope to process it all simultaneously. Instead, we filter out much of the information and process only the data needed. And the information we filter out is based on our past experiences, which are different for everybody. Once again, we are running the risk of disappearing down a philosophical rabbit hole but just to reinforce this idea, watch the famous Simons and Chabris Selective Attention Tests on YouTube. You’ll understand my point when you’ve watched them.

Figure 2 – Fifteen samples of a dataset of TCP/IP flows, but could as easily be video or audio samples.

Machine learning relies almost entirely on accurately labelled datasets; if the labels are wrong, then the whole model is wrong, and we are presented with incorrect or even biased outcomes. In television, we have the opportunity to have many of our datasets labelled by industry professionals. For example, somebody working in subjective QC will be able to label many hours of video as either pass or fail. But how do we know they were correct?

Key to overcoming bias in data classification is, first of all, being aware of the phenomenon. Any engineer or technologist learns early in their career to question everything and validate their assumptions. The same is true in data classification. Furthermore, we can mitigate bias by both increasing the size of our datasets and increasing the diversity of the humans classifying the data. The last thing we want is machine-classified data being used to classify further data, as the bias amplifies and skews the results. Alas, there are numerous examples of this having already happened.
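One simple way to dilute any single reviewer's bias is to collect labels from several independent reviewers and take a majority vote per item. The sketch below uses hypothetical pass/fail labels from three QC reviewers; the data and the `majority_vote` helper are illustrative, not a standard API.

```python
from collections import Counter

# Hypothetical labels from three QC reviewers for five video clips.
# Multiple, diverse labellers help prevent one person's bias from
# dominating the training set.
reviewer_labels = [
    ["pass", "pass", "fail", "pass", "fail"],  # reviewer A
    ["pass", "fail", "fail", "pass", "fail"],  # reviewer B
    ["pass", "pass", "fail", "fail", "fail"],  # reviewer C
]

def majority_vote(labels_for_clip):
    """Return the most common label among the reviewers for one clip."""
    return Counter(labels_for_clip).most_common(1)[0][0]

# zip(*...) transposes the lists so each tuple holds all three
# reviewers' labels for a single clip.
consensus = [majority_vote(clip) for clip in zip(*reviewer_labels)]
print(consensus)   # ['pass', 'pass', 'fail', 'pass', 'fail']
```

In practice, clips where the reviewers disagree can also be flagged for further review rather than silently resolved, which is often more valuable than the consensus label itself.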

Another challenge is determining who owns the data. For example, facial recognition systems are well established, and a robotic camera connected to a suitable machine learning system could find specific people in a crowd and zoom in on them. One fantastic application of this is in sports, where multiple robotic cameras could use facial recognition to frame shots of a specific player. But to do this, the model would have to have been trained on thousands of images of the players in the respective league. The technology to do this is well established. However, who owns the image of the sports player? Is it the sports person? The photographer? The agency who employed them? Or even the governing sports league? It depends.

The point is that we cannot assume that we can use the dataset we have even if we want to. And this is another great challenge for broadcasters hoping to leverage machine learning. Not only do they need to be sure that the data does not suffer from bias, but they need to be sure the vendor has authorization to use the data. Anybody using a free social media service might want to read the very small print to see if they are transferring their image rights to the social media company.

Broadcast television has the opportunity to benefit greatly from machine learning and we are very much in the infancy of its development. But unlike broadcast technology of the past, we now must contend with the validity of datasets.
