Machine Learning (ML) For Broadcasters: Part 5 - Datasets And GPUs

In the final article in this series, we look at datasets, their importance and why GPUs are critical for machine learning.
In the previous article in this series, we learned that forward propagation is the process in machine learning that facilitates prediction and classification. In mathematical and computational terms this is a relatively straightforward process, albeit highly repetitive and resource hungry. However, the learning process requires backward propagation, which uses the gradients of a loss function to search for the minimum of the model's error (ideally the global minimum, although finding it is not guaranteed). It is this process that is computationally demanding and benefits greatly from GPU acceleration.
In machine learning, we do not use the GPU to render images, but instead use its hardware-accelerated mathematical functions and high-speed memory to carry out forward and backward propagation. Critically, these GPU processes rely on dividing an array of data into smaller sub arrays to match the GPU's memory map, and then providing a processing thread for each sub array. In effect, one processing unit is associated with each sub array, allowing thousands of computations to take place simultaneously.
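To make the difference between the two passes concrete, the following is a minimal sketch rather than a production method. It trains a single linear neuron with gradient descent: the forward pass makes a prediction, and the backward step adjusts the weights against the gradient of the error. In a multi-layer network, backward propagation applies the chain rule to carry these gradients back through every layer; the single neuron here is the simplest case. The sample data, learning rate and epoch count are assumptions chosen purely for illustration.

```python
# Illustrative sketch: one linear neuron, y = w * x + b, trained by gradient descent.
# The data, learning rate, and epoch count are assumptions chosen for this example.

def forward(w, b, x):
    """Forward propagation: compute the neuron's prediction for input x."""
    return w * x + b

def train(samples, learning_rate=0.05, epochs=2000):
    """Fit w and b by repeatedly stepping against the gradient of the mean squared error."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, target in samples:
            error = forward(w, b, x) - target
            # Partial derivatives of the mean squared error with respect to w and b.
            grad_w += 2 * error * x / n
            grad_b += 2 * error / n
        # Backward step: move each parameter a little way downhill.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Samples drawn from the line y = 2x + 1, so training should approach w = 2, b = 1.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train(samples)
print(round(w, 3), round(b, 3))
```

Even in this tiny case the backward step loops over every sample on every epoch, which hints at why the workload grows so quickly for networks with millions of weights.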
In an image, an array of 1920 x 1080 may be split into 8 x 8 arrays to give 240 x 135 sub arrays. Each one of these would have a processing unit associated with it allowing 32,400 simultaneous parallel processes. If we substitute the pixels in an image for neurons in a neural network, then thousands of neurons can be processed in parallel with their associated data.
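The arithmetic above can be checked with a short sketch. It is written in plain Python rather than GPU code, and the function names `tile_grid` and `tile_index` are hypothetical, introduced only for this illustration. It counts the 8 x 8 sub arrays in a 1920 x 1080 frame and shows how a pixel coordinate maps to the sub array, and therefore the processing unit, that would own it.

```python
# Illustrative sketch: divide a 1920 x 1080 frame into 8 x 8 sub arrays (tiles).
FRAME_WIDTH, FRAME_HEIGHT = 1920, 1080
TILE_SIZE = 8

def tile_grid(width, height, tile):
    """Return (columns, rows) of tiles; assumes the frame divides evenly."""
    if width % tile or height % tile:
        raise ValueError("frame dimensions must be a multiple of the tile size")
    return width // tile, height // tile

def tile_index(x, y, tile, columns):
    """Map a pixel coordinate to the index of the tile that contains it."""
    return (y // tile) * columns + (x // tile)

columns, rows = tile_grid(FRAME_WIDTH, FRAME_HEIGHT, TILE_SIZE)
total_tiles = columns * rows
print(columns, rows, total_tiles)  # 240 135 32400

# The bottom-right pixel belongs to the last tile.
print(tile_index(1919, 1079, TILE_SIZE, columns))  # 32399
```

On a GPU, similar index arithmetic is typically performed by each thread to work out which part of the data it is responsible for.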

Figure 1 – GPUs are used to accelerate machine learning as their thousands of processing cores, each with associated memory, allow for massively parallel processing.
The GPU functionality is abstracted away from the hardware using platforms such as NVIDIA’s CUDA. NVIDIA provide both the hardware and software, so they are able to tune the two closely, leading to highly efficient parallel processing. CUDA is a general-purpose solution that facilitates all kinds of parallel processing, from the high-performance computing found in finance to the image processing found in medical and broadcast applications.
A further software abstraction takes place using machine learning libraries to provide the necessary models. PyTorch and Keras are two such libraries and deliver convenient interfaces to many of the models needed for machine learning.
A data scientist working to build machine learning solutions spends most of their time preparing their dataset to meet the needs of the PyTorch and Keras models. These libraries standardize models such as LSTMs or CNNs, enabling the data scientist to configure a model rather than design it from the ground up. Furthermore, the libraries provide convenient methods for transferring data to the GPU and processing it there.
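What dataset preparation involves varies from project to project. As a hedged illustration only, the sketch below shows two steps that are common in practice: scaling each feature to a fixed range and grouping the samples into batches. It uses plain Python rather than a machine learning library, and the flow features and the function names `min_max_scale` and `batches` are hypothetical.

```python
def min_max_scale(rows):
    """Scale each feature column to the range 0..1 (constant columns map to 0.0)."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def batches(rows, batch_size):
    """Yield consecutive batches of rows; the final batch may be shorter."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]

# Hypothetical samples: (packet count, mean packet size in bytes) for each flow.
flows = [[10, 200], [20, 1500], [30, 850], [40, 200], [50, 1500]]
scaled = min_max_scale(flows)
batched = list(batches(scaled, batch_size=2))
print(scaled[2], len(batched))
```

With a library such as PyTorch, batches like these would typically be converted to tensors and moved to the GPU (for example with a tensor's `.to()` method) before being passed to the model.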
As alluded to in previous articles, datasets are incredibly important, especially when they are labelled by humans, as this presents another challenge: data bias. Humans making decisions in the present are really making decisions based on their previous experiences. This may sound controversial, but if we assume that we are a product of our experiences then this observation does make some sense. If two people witness an incident, then they usually recall it with slightly different detail.
Our brain is constantly being bombarded with millions of bits of information from our senses every minute of every day and it cannot hope to process it all simultaneously. Instead, we filter out much of the information and process only the data needed. And the information we filter out is based on our past experiences, which are different for everybody. Once again, we are running the risk of disappearing down a philosophical rabbit hole but just to reinforce this idea, watch the famous Simons and Chabris Selective Attention Tests on YouTube. You’ll understand my point when you’ve watched them.

Figure 2 – Fifteen samples of a dataset of TCP/IP flows, but could as easily be video or audio samples.
Machine learning relies almost entirely on accurately labelled datasets. If the labels are wrong, then the whole model is wrong, and we are presented with incorrect or even biased outcomes. In television, we have the opportunity to have many of the datasets labelled by industry professionals. For example, somebody working in subjective QC will be able to label many hours of video as either pass or fail. But how do we know they were correct?
Key to overcoming bias in data classification is, first of all, to be aware of the phenomenon. Any engineer or technologist learns early on in their career that they should question everything and validate their assumptions. The same is true in data classification. Furthermore, we can mitigate bias by increasing both the size of our datasets and the number and diversity of the humans classifying the data. The last thing we want is a model trained on biased labels being used to classify further training data, as the bias is then amplified and the results skewed. Alas, there are numerous examples of this having already happened.
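One way to act on this, offered as a sketch under stated assumptions rather than an established QC workflow, is to have several reviewers label each clip and to accept a label only when enough of them agree. The clip identifiers, the 75% agreement threshold and the function name `aggregate_labels` are all hypothetical.

```python
from collections import Counter

def aggregate_labels(votes_per_clip, min_agreement=0.75):
    """Accept the majority label for a clip only when enough reviewers agree.

    votes_per_clip maps a clip id to the labels given by different reviewers.
    Returns (consensus, needs_review): accepted labels, and clips to re-check.
    """
    consensus = {}
    needs_review = []
    for clip_id, votes in votes_per_clip.items():
        if not votes:
            needs_review.append(clip_id)
            continue
        label, count = Counter(votes).most_common(1)[0]
        if count / len(votes) >= min_agreement:
            consensus[clip_id] = label
        else:
            # Reviewers disagree too much for the label to be trusted.
            needs_review.append(clip_id)
    return consensus, needs_review

votes = {
    "clip_001": ["pass", "pass", "pass", "pass"],
    "clip_002": ["pass", "fail", "fail", "fail"],
    "clip_003": ["pass", "fail", "pass", "fail"],
}
consensus, needs_review = aggregate_labels(votes)
print(consensus)     # {'clip_001': 'pass', 'clip_002': 'fail'}
print(needs_review)  # ['clip_003']
```

Agreement between reviewers does not prove a label is correct, since a group can share the same bias, but clips that divide opinion are a useful place to start questioning assumptions.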
Another challenge we have is determining who owns the data. For example, facial recognition systems are well established, and a robotic camera connected to a suitable machine learning system could find specific people in a crowd and zoom in on them. One fantastic application of this is in sports where multiple robotic cameras could be used to frame shots of a specific player using facial recognition. But to do this the model would have to have been trained with thousands of instances of the images of the players in the respective league. The technology is well established to do this. However, who owns the image of the sports player? Is it the sports person? The photographer? The agency who employed them? Or even the governing sports league? It depends.
The point is that we cannot assume that we can use the dataset we have even if we want to. And this is another great challenge for broadcasters hoping to leverage machine learning. Not only do they need to be sure that the data does not suffer from bias, but they need to be sure the vendor has authorization to use the data. Anybody using a free social media service might want to read the very small print to see if they are transferring their image rights to the social media company.
Broadcast television has the opportunity to benefit greatly from machine learning and we are very much in the infancy of its development. But unlike broadcast technology of the past, we now must contend with the validity of datasets.