Hardware Infrastructure Global Viewpoint – August 2021

Value Of Your Data

As more broadcasters move to Cloud and IP infrastructures, the power of automation is becoming clear. However, to really gain the benefits that flexibility promises to deliver, we need to look more closely at the wealth of data already available to streamline workflows.

The beauty of the pay-per-use infrastructures offered by cloud computing is that we can scale systems up to deliver the resource needed to meet peak demand and scale them down again when demand subsides, keeping operational costs directly proportional to program output.

Scaling resource is a well-documented process, and many of the public cloud vendors provide learning resources and worked examples of how to achieve it. But the big question is: how do we know when to scale up and when to scale down?

As I see it, there are two approaches to this challenge: reactive scaling and proactive prediction.

In the reactive version, we could place a measure on the job queueing system so that when the number of jobs in the queue reaches an upper threshold, more resource is created, and when the queue falls below a lower threshold, the extra resource is released, obviously with some hysteresis and damping so as not to cause resource thrashing.
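To make that concrete, here is a minimal sketch of such a reactive control loop. The `get_queue_depth`, `add_worker` and `remove_worker` callables are hypothetical stand-ins for whatever API the job queue and cloud vendor actually expose, and the thresholds and cooldown are illustrative values only.

```python
import time

SCALE_UP_THRESHOLD = 50    # jobs queued before we add a worker
SCALE_DOWN_THRESHOLD = 10  # jobs queued before we remove one (hysteresis gap)
COOLDOWN_SECONDS = 300     # damping: wait between scaling actions to avoid thrashing

def reactive_autoscaler(get_queue_depth, add_worker, remove_worker, min_workers=1):
    """Scale workers up or down from queue depth, with hysteresis and a cooldown."""
    workers = min_workers
    last_action = 0.0
    while True:
        depth = get_queue_depth()
        now = time.monotonic()
        if now - last_action >= COOLDOWN_SECONDS:
            if depth >= SCALE_UP_THRESHOLD:
                add_worker()
                workers += 1
                last_action = now
            elif depth <= SCALE_DOWN_THRESHOLD and workers > min_workers:
                remove_worker()
                workers -= 1
                last_action = now
        time.sleep(30)  # poll interval
```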

The proactive version would include, at the very least, some form of statistical analysis, or even machine learning. Analysing trends in the data would allow us to predict how much resource we will need based on historical patterns.
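As a sketch of the proactive version, the fragment below averages historical job counts per hour of day and converts the prediction into a worker count. The data shape (hour-of-day and job-count pairs) and the throughput figure are assumptions made purely for illustration; a real deployment might swap in a proper time-series or machine learning model.

```python
import math
from collections import defaultdict
from statistics import mean

def forecast_jobs_per_hour(history):
    """Average historical job counts per hour-of-day slot.

    `history` is an assumed list of (hour_of_day, job_count) tuples
    extracted from workflow logs.
    """
    by_hour = defaultdict(list)
    for hour, count in history:
        by_hour[hour].append(count)
    return {hour: mean(counts) for hour, counts in by_hour.items()}

def workers_needed(predicted_jobs, jobs_per_worker_per_hour=20):
    """Convert a predicted hourly job count into a worker count, rounding up."""
    return math.ceil(predicted_jobs / jobs_per_worker_per_hour)

# Example: pre-provision for the 18:00 peak before it arrives.
forecast = forecast_jobs_per_hour([(18, 110), (18, 130), (9, 40)])
print(workers_needed(forecast[18]))  # -> 6
```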

Key to both of these, and even more so with the machine learning solution, is the provision of data. The more diverse and plentiful the data we're able to obtain, the more accurate our predictions will be, and the faster the system can scale to meet our programming needs.

The great news is that broadcasters can generate much of this data for themselves, specific to their own operation. When building cloud infrastructures with an agile mindset, monitoring and log generation form major components of the design. Consequently, software engineers often build detailed real-time monitoring into their software to provide accurate information on the operation not only of the infrastructure but also of the individual software components. This is especially evident when working with microservices.

Collating and processing this abundance of data is not trivial. Machine learning not only requires a massive amount of data for training, validation, and testing, it also requires that data to be presented to the models in a well-defined format. Anybody who has developed machine learning based solutions will tell you they spend most of their time pre-processing the data to transform it into a format the models can work with. That said, once the data has been understood and a format schema agreed, scaling control systems can automatically import the logs and pre-process them to provide the data needed without human intervention.
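The sketch below shows the kind of pre-processing step described above, assuming, purely for illustration, JSON-lines logs with timestamp, service, queue_depth and duration_ms fields; real logs will need whatever schema the team has actually agreed.

```python
import json
from datetime import datetime

def preprocess_log_line(line):
    """Parse one JSON log record into the flat schema a scaling model expects.

    The field names here are assumptions about the log format, not a standard.
    """
    record = json.loads(line)
    return {
        "hour": datetime.fromisoformat(record["timestamp"]).hour,
        "service": record["service"],
        "queue_depth": int(record["queue_depth"]),
        "duration_ms": float(record["duration_ms"]),
    }

def load_training_rows(path):
    """Import a log file, skipping malformed lines rather than aborting."""
    rows = []
    with open(path) as f:
        for line in f:
            try:
                rows.append(preprocess_log_line(line))
            except (json.JSONDecodeError, KeyError, ValueError):
                continue
    return rows
```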

Streamlining workflows so they can meet peak demand through resource scaling should be an automated process, and the metadata needed to do this is probably already available in the multitude of logs created by the new breed of software designs. All we need to do is gather that data and format it as metadata to provide the basis of a truly automated, and truly valuable, process.