Scalable Dynamic Software For Broadcasters: Part 10 - Monitoring Usage And Improving Efficiency

Operating a broadcast facility using microservices and containers may well deliver exceptional flexibility, scalability, and resilience. However, the hardware the microservices architecture runs on will always have its limits, whether in terms of physical resource or cost. Monitoring not only improves our understanding of those limits but also helps us build more efficient infrastructures that make the best of the available resource and budgets.

On-prem datacenters are much more flexible than traditional broadcast workflows but are still limited by the amount of physical hardware available. The flexibility we speak of rests on the assumption that the datacenter can be built to accommodate the average workload and then either scale out to public cloud providers to meet peak demand or dynamically repurpose existing resource within the datacenter.

Therefore, there are two challenges system administrators must address when scaling workflows: when to scale, and by how much? Furthermore, we also need to know when a system is misbehaving or a fault may be developing. Both of these challenges can be addressed using intelligent monitoring.

Monitoring makes order out of apparent chaos. Whether measuring the voltage of a camera sensor or the loudness of an audio feed, the function of monitoring allows us to take a deeper look into the system to make sense of how it is operating. And this is particularly important for dynamic and highly scalable systems.

DevOps describes both a system of working and the people who carry out the functions. It’s like a bridge that joins the technology, architecture, and business operations all under one umbrella. DevOps encourages personal responsibility so that individuals can react quickly using agile methodologies while at the same time encouraging team collaboration to build and manage dynamic systems, especially for cloud, virtualized, and microservice architectures.

Deep monitoring helps DevOps teams understand how a system is performing so that they can both maintain reliability and identify which areas can be automated. Allowing virtualized, cloud, and microservice architectures to automatically scale up and down is key to building infrastructures that grow and shrink to meet the needs of the business. Consequently, monitoring must be built into the infrastructure from the ground up, not bolted on as an afterthought once a particular function has been designed and implemented.

Datacenter system administrators are used to monitoring metrics such as server uptime and storage capacity. A whole host of open-source tools such as Prometheus and Nagios provide good insight into how systems are performing in terms of CPU allocation, memory usage, and available storage. But to allow microservice architectures to make much more efficient use of the underlying hardware and available budget, we must go several levels deeper with our monitoring.
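As an illustration of this deeper level, the sketch below shows how an individual microservice might expose its own metrics for Prometheus to scrape, using the official Python client. The metric names, port number, and placeholder workload are assumptions for illustration, not part of any particular broadcast system.

```python
# Sketch: a microservice publishing its own metrics for Prometheus.
# Metric names, port, and the simulated workload are hypothetical.
import random
import time

from prometheus_client import Counter, Gauge, start_http_server

# Hypothetical metrics a transcode microservice might publish.
JOBS_PROCESSED = Counter(
    "transcode_jobs_processed_total", "Total transcode jobs completed"
)
QUEUE_DEPTH = Gauge(
    "transcode_queue_depth", "Jobs currently waiting in the input queue"
)

if __name__ == "__main__":
    # Prometheus scrapes this HTTP endpoint (e.g. http://host:8000/metrics).
    start_http_server(8000)
    while True:
        # Placeholder workload: in a real service these values would come
        # from the job queue and the processing loop.
        QUEUE_DEPTH.set(random.randint(0, 20))
        JOBS_PROCESSED.inc()
        time.sleep(5)
```

Once running, Prometheus can be configured to scrape this endpoint alongside the usual node-level exporters, giving per-microservice visibility rather than just server-level figures.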

Fig 1 - Monitoring software can be run as a pod on a node within the microservice architecture. A large broadcast infrastructure may contain many monitoring agents distributed all over the world.

Maintaining Resource

One of the advantages of microservices is that we can enable a function on a need-to-use basis. Although this is achievable with virtualized servers, where virtual machine instances are spun up as required, the spin-up latency can run into several minutes, whereas the spin-up latency of a microservice is often just a few seconds. Therefore, virtualized servers are often left running unused in the background, which in a highly optimized system is wasteful of resource.

Physical and virtualized servers form the nodes of the microservice architecture, where the high-level monitoring takes place. As a node can run on a virtualized server, the configuration of the virtualization may form another level of resilience. For example, several physical servers could be grouped into clusters, and each cluster allocated to a node. This level of abstraction provides much greater resilience; however, the virtualization will need to be monitored to confirm the nodes are working within the limits of the available resource and that all servers are functioning correctly.

Although the node isn’t exactly a server, the mapping of a server to a node helps us understand the resource allocation. An IP port, server memory, CPUs, and even GPUs are assigned to the node, which in itself must be monitored. The orchestration system will provide this, but at this level the DevOps team may start building their own monitoring systems to check the allocation. If a node starts running short of resource, such as memory, the DevOps team will need to be alerted so they can take the necessary action. The preference is for the orchestration and management software to perform this automatically and then inform the DevOps team that more nodes have been allocated.
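As a concrete illustration, below is a hedged sketch of the kind of node-level check a DevOps team might script using the official Kubernetes Python client. The alert action is a placeholder; in practice the orchestrator’s own alerting and autoscaling would normally handle this automatically, as described above.

```python
# Sketch: flag nodes reporting memory pressure. Assumes a reachable
# Kubernetes cluster and the 'kubernetes' Python client installed.
from kubernetes import client, config

def check_nodes() -> None:
    # Use config.load_incluster_config() instead when running inside a pod.
    config.load_kube_config()
    v1 = client.CoreV1Api()
    for node in v1.list_node().items:
        name = node.metadata.name
        # The kubelet sets the MemoryPressure condition to "True" when the
        # node starts running short of memory.
        for cond in node.status.conditions:
            if cond.type == "MemoryPressure" and cond.status == "True":
                # Placeholder action: alert the DevOps team or trigger the
                # allocation of additional nodes.
                print(f"ALERT: node {name} is under memory pressure")

if __name__ == "__main__":
    check_nodes()
```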

Containers are subcomponents of the nodes and have resource allocated to them from the node. Multiple containers share the node’s resource between them, and this allocation will also need to be monitored. If a proc-amp microservice is running in a container on the same node as a color corrector microservice, the resource allocation doesn’t necessarily need to be divided equally; it might be that the proc-amp needs more memory than the color corrector.
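To sketch how such unequal shares might be expressed, the example below builds a pod definition with the Kubernetes Python client in which a hypothetical proc-amp container requests twice the memory and CPU of the color corrector. The image names and figures are invented for illustration.

```python
# Sketch: two containers sharing one node's resource with deliberately
# unequal requests and limits. Image names and figures are hypothetical.
from kubernetes import client

def video_pipeline_pod() -> client.V1Pod:
    proc_amp = client.V1Container(
        name="proc-amp",
        image="registry.example.com/proc-amp:1.0",  # hypothetical image
        resources=client.V1ResourceRequirements(
            requests={"cpu": "2", "memory": "4Gi"},
            limits={"cpu": "4", "memory": "8Gi"},
        ),
    )
    color_corrector = client.V1Container(
        name="color-corrector",
        image="registry.example.com/color-corrector:1.0",  # hypothetical image
        resources=client.V1ResourceRequirements(
            requests={"cpu": "1", "memory": "2Gi"},
            limits={"cpu": "2", "memory": "4Gi"},
        ),
    )
    return client.V1Pod(
        metadata=client.V1ObjectMeta(name="video-pipeline"),
        spec=client.V1PodSpec(containers=[proc_amp, color_corrector]),
    )
```

The requests give the scheduler the information it needs to place the containers, while the limits stop one microservice from starving its neighbor; monitoring actual usage against both figures shows whether the split was judged correctly.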

The monitoring needed at this level will certainly require DevOps input; when the microservice was designed, its API should have been built to expose a significant amount of monitoring data.

Monitoring the whole resource allocation, from the servers to the nodes and then down into the containers and microservices, will provide the necessary insight not only to maintain reliability, but also to scale to meet the microservices’ resource needs.

Message and Queue Monitoring

Messages and queues not only provide valuable data exchange between microservices and the orchestration and management systems, but also act as an indicator to assist scaling.

The number of jobs users create is often proportional to the amount of resource a microservice system will need. This demand doesn’t necessarily follow a fixed, deterministic pattern, which makes timed scheduling of resource difficult. Measuring the number of jobs in a queue provides the first indication of the total resource allocation required.
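A minimal sketch of queue depth as a scaling signal is shown below. The per-replica throughput and the replica ceiling are assumptions; in a real deployment the depth would be read from the message broker and the replica count applied through the orchestrator rather than printed.

```python
# Sketch: derive a replica count from queue depth. The constants are
# assumptions standing in for measured throughput and budget limits.
import math

JOBS_PER_REPLICA = 10   # assumed sustained throughput of one replica
MIN_REPLICAS = 1        # keep one warm instance for low-latency start
MAX_REPLICAS = 50       # hardware or budget ceiling

def desired_replicas(queue_depth: int) -> int:
    """Scale replicas roughly in proportion to the number of queued jobs."""
    wanted = math.ceil(queue_depth / JOBS_PER_REPLICA)
    return max(MIN_REPLICAS, min(MAX_REPLICAS, wanted))

if __name__ == "__main__":
    for depth in (0, 7, 85, 1200):
        print(f"{depth:>5} queued jobs -> {desired_replicas(depth)} replicas")
```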

Process commands within the microservice architecture are queued in buffers. This stops important commands from being lost if network congestion occurs or a server develops a fault. There may be tens, and potentially hundreds, of message queues around the microservice architecture, all providing insight into how the system is performing.
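As one concrete example of the pattern, the sketch below uses the pika client for RabbitMQ to publish a persistent command to a durable queue. The queue name, host, and payload are placeholders; note that the same declaration call reports the queue depth, which can feed the scaling logic sketched earlier.

```python
# Sketch: durable command queue with RabbitMQ via pika. Host, queue name,
# and payload are placeholders.
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# Durable queue: survives a broker restart.
declared = channel.queue_declare(queue="process-commands", durable=True)

# Persistent message (delivery_mode=2): written to disk, so the command is
# not lost if the broker fails before a consumer picks it up.
channel.basic_publish(
    exchange="",
    routing_key="process-commands",
    body=b'{"job": "transcode", "asset": "clip-042"}',
    properties=pika.BasicProperties(delivery_mode=2),
)

# The declaration frame doubles as a monitoring probe: it reports how many
# commands are currently waiting in the queue.
print("queued commands:", declared.method.message_count)

connection.close()
```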

One of the challenges DevOps teams face with monitoring is not only knowing what to monitor, but also knowing what not to monitor. Retrieving potentially millions of metrics from datacenters all over the world, and saving them to logs and traces, is a whole job in itself. The microservices themselves may well have logs, and these need to be stored. But a broadcaster cannot store every metric within the monitoring system, as doing so risks impacting the way the system operates through excessive workload on the servers and traffic on the networks.
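One simple defence, sketched below with invented metric names, is an explicit allow-list applied at source, so that metrics nobody has decided to keep never reach the network or the storage tier.

```python
# Sketch: drop unwanted metrics at source. The metric names are invented
# for illustration; a real list would be agreed by the DevOps team.
ALLOWED_METRICS = {
    "node_memory_available_bytes",
    "transcode_queue_depth",
    "transcode_jobs_processed_total",
}

def filter_metrics(samples: list[dict]) -> list[dict]:
    """Keep only the metrics judged worth shipping and storing."""
    return [s for s in samples if s["name"] in ALLOWED_METRICS]

if __name__ == "__main__":
    raw = [
        {"name": "transcode_queue_depth", "value": 12},
        {"name": "debug_frame_checksum", "value": 1453},  # dropped at source
    ]
    print(filter_metrics(raw))
```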

Metrics, logs, and traces need to be stored so that the broadcaster can conduct some form of forensic analysis. This is not only to find out what went wrong should a failure arise, but also to provide evidence should there be any litigation from third-party media owners, especially when considering security and cyber theft.

It’s worth remembering that monitoring isn’t new to broadcasting: waveform monitors, vectorscopes, and audio meters are the bread and butter of every broadcast infrastructure. What is new is the need to provide higher levels of resilience and scalability within the microservice architecture, especially when reliability and forensic audits are considered. And this must be built into the microservice apps and management systems from day one.

Every broadcast facility has its limits, even when hybrid on-prem and public cloud infrastructures are adopted. In such a case the limits might not be physical infrastructure, as the cloud will meet any peak demand, but there will be limits in terms of budgets and how much can be spent on scaling. As well as keeping systems reliable, monitoring helps broadcasters maintain the efficiency of their microservice infrastructures.
