Building Software Defined Infrastructure: Network Monitoring

IP networks are a fundamental building block for software defined infrastructure, so a solid technical understanding of network monitoring is essential when building dynamic, microservice-based systems.

We may be using COTS-type infrastructures to distribute media over IP networks; however, broadcast television has some anomalies that we can’t just wish away. The main challenge is that uncompressed video streams packets relentlessly, which differs greatly from the short, bursty traffic of enterprise use cases, and this cannot be ignored. Consequently, we need to look beyond simple averaged datarate monitoring.

Circuit switched networks, as found in SDI and AES infrastructures, provide guaranteed bandwidth and latency, but do so at the expense of flexibility. If a broadcaster just wants to base their system on HD/4K formats, then SDI/AES infrastructures are perfectly adequate. However, when they want to add richer metadata to provide additional services such as HDR, WCG and audio description, then IP is the most convenient way to go.

Packet switched networks, such as ethernet and IP infrastructures, operate on the assumption that the data sender is bursting packets and that its relationship with the receiver is transactional in nature. This facilitates a time-division multiplexed type of system that relies on gaps naturally forming in the data link so that other packets can be inserted; the network design fundamentally relies on these statistical peaks and troughs.

Statistical Multiplexing

Although this idea may seem new to broadcasters as they have mainly operated with synchronous circuit switched networks through SDI and AES, we have come across this concept before in the guise of statistical multiplexing. Anybody of a certain age will probably remember IP/ASI and statistical multiplexing with MPEG codecs all trying to squeeze an extra few megabits out of the transport stream to facilitate more channels. This system relied on VBR (Variable Bit Rate) encoding that focused higher bit rates on dynamically moving pictures with high transients, thus providing lower bit rates for more static pictures that could then be exploited by the stat-mux. A feedback mechanism often existed between the stat-mux and codecs so that data bandwidth could be applied to the video streams that most needed it. 

The feedback mechanism doesn’t naturally occur in UDP transfers as seen with ST2110, but VBR exists in high-end codecs, thus creating more bursty data for the IP/ethernet switch. ST2110 produces evenly gapped packets that are distributed over an IP stream within very tight tolerances to keep the sender and receiver buffers small, thus reducing latency. In bursty networks, buffers iron out the short transients to form an orderly queue, allowing the CPU to process the packets without loss. In effect, the buffers are acting like synchronizers between the IP link and the CPU processing in the server.

Enterprise networks do not naturally gap packets evenly; instead, they work on a greedy approach of filling the data link to capacity. This requires some clever network design, as two types of data flow emerge: short bursty traffic employed by systems such as control, metadata and monitoring, and evenly gapped packets found in protocols such as ST2110. It is possible to put all this data within the same data links, but great care must be taken, as the short bursts of IP packets could easily shift the ST2110 packets outside of their tolerance, especially when the network starts to become congested.

The trail blazers of ST2110 IP networks tended to err on the side of caution and keep network capacity low to avoid congestion. But as network utilization advances in the broadcast arena, the networks must be stressed more to get every megabit of bandwidth capacity out of them. And this is where our challenges start to manifest themselves.

Although we try not to speak too much of demarcation within the broadcast and network engineering teams, there is still a fundamental difference between how they think and approach problem solving. 

Network engineers are mainly concerned with the timely delivery of data, but broadcast engineers must build on this and look at the pixel delivery. Television is still a synchronous system based on sampled, time-invariant video frames. IP networks may well add a certain amount of variable timing to the transfer of packets, but fundamentally, the endpoints within the television system are synchronized, i.e. camera and monitor, or microphone ADC and television audio DAC.

Routing Elephant & Mice Flows

Fundamentally then, we have two different types of IP packet flow: short bursty packets for control and metadata, and long, high datarate flows that transfer the video and audio essence. In network terminology, these are referred to as mice and elephant flows. The two types are known to interact to their mutual detriment; for example, long elephant flows may fill the egress buffer in the switch and hold back the short control and metadata flows. Not only does this cause latency, it can also force TCP flows to time out and resend packets, adding further to the congestion.

Diagram 1 – In all three diagrams, the average datarate (measured over a second) is similar. a) shows the data distribution of an SDI flow, with the plot on the left showing an even distribution of data and the bell curve on the right showing a very small standard deviation. b) shows an ST2110 IP packet distribution with moderately gapped IP packets and a wider bell curve on the right, indicating a higher standard deviation. c) shows severe packet bunching, with the correspondingly very high standard deviation visible in the bell curve on the right.

This highlights another interesting difference between enterprise and broadcast IP networks. The majority of IP flows within an enterprise network are based on TCP, whereas broadcast IP infrastructures carry a much higher proportion of UDP flows. The TCP flows tend to be used by control and metadata type applications, whereas the UDP flows are used for media delivery. It is possible to use TCP for video and audio, and some systems do, but the latency then has the potential to both increase and become far less predictable.

If we were to measure the datarate of an IP data link, we would initially look at the average datarate of the link itself. This could be approximated by counting the number of ethernet frames (assuming ethernet is being used as the data link), multiplying by the MTU and dividing by the length of the measurement period, although summing the octets actually carried is more accurate since many frames are smaller than the MTU. Either way, the result only describes the aggregate link rate and tells us nothing of what is going on within the individual media, control and metadata flows.
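As a rough illustration of that averaged measurement, here is a minimal sketch assuming a Linux host where the sysfs octet counters are available; the interface name is a placeholder.

```python
# Minimal sketch: average datarate of a link from Linux interface counters.
# Assumes a Linux host; IFACE is a placeholder interface name.
import time

IFACE = "eth0"  # hypothetical interface name

def read_rx_bytes(iface: str) -> int:
    # /sys/class/net/<iface>/statistics/rx_bytes is a cumulative octet counter.
    with open(f"/sys/class/net/{iface}/statistics/rx_bytes") as f:
        return int(f.read())

def average_rate_bps(iface: str, period_s: float = 1.0) -> float:
    # Counting real octets avoids assuming every frame is a full MTU.
    start = read_rx_bytes(iface)
    time.sleep(period_s)
    end = read_rx_bytes(iface)
    return (end - start) * 8 / period_s  # bits per second

if __name__ == "__main__":
    print(f"{IFACE}: {average_rate_bps(IFACE):.0f} bit/s averaged over 1s")
```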

Balancing Network Flows

This is important as a good network design will separate the mice and elephant flows, sending them on different routes, so that they do not interfere with each other and cause increased and indeterminate latency. To achieve this, we need to introduce a short-term measure and take into consideration the standard deviation.

The short-term measure will allow us to spot the bursty flows and learn their behavior. If a flow has a datarate of 1Mb/s, this tells us nothing of the distribution: the packets may be evenly gapped, or all bunched together in the first few milliseconds of the flow. But if we also compute the standard deviation, then we have better insight into the packet distribution.
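As a sketch of such a short-term measure, assuming we already have a list of (timestamp, length) records for a single flow, for example from a packet capture, per-window rates and their standard deviation could be computed along these lines.

```python
# Minimal sketch: per-window datarates and their standard deviation from a
# list of (timestamp_seconds, length_bytes) packet records. The capture
# itself is assumed; records must be sorted by timestamp and non-empty.
import statistics

def window_rates(packets, window_s=0.010):
    """Return one datarate (bit/s) per fixed 10ms window."""
    t0 = packets[0][0]
    n_windows = int((packets[-1][0] - t0) / window_s) + 1
    bits = [0.0] * n_windows
    for ts, length in packets:
        bits[int((ts - t0) / window_s)] += length * 8
    return [b / window_s for b in bits]

def flow_stats(packets, window_s=0.010):
    rates = window_rates(packets, window_s)
    return statistics.mean(rates), statistics.pstdev(rates)
```

An evenly gapped ST2110 flow and a bursty control flow can then report the same mean but very different standard deviations.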

SDI’s 1.485GHz clock, used to send the HD video information, produces a datarate whose average sits very close to 1.485Gb/s and whose standard deviation is very close to zero. This is because SDI is synchronous, and any change in the data distribution will be due to drift in the clock frequency, which in most cases is very small. An ST2110 video stream will have an average measure close to the packet send rate, and a standard deviation larger than SDI’s, but still modest due to the constraints placed on ST2110. However, a control or metadata stream will have a standard deviation that is not only very high but also skewed. For example, a control message spanning ten IP packets of around 1,500 bytes each carries roughly 120,000 bits; averaged over a second that is only 120Kb/s, but if the packets are bunched into a few hundred milliseconds the short-term rate is several times higher, so the deviation could well be very high.
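To make the arithmetic of that example concrete, here is a small check, assuming ten packets of roughly 1,500 bytes each and a notional 200ms burst window.

```python
# Numeric check of the control-message example: ten packets of ~1,500 bytes.
packets = 10
bits_per_packet = 1500 * 8              # 12,000 bits per packet
total_bits = packets * bits_per_packet  # 120,000 bits in total

avg_over_second = total_bits / 1.0      # 120 Kb/s when averaged over 1 second
burst_window_s = 0.2                    # but the packets arrive within ~200 ms
rate_inside_burst = total_bits / burst_window_s  # 600 Kb/s inside the burst

print(f"1 s average : {avg_over_second / 1e3:.0f} Kb/s")
print(f"burst rate  : {rate_inside_burst / 1e3:.0f} Kb/s")
```

The one-second average hides the burst entirely, which is exactly why the deviation, and its skew, matter.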

The principal question is “who cares?”. Although mice and elephant flows are considered in enterprise networks, within a broadcast IP network great care should be taken not only to separate them but also to keep an eye on the standard deviation of each flow: if it is excessive, it could point to high network jitter. In the extreme, this could result in packet loss or even network congestion.

Understanding Packet Distribution

In a synchronous broadcast infrastructure using SDI and AES, buffer size isn’t much of an issue as the latencies are constrained to the propagation time of the distribution medium, such as the cable or fiber. However, in asynchronous IP networks, packet bunching and network jitter are not only a major issue in themselves; if they are not tamed then larger buffers will be needed, which results in greatly increased latency.
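To illustrate that relationship, here is a minimal simulation sketch, assuming packets nominally spaced 1ms apart, a reader that drains one packet per 1ms tick, and a simple uniform jitter model chosen purely for illustration.

```python
# Minimal sketch: how arrival jitter translates into required receive buffer
# depth, and therefore latency. The uniform jitter model is an assumption.
import bisect
import random

def required_depth(arrivals, interval_s):
    # The reader takes packet k at t_play0 + k*interval. Pick the smallest
    # t_play0 that avoids underrun, then report the worst-case occupancy.
    n = len(arrivals)
    t_play0 = max(arrivals[k] - k * interval_s for k in range(n))
    depth = 0
    for k in range(n):
        arrived = bisect.bisect_right(arrivals, t_play0 + k * interval_s)
        depth = max(depth, arrived - k)
    return depth

random.seed(1)
interval = 0.001  # 1 ms nominal packet spacing
for jitter in (0.0, 0.002, 0.010):
    arrivals = sorted(i * interval + random.uniform(0, jitter) for i in range(1000))
    print(f"jitter {jitter * 1000:4.1f} ms -> depth {required_depth(arrivals, interval)} packets")
```

The more the arrivals bunch, the deeper the buffer must be to avoid underrun, and every extra packet of depth adds another packet interval of latency.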

Measuring the average datarate in asynchronous networks only gives us a high-level overview of what is happening in the network. Too much jitter or packet bunching can result in the need to employ larger buffers to iron out the timing anomalies, which in turn can lead to excessive latency. Although the average datarate may be important, in asynchronous systems, other statistical measurements such as the standard deviation, skew, and kurtosis (size of the tails) provide an element of detail that helps us avoid dropped packets and keeps latency low.
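As a minimal sketch of those additional measures, the statistics below are computed over per-window rates with plain NumPy; the example arrays are illustrative only.

```python
# Minimal sketch: standard deviation, skew and excess kurtosis of per-window
# datarates. The rate arrays here are illustrative placeholders.
import numpy as np

def distribution_stats(rates):
    r = np.asarray(rates, dtype=float)
    mu, sigma = r.mean(), r.std()
    if sigma == 0:
        return mu, 0.0, 0.0, 0.0
    skew = ((r - mu) ** 3).mean() / sigma ** 3      # asymmetry of the distribution
    kurt = ((r - mu) ** 4).mean() / sigma ** 4 - 3  # excess kurtosis: weight of the tails
    return mu, sigma, skew, kurt

# Same average rate, very different distributions.
even = [1.0e6] * 100                 # evenly gapped flow
bursty = [10.0e6] * 10 + [0.0] * 90  # all the data bunched into 10% of the windows
for name, rates in (("even", even), ("bursty", bursty)):
    print(name, ["%.3g" % v for v in distribution_stats(rates)])
```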
