Understanding IP Production Networks: Part 14 - Delay Monitoring

We use buffers to reassemble asynchronous streams, so we must measure how long individual packets take to reach the receiver reliably, and the maximum and minimum delay of all packets at the receiver.

Video and audio monitoring in baseband formats is well established for levels, noise, and distortions. Television monitors provide subjective visual checks, and objective measurements can be taken using waveform monitors. Audio is similar in that loudspeakers and headphones provide subjective checks, and PPMs, VU meters and loudness meters provide objective verification.

Subjective monitoring in IT consists of assessing the user experience: how long does it take for a web page to respond to a mouse click, and how fast will a file transfer? IT engineers use packet analysis tools such as Wireshark to look closely at the packets, and iPerf to find the maximum achievable data rates of network links.

Video and audio bring a new dimension to monitoring for the IT department. Not only are we concerned with how to measure the video and audio, but we must also analyze the time it takes for an IP packet to arrive at a destination and how much that time varies across all other packets in the stream. If packets take too long, the receiver will drop them from its decoding buffer and cause signal corruption.

High level audio and video monitoring will always be important. Evangelists have often proclaimed that in a digital world we don’t need audio level monitoring as the signals don’t suffer the same distortion and level problems as analog lines. Anybody working at the front end of a broadcast station will tell you the reality is somewhat different.

In the past, broadcast engineers have had the luxury of assuming the underlying network is robust and solid. An SDI distribution system will provide nanoseconds of delay at 3Gbps, and a twisted pair balanced audio system will have similar delays with virtually no dropout.

Dealing With Delay

IP networks are very different. They’re designed with the assumption that there will be packet loss and variable delay. As IP networks are resilient and self-healing, it’s possible and likely that IP packets streamed across a network will take different routes and some won’t get there at all. If a router fails then the resilience in a network will send subsequent IP packets via a different route, often longer than the original. If the first router recovers, then the IP packets could be sent over this shorter link, resulting in packets being received out of sequence.

Figure 1 - Buffers are used as a temporary store to re-sequence packets.

Significant variation in packet transmission time occurs due to the queueing that takes place in switches and routers. In integrated IP networks, all kinds of data are being transferred, from accounts transactions to office files; video and audio are competing with these to get to their destination.

Receiver buffering is a straightforward way of dealing with delay and sequencing problems. A buffer is a temporary storage area of computer memory into which packets are written as they arrive, out of sequence and at varying times. The receiver algorithm then reads the packets out of the buffer in sequence and presents them to the decoding engine.
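As a minimal sketch of this re-sequencing, the Python below holds packets in a small reorder buffer keyed on sequence number and releases the lowest-numbered packet once the buffer exceeds a chosen depth. The function name `resequence`, the `(sequence_number, payload)` tuple layout and the depth parameter are assumptions made for illustration; real receivers typically work from RTP sequence numbers and timestamps and are considerably more involved.

```python
import heapq

def resequence(arrivals, depth):
    """Return packets in sequence order using a reorder buffer of `depth` packets.

    `arrivals` is an iterable of (sequence_number, payload) in arrival order.
    A packet that arrives after its slot has been read out is dropped as late.
    """
    heap = []          # min-heap ordered by sequence number
    next_seq = None    # lowest sequence number that can still be accepted
    out = []
    for seq, payload in arrivals:
        if next_seq is not None and seq < next_seq:
            continue   # too late: the decoder has already moved past it
        heapq.heappush(heap, (seq, payload))
        # Once the buffer is over its depth, read out the oldest packet.
        while len(heap) > depth:
            seq_out, payload_out = heapq.heappop(heap)
            out.append((seq_out, payload_out))
            next_seq = seq_out + 1
    # End of stream: drain whatever is left, still in sequence order.
    while heap:
        out.append(heapq.heappop(heap))
    return out

arrivals = [(1, "a"), (3, "c"), (2, "b"), (4, "d")]
print(resequence(arrivals, depth=2))  # all four packets, in order
print(resequence(arrivals, depth=0))  # packet 2 arrives too late and is lost
```

With a depth of two packets the out-of-order packet is recovered; with no buffering it is discarded, which is the corruption case described above.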

Buffers are a trade-off between delay and validity of data. The longer the buffer, the more likely it is to capture packets that have taken a disproportionately long time to travel. However, the read-out algorithm must wait for the latest packet it is prepared to accept, so the bigger the buffer, the longer the delay.
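The trade-off can be illustrated with a few lines of Python. This is only a sketch: the delay figures are invented for the example, and the rule that a packet is late if it arrives more than the buffer time after the fastest packet is a simplifying assumption rather than how any particular receiver behaves.

```python
def late_fraction(delays_ms, buffer_ms):
    """Fraction of packets that miss a deadline of (fastest delay + buffer_ms)."""
    deadline = min(delays_ms) + buffer_ms
    late = sum(1 for d in delays_ms if d > deadline)
    return late / len(delays_ms)

# Hypothetical one-way delays in milliseconds, with two slow outliers.
delays = [2.0, 2.1, 2.3, 2.0, 6.5, 2.2, 2.1, 9.0, 2.4, 2.0]

for buffer_ms in (1, 5, 10):
    print(buffer_ms, late_fraction(delays, buffer_ms))
# 1  -> 0.2  (both outliers missed)
# 5  -> 0.1  (one outlier missed)
# 10 -> 0.0  (nothing missed, at the cost of 10 ms added latency)
```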

Dropped packets are caused either through congestion in a switch or router, or interference on a network cable. Congestion occurs when too many packets arrive at the router’s inputs too quickly and the router cannot respond to them quickly enough, or the egress port becomes oversubscribed. Much processing goes on inside a router or switch; the more features the device provides, the more chance there is of packet loss.

This is one of the reasons IT engineers try to use layer 2 (Ethernet) switches wherever possible. They use look-up tables to decide where to send the frame based on the destination address in the Ethernet frame header. This is relatively simple and can be achieved in almost real-time using a bitwise comparison in an FPGA (Field Programmable Gate Array).
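The forwarding decision itself is simple enough to sketch in a few lines. The Python below is a software illustration only; the function name `forward`, the table contents and the port numbers are hypothetical, and a real switch performs this lookup in hardware rather than in a dictionary.

```python
def forward(mac_table, dst_mac, ingress_port, all_ports):
    """Return the list of ports a frame should be sent out of.

    mac_table maps a destination MAC address to the port it was learned on.
    """
    port = mac_table.get(dst_mac)
    if port is None:
        # Unknown destination: flood to every port except the one it came in on.
        return [p for p in all_ports if p != ingress_port]
    if port == ingress_port:
        # Destination lives on the ingress segment: nothing to forward.
        return []
    return [port]

# Hypothetical table: two known stations on ports 1 and 2 of a 3-port switch.
table = {"aa:aa:aa:aa:aa:01": 1, "aa:aa:aa:aa:aa:02": 2}
print(forward(table, "aa:aa:aa:aa:aa:02", 1, [1, 2, 3]))  # known: port 2 only
print(forward(table, "aa:aa:aa:aa:aa:99", 1, [1, 2, 3]))  # unknown: flood
```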

As a router needs to dig deeper into the Ethernet frame and IP packet, it requires more processing power and the potential for packet loss increases. This is one of the areas IT engineers tend to gloss over, working on the assumption that congestion occurs infrequently, and that when it does, TCP-based protocols such as FTP will fix the problem by resending any lost packets.

Figure 2 - Computer network interface cards introduce delay and jitter.

In broadcast television, we cannot afford to drop even one packet. SMPTE ST 2022-5 incorporates FEC (Forward Error Correction), but this isn’t designed to take the place of TCP-style retransmission in fixing large errors caused by congestion, and relying on it to do so could lead to unpredictable results.

Measurements

Consequently, we are interested in two network measurements: how long individual packets take to reliably get to the receiver, and the maximum and minimum delay of all packets at the receiver. On the face of it, this sounds like an easy measurement to make using analyzers such as Wireshark. However, PC protocol analyzers rely on receiving data from the NIC (Network Interface Card), so their timing includes the time taken for the operating system to move data from the NIC to the main processor.

NICs have built-in buffers that are used to receive and transmit data to the Ethernet cable or fiber. For transmission, they provide a temporary store should a collision be detected on the Ethernet link and the packet need to be retransmitted, and for receiving they hold packets until the processor has time to copy them to main memory and process them.

The buffers and operating system introduce further delay into the system and make critical measurement very difficult. Consequently, we cannot be sure whether we are measuring the time taken through the network, or the time taken for the measuring system's OS and NIC to process the packet. This is one of the occasions where a hardware solution gives consistently better results than software tools.
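Once trustworthy timestamps are available, the two measurements reduce to simple arithmetic. The sketch below assumes the sender and receiver clocks are synchronized and that each packet carries a send timestamp; the function name `delay_stats` and the sample figures are hypothetical, and the result is only as good as the point at which the receive timestamp was taken.

```python
def delay_stats(send_times_us, recv_times_us):
    """Per-stream delay figures from paired send/receive timestamps (microseconds)."""
    delays = [r - s for s, r in zip(send_times_us, recv_times_us)]
    return {
        "min": min(delays),
        "max": max(delays),
        "mean": sum(delays) / len(delays),
        # Spread between the slowest and fastest packet: the amount of
        # variation the receiver buffer has to absorb.
        "variation": max(delays) - min(delays),
    }

# Hypothetical timestamps: packets sent every 1000 us, one delayed by a queue.
sent = [0, 1000, 2000, 3000]
received = [150, 1180, 2140, 3400]
print(delay_stats(sent, received))
# {'min': 140, 'max': 400, 'mean': 217.5, 'variation': 260}
```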

Queueing Theory

Queueing theory provides the foundation of router and switch operation in packet switched networks. It seeks to understand and explain how packets accumulate, wait and are serviced by forwarding devices under varying load conditions.

A network queue is a buffer where packets reside when the instantaneous arrival rate (datarate) exceeds the egress rate of the link the packet is being forwarded to. Using arrival distributions, service disciplines, buffer constraints and scheduling algorithms, queueing models seek to predict latency, jitter, loss and overall system stability.

The fundamental result states that as the average arrival rate approaches the output read rate, the queue length grows to the point where the network becomes unstable. Traffic in high-speed networks, datacenters and the internet is often heavy tailed. This means that the tail of the statistical distribution of packet arrivals over time decays much more slowly than that of an exponential distribution, resulting in large bursts, long idle periods, and extreme variability that do not average out over time. This produces much longer queues than expected, so buffers overflow sooner and designs need wide margins to account for these correlated bursts. It’s no wonder SMPTE chose even gapping for ST 2110, which is achievable as the datarate of uncompressed video is relatively constant.
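To see how sharply delay rises as a link fills, the sketch below uses the textbook M/M/1 model, where the mean time a packet spends in the system is 1 / (service rate - arrival rate). The choice of M/M/1 is an assumption made for illustration and is not taken from this article; it presumes random (Poisson) arrivals, so with heavy-tailed traffic real queues would be expected to behave worse than these figures.

```python
def mm1_mean_delay(arrival_rate, service_rate):
    """Mean time in an M/M/1 queue (waiting plus service), in seconds.

    Rates are in packets per second. Returns infinity when the queue is
    unstable, i.e. arrivals are at or above the service rate.
    """
    if arrival_rate >= service_rate:
        return float("inf")
    return 1.0 / (service_rate - arrival_rate)

service = 100_000  # hypothetical egress rate in packets per second
for load in (0.5, 0.9, 0.99):
    delay_us = mm1_mean_delay(load * service, service) * 1e6
    print(f"{load:.0%} load: about {delay_us:.0f} us mean delay")
```

Moving from 50% to 99% load increases the mean delay fifty-fold in this model, which is why headroom matters so much on links carrying media.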

FIFO (First In First Out) buffer management is by far the simplest method of designing buffers in routers and switches. Advanced traffic management algorithms require packets to be removed from the buffer out of sequence to support traffic classes and achieve better levels of QoS.
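The difference between the two approaches can be sketched as follows. Strict priority is used here as one simple example of a class-based discipline; it is an illustrative choice, and the packet names and class numbers are hypothetical.

```python
from collections import deque

def fifo_order(packets):
    """Serve packets strictly in the order they arrived."""
    queue = deque(packets)
    return [queue.popleft() for _ in range(len(queue))]

def strict_priority_order(packets):
    """Serve the lowest-numbered class first; FIFO within each class.

    Each packet is (name, traffic_class), where class 0 is most urgent.
    """
    queues = {}
    for name, traffic_class in packets:
        queues.setdefault(traffic_class, deque()).append((name, traffic_class))
    served = []
    for traffic_class in sorted(queues):
        served.extend(queues[traffic_class])
    return served

backlog = [("file1", 2), ("video1", 0), ("file2", 2), ("video2", 0)]
print([name for name, _ in fifo_order(backlog)])
# ['file1', 'video1', 'file2', 'video2']
print([name for name, _ in strict_priority_order(backlog)])
# ['video1', 'video2', 'file1', 'file2']
```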

Queueing theory also helps us understand how buffers absorb bursts, and that excessive buffering creates bufferbloat, where increasing queueing delay pushes packets far beyond the tolerance of the end application. This leads to the bandwidth-delay product relationship, where a high-throughput flow requires sufficient buffering to accommodate congestion windows.
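The bandwidth-delay product is simply the link rate multiplied by the round-trip time, which gives the amount of data in flight. A minimal sketch, with a hypothetical link speed and round-trip time:

```python
def bandwidth_delay_product_bytes(link_bps, rtt_ms):
    """Bytes in flight on a link of `link_bps` bits/s with a round trip of `rtt_ms`."""
    # bits = link_bps * (rtt_ms / 1000); bytes = bits / 8
    return link_bps * rtt_ms // 8000

# Hypothetical example: a 10 Gb/s WAN link with a 20 ms round-trip time.
print(bandwidth_delay_product_bytes(10_000_000_000, 20))  # 25000000 bytes (25 MB)
```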

Not only does buffer overflow result in packet loss, but algorithms are also implemented that force packets to be dropped to reduce the probability of congestion occurring on the link. Strategies to drop packets include tail drop, random early detection or weighted random early detection, depending on the class configuration. Vendors tend to implement deep buffers for wide area network (WAN) interfaces and shallow buffers for low-latency datacenter fabrics. Queueing theory helps determine these buffer sizes by analyzing the behavior of the expected workloads.
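The difference between tail drop and random early detection (RED) can be sketched as a drop probability against queue length. The thresholds and maximum probability below are hypothetical, and this shows only the basic linear ramp of RED; it omits the averaging of the queue length and the per-packet count adjustment used in full implementations.

```python
def tail_drop_probability(queue_len, capacity):
    """Tail drop: nothing is dropped until the buffer is completely full."""
    return 1.0 if queue_len >= capacity else 0.0

def red_drop_probability(avg_queue, min_th, max_th, max_p):
    """Basic RED ramp: drop probability rises linearly between two thresholds."""
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)

# Hypothetical settings: thresholds of 20 and 60 packets, 10% maximum probability.
for avg in (10, 40, 60):
    print(avg, red_drop_probability(avg, min_th=20, max_th=60, max_p=0.1))
```

Weighted RED applies the same idea with a separate set of thresholds per traffic class, so lower-priority classes start dropping earlier.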

Network queueing theory uses mathematical modelling, control theory and practical scheduling mechanisms to underpin all QoS systems, and it remains essential for designing stable, predictable and high-performance IP networks.
