Reality of IP - Part 3 - Data Acceleration

Network Interface Cards (NICs) are often seen as the bottleneck of data processing for ST2110 and ST2022-6. IT manufacturers have faced similar challenges with high-speed trading and 5G networks, and have provided real-time solutions that overcome latency and blocking. In this article, we investigate IT’s achievements and how they apply to broadcast television.

A progressive HD video source creates just over 2.5Gb/s of data, resulting in approximately 200,000 IP packets every second. SMPTE’s ST2110 adds rate shaping to keep the variance of the packet rate within tight limits. The purpose of this is to keep latency low and avoid dropped packets.

ST2110 defines two parameters for rate shaping: Cmax and VRXfull. Cmax describes the variation of packets leaving the sender, and VRXfull describes the variation of packets being read at the receiver.

In the ideal world, the rate at which a device writes to a buffer will exactly equal the read rate. However, in software systems, the short-term data rate being written to the buffer may spike briefly, even though the long-term average will be the same.

Packet Loss from Bursts

If a camera sends packets in bursts that are too large, the short-term variance could be too high. Either a larger send-buffer will be needed in the camera or packets will be lost. A larger buffer will solve this issue but will increase latency, as the packets spend more time in memory.
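The trade-off between buffer size, packet loss, and latency can be illustrated with a minimal sketch. It assumes a simple model that is not taken from ST2110: time is divided into ticks, the reader drains one packet per tick, and the function name simulate_buffer and the two traffic patterns are hypothetical. Both patterns have the same long-term average of one packet per tick.

```python
def simulate_buffer(arrivals, capacity, drain_per_tick=1):
    """Simulate a FIFO buffer written in bursts and read at a constant rate.

    arrivals: number of packets written in each time tick.
    capacity: maximum number of packets the buffer can hold.
    Returns (dropped_packets, peak_occupancy).
    """
    occupancy = 0
    dropped = 0
    peak = 0
    for arriving in arrivals:
        # Write side: accept what fits, drop the rest.
        accepted = min(arriving, capacity - occupancy)
        dropped += arriving - accepted
        occupancy += accepted
        peak = max(peak, occupancy)
        # Read side: constant drain rate.
        occupancy -= min(occupancy, drain_per_tick)
    return dropped, peak


# Two hypothetical patterns with the same average rate (1 packet per tick).
smooth = [1] * 16
bursty = [8, 0, 0, 0, 0, 0, 0, 0] * 2

for name, pattern in (("smooth", smooth), ("bursty", bursty)):
    for capacity in (4, 8):
        dropped, peak = simulate_buffer(pattern, capacity)
        print(f"{name:6s} capacity={capacity}: dropped={dropped}, peak={peak}")
```

In this model the bursty source loses 8 packets with a 4-packet buffer and none with an 8-packet buffer, but the peak occupancy rises from 4 to 8 packets, so the last packet of each burst waits longer before it is read.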

Field Programmable Gate Arrays (FPGAs) can send and receive packets with very low variations. But software systems can easily create large variances due to the interaction of the application software, operating systems, and shared resources.

ST2110 has three models to define the sender rate shaping: narrow (N), narrow linear (NL) and wide (W). Table 1 below shows the relative variance sizes compared to the data rate of the video.

Table 1 – size of relative packet variance for 1080p50 video stream.

To put this into context, the 1080p50 HD stream requires approximately 3.7 IP packets per line of video. In the narrow model in Table 1, the maximum variance is 4 IP packets, or just one video line. But the wide variance is 16 packets, approximately 4 video lines at the sender, and 720 packets, approximately 194 lines at the receiver.
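The conversions above can be checked with a few lines of arithmetic. This is a sketch that uses only the figures quoted in the text (3.7 packets per line, and variances of 4, 16 and 720 packets). Counting 1080 active lines at 50 frames per second for the packet rate is our assumption, and the function names are hypothetical.

```python
PACKETS_PER_LINE = 3.7  # figure quoted in the text for 1080p50


def packets_to_lines(packets, packets_per_line=PACKETS_PER_LINE):
    """Express a packet count as an equivalent number of video lines."""
    return packets / packets_per_line


def packets_per_second(lines=1080, frames_per_second=50,
                       packets_per_line=PACKETS_PER_LINE):
    """Approximate packet rate, assuming only active lines are counted."""
    return packets_per_line * lines * frames_per_second


variances = {"narrow sender": 4, "wide sender": 16, "wide receiver": 720}
for label, packets in variances.items():
    lines = packets_to_lines(packets)
    print(f"{label}: {packets} packets is about {lines:.1f} video lines")

print(f"packet rate: about {round(packets_per_second())} packets per second")
```

The results agree with the approximate figures in the text: roughly one line, four lines and 194 lines, at a rate close to 200,000 packets every second.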

As we migrate to IP, manufacturers are writing new FPGA firmware to deliver ST2110 streams, resulting in well-gapped packets with very little variance that will easily match the narrow model.

Wide and Narrow Incompatibilities

There are potential challenges for graphics processing applications running on servers. It is highly likely that developers of a pure software application would specify the wide model to provide the maximum IP packet variance allowable under ST2110. However, if the application were streaming to a connected device using an FPGA implementation, that device may well specify the narrow model. A wide output and narrow input are incompatible and will result in lost packets.
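One way to picture the incompatibility is to measure how bursty a sender is and compare it with what the receiver tolerates. The sketch below uses a simplified leaky bucket, which is our own illustration and not the measurement model defined in ST2110. The limits of 4 and 16 packets are the narrow and wide sender figures quoted earlier, while the function name, the timestamps and the drain interval are hypothetical.

```python
def peak_bucket_depth(send_times, drain_interval):
    """Peak level of a leaky bucket that drains one packet per drain_interval.

    send_times: packet departure times in ascending order (any time unit).
    """
    level = 0.0
    peak = 0.0
    previous = None
    for t in send_times:
        if previous is not None:
            # Drain for the time elapsed since the previous packet.
            level = max(0.0, level - (t - previous) / drain_interval)
        level += 1.0  # the packet that has just arrived
        peak = max(peak, level)
        previous = t
    return peak


NARROW_LIMIT = 4   # packets, narrow figure quoted in the text
WIDE_LIMIT = 16    # packets, wide sender figure quoted in the text

# Hypothetical senders with the same average rate of one packet per time unit.
paced_sender = [float(t) for t in range(32)]   # evenly gapped, FPGA-like
bursty_sender = [0.0] * 16 + [16.0] * 16       # software sending in bursts of 16

for name, times in (("paced", paced_sender), ("bursty", bursty_sender)):
    peak = peak_bucket_depth(times, drain_interval=1.0)
    print(f"{name}: peak={peak:.0f}, "
          f"fits narrow={peak <= NARROW_LIMIT}, fits wide={peak <= WIDE_LIMIT}")
```

Under these assumptions the paced sender fits both limits, while the bursty sender fits only the wide limit, which is the situation where a narrow input would lose packets.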

IT manufacturers have been working to resolve similar latency issues and have achieved better latency and data throughput in recent years. It is now common to aggregate computing on virtualized, cloud-native clusters of COTS servers. 5G Telco clouds have moved to this architecture and new IP-based broadcast systems will adopt similar architectures.

Aggregating compute and networking in a private cloud architecture yields significant bandwidth using commonly available IT servers. Today, a typical dual-processor server node may have 16 cores per processor, or 32 cores per node. A typical private cloud deployment may range from 4 to 32 nodes, or 128 to 1024 cores.

Offload Repetitive Tasks

To further improve CPU performance, packet processing tasks are offloaded to hardware controllers. Deciding what to offload is a constant balance between flexibility, off-the-shelf availability, and cost.

Using Linux as an example, a NIC will receive an IP datagram and copy it to its own buffer memory. When a predefined memory limit has been reached, the NIC generates an interrupt, causing the kernel to suspend the current process and then service the NIC’s request to copy the IP packets from the NIC to the server memory.

Even using the lightweight UDP protocol, kernel-based networking soon reaches its limit because of the performance ceiling of a single processor core, typically 8Gbits/s. As bandwidth demands increase, so does the required CPU performance, resulting in increased and unpredictable latency.

Kernel Bypass

Video processing is CPU intensive and we can’t afford for the CPU to be tied up moving data backwards and forwards between its memory and the NIC’s buffers. Kernel-bypass solves this issue by offloading input/output processing from the CPU to more intelligent NICs.

Bypassing the kernel and IP stack reduces both context switching, that is, the overhead of servicing the NIC’s interrupt, and memory-to-buffer copies, resulting in extremely low latency. Data received from the NIC is written directly into user space, facilitating faster data processing.

Diagram 1 – the left side shows data flow through the kernel to access the NIC, the right shows direct access to the NIC to reduce latency.

The Data Plane Development Kit (DPDK) is an open source project to deliver fast packet processing in networking applications using kernel-bypass technology. Essentially, DPDK is a programming framework that allows developers to load and configure libraries and drivers to meet the needs of their specific application and hardware.

The Environment Abstraction Layer (EAL) hides the hardware from the developer and provides a standard programming interface to libraries and available hardware accelerators such as performance network adaptors.

Polling Replaces Interrupts

To remove the overhead of interrupt processing, DPDK uses polling. Flags within the NIC will be read every few microseconds to determine if it has any data to transfer. To stop DPDK from blocking any other processes, a manager library is employed to implement lockless queues.
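The idea of polling a lockless queue can be sketched in a few lines. This is a pure Python illustration and not the DPDK API: the class SpscRing and the function poll are hypothetical names, and the example assumes exactly one producer (standing in for the NIC) and one consumer (the application). A real implementation also has to deal with memory ordering between CPU cores, which Python hides.

```python
class SpscRing:
    """Single-producer, single-consumer ring buffer that needs no lock.

    The producer only ever writes _head and the consumer only ever
    writes _tail, so the two sides never update the same index.
    One slot is kept empty to tell a full ring from an empty one.
    """

    def __init__(self, size):
        self._slots = [None] * size
        self._size = size
        self._head = 0  # next slot to write (owned by the producer)
        self._tail = 0  # next slot to read (owned by the consumer)

    def try_put(self, item):
        next_head = (self._head + 1) % self._size
        if next_head == self._tail:
            return False  # ring is full, caller must drop or retry
        self._slots[self._head] = item
        self._head = next_head
        return True

    def try_get(self):
        if self._tail == self._head:
            return None  # ring is empty (packets must not be None)
        item = self._slots[self._tail]
        self._tail = (self._tail + 1) % self._size
        return item


def poll(ring, max_polls):
    """Check the ring repeatedly instead of waiting for an interrupt."""
    received = []
    empty_polls = 0
    for _ in range(max_polls):
        packet = ring.try_get()
        if packet is None:
            empty_polls += 1  # nothing to do, keep polling
        else:
            received.append(packet)
    return received, empty_polls


ring = SpscRing(8)
for sequence_number in range(5):
    ring.try_put(sequence_number)

packets, wasted = poll(ring, max_polls=8)
print(f"received {packets}, empty polls: {wasted}")
```

The empty polls are the price of this approach: the core spends cycles checking for data that is not there, in exchange for never paying the cost of an interrupt and context switch.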

Using DPDK, Intel have reported that packet processing performance has been boosted by up to ten times on their Xeon E5 processor, achieving speeds of 233Gbits/s during IP forwarding.

Ethernet Just Keeps Getting Faster

Ethernet speeds continue to increase at an unprecedented rate. In just twenty years, Ethernet data rates increased ten-thousand-fold, from the humble 10BASE-T (IEEE-802.3i) 10Mbits/s in 1990 to 100Gbits/s (IEEE-802.3ba) in 2010. And in December 2017, IEEE ratified 802.3bs giving 200Gbits/s and 400Gbits/s.

IEEE-802.3bs specifies four distances to operate over: 100m, 500m, 2km (1.2 miles) and 10km (6.2 miles). Although 200GbE and 400GbE will initially be core network technology, as with other standards such as 25GbE and 100GbE, they will soon work their way closer to the edge of the network.

To put this into context, a 200GbE network connection can transport 66 simultaneous 1080p50 video services, or 132 over a 400GbE connection. Even an 8K service could be distributed over 400GbE.

Better Economies of Scale

But the higher data rates deliver an unexpected bonus: they provide better economies of scale. 400GbE is not only four times as fast as 100GbE, but it also allows a denser configuration. A 1U switch with 32 ports of 400GbE will have the same data throughput as the equivalent 100GbE switch with 128 ports, which requires at least 4U. Not only will this reduce the required rack space, but a single 1U unit will be cheaper to build than four 1U units delivering a similar capacity.
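The density comparison is simple arithmetic, sketched below. The port counts and speeds come from the text. The figure of 32 ports per 1U for both switch types is our assumption, inferred from the comparison with four 1U units, and the function name is hypothetical.

```python
import math


def switch_summary(ports, port_speed_gbps, ports_per_rack_unit=32):
    """Return (aggregate throughput in Tbits/s, rack units required)."""
    throughput_tbps = ports * port_speed_gbps / 1000
    rack_units = math.ceil(ports / ports_per_rack_unit)
    return throughput_tbps, rack_units


for ports, speed in ((32, 400), (128, 100)):
    throughput, units = switch_summary(ports, speed)
    print(f"{ports} x {speed}GbE: {throughput} Tbits/s in {units}U")
```

Both configurations deliver 12.8 Tbits/s, but the 100GbE version needs four times the rack space under this assumption.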

Technologies such as 5G are paving the way for broadcast television. With high-speed, low-latency networks, 5G operators have been able to prove that the bandwidths and latencies needed by broadcasters are achievable today. Add the flexibility of COTS data center servers and network switches, and the future looks very promising for both television and media companies.

The time to market for traditional broadcast manufacturers has been slashed giving them untold opportunities to provide better services and business models. And new players will be able to hit the ground running, so they can share their new solutions quickly and efficiently.
