Building Software Defined Infrastructure: Shifting Data
The fundamental principles of how data flows through local and remote processing systems are central to designing software defined infrastructure.
To fully appreciate the complexity of moving large amounts of data in software defined infrastructures, we need to look more closely at the underlying hardware and how we overcome the challenges it presents.
The first point to note is that traditional SDI/AES infrastructures are designed primarily to move large amounts of data with the smallest possible delay whilst maintaining the highest data integrity. Synchronous distribution removes the need for packet headers, so most of the capacity of the datalink can be dedicated to delivering the user data, which in the context of television is video and audio. Maintaining data integrity means that as much data as possible must be delivered to the receiver free from loss and distortion.
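To put a rough number on the cost of packet headers, the short calculation below estimates how much of an Ethernet link is left for media once each packet has been wrapped. It is only a sketch: it assumes the media is carried as RTP over UDP and IPv4 with no header options, and the payload sizes are illustrative rather than taken from any particular standard.

```python
# Per-packet overhead in bytes when media is carried as RTP over UDP/IPv4 on Ethernet.
# Assumption: untagged Ethernet frames, IPv4 without options, fixed RTP header only.
ETHERNET_WIRE_OVERHEAD = 8 + 14 + 4 + 12  # preamble/SFD + header + FCS + inter-frame gap
IPV4_HEADER = 20
UDP_HEADER = 8
RTP_HEADER = 12


def payload_efficiency(payload_bytes: int) -> float:
    """Return the fraction of link capacity that carries media payload."""
    overhead = ETHERNET_WIRE_OVERHEAD + IPV4_HEADER + UDP_HEADER + RTP_HEADER
    return payload_bytes / (payload_bytes + overhead)


# Illustrative payload sizes, chosen only to show the trend.
for size in (200, 800, 1400):
    print(f"{size:5d} byte payload: {payload_efficiency(size):.1%} of the link carries media")
```

The larger the payload in each packet, the smaller the share of the link consumed by headers, which is one reason media flows tend to favour packets close to the maximum frame size.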
IP networks and their associated routers, switches and other infrastructure equipment, such as servers and file storage, have relatively high latency when compared to traditional broadcast equipment. This is a direct consequence of the asynchronous nature of IT equipment. With a few exceptions, such as high-risk applications found in aircraft and medical systems, virtually all IT-COTS infrastructures rely on asynchronous data exchange and processing. This is by design, to keep systems as simple and flexible as possible.
Asynchronous By Design
Most IT type infrastructures used in web applications operate in a transactional manner. For example, a web browser requests a web page, or a string of text is sent to the server to request a response. These request-reply messages are transactional, and therefore asynchronous by design. This is why synchronous data exchange is rarely found in IT-COTS type applications: the latency defines the user response time, and if this is within a few hundred milliseconds then the person using the website doesn’t care too much. The same cannot be said for video and audio, especially in the studio and playout environment where consistent and predictable low latency is critical.
Broadcasters often migrate to IT-COTS infrastructures to take advantage of the inherent resilience, reliability and flexibility that they deliver. The downside of this change is that we must make our synchronous video and audio media streams operate on an underlying asynchronous network and server infrastructure with a minimal and predictable latency.
CPU Architecture
A further challenge occurs when we dig deep into the server architecture to understand data processing. The fundamental components of a computer server are the CPU and memory. The von Neumann architecture has stood the test of time and is still the prevalent design for IT-COTS processing systems. In essence, code is loaded into the memory, and the CPU then fetches the instructions from the memory and processes them sequentially. The CPU is a hardware instruction processor that relies on a system of registers and a program counter to provide the conditional logic that makes the computer programmable.
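The fetch and execute cycle described above can be illustrated with a toy interpreter. This is a minimal sketch rather than a model of any real processor: the instruction names (`LOAD`, `ADD`, `DEC`, `JNZ`, `HALT`) and the register names are invented for the example.

```python
def run(program, registers=None):
    """Execute a toy program: fetch, decode and execute one instruction at a time."""
    regs = dict(registers or {})
    pc = 0  # program counter: address of the next instruction in "memory"
    while pc < len(program):
        op, *args = program[pc]  # fetch and decode
        pc += 1                  # by default, step to the next instruction
        if op == "LOAD":         # LOAD reg, constant
            regs[args[0]] = args[1]
        elif op == "ADD":        # ADD dst, src  (dst = dst + src)
            regs[args[0]] += regs[args[1]]
        elif op == "DEC":        # DEC reg  (reg = reg - 1)
            regs[args[0]] -= 1
        elif op == "JNZ":        # JNZ reg, address: conditional jump
            if regs[args[0]] != 0:
                pc = args[1]
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return regs


# Hypothetical program: multiply 6 by 4 through repeated addition.
PROGRAM = [
    ("LOAD", "acc", 0),
    ("LOAD", "step", 6),
    ("LOAD", "count", 4),
    ("ADD", "acc", "step"),  # address 3: start of the loop body
    ("DEC", "count"),
    ("JNZ", "count", 3),     # loop until count reaches zero
    ("HALT",),
]

print(run(PROGRAM))
```

The conditional jump is the only place where the program counter is changed by the data rather than simply incremented, and it is this that makes the machine programmable.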
As well as the code instructions residing in the memory, the data the CPU is processing also needs to be in the memory. When that data originates on devices such as the ethernet NIC or a disk drive, it is the responsibility of the CPU to load the data from these devices into its local memory for processing.
The act of moving the data in traditional server architectures places a huge burden on the CPU, which in turn causes potentially massive latency, much of which is unpredictable. In a typical signal flow, the video and audio media come into the server via the ethernet NIC, and from there the CPU copies the streams into its local memory for processing. When processing is complete, the video and audio media are either copied back to the ethernet NIC for transfer to the next device or copied to the local disk drive. Due to the huge amounts of data involved in streaming video and audio media, the latencies in this sort of workflow quickly compound and make the whole server architecture virtually unusable.
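A back-of-the-envelope calculation shows why these copies add up. The sketch below assumes uncompressed 1080p60 video with 10-bit 4:2:2 sampling (an average of 20 bits per pixel, active picture only) and two CPU copies per stream, one in from the NIC and one back out. The format and the stream count of eight are assumptions made for illustration.

```python
def active_video_bytes_per_second(width, height, bits_per_pixel, frames_per_second):
    """Data rate of the active picture only, ignoring blanking and audio."""
    return width * height * bits_per_pixel * frames_per_second // 8


def cpu_copy_load(stream_bytes_per_second, copies_per_stream, streams):
    """Bytes per second the CPU must move if it performs every copy itself."""
    return stream_bytes_per_second * copies_per_stream * streams


# Assumed format: 1920x1080, 60 frames per second, 10-bit 4:2:2 (20 bits per pixel).
rate = active_video_bytes_per_second(1920, 1080, 20, 60)

# Assumed workload: 8 streams, each copied NIC -> memory and memory -> NIC.
load = cpu_copy_load(rate, copies_per_stream=2, streams=8)

print(f"One stream: {rate / 1e6:.0f} MB/s")
print(f"CPU copy traffic for 8 streams: {load / 1e9:.2f} GB/s")
```

Every byte in that total is memory bandwidth and CPU time spent moving media rather than processing it.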
Kernel Bypass
Building on the success of other industries, broadcasters can take advantage of techniques such as kernel bypass. This is a form of direct memory access (DMA) in which a hardware accelerator transfers the data directly from the ethernet NIC’s memory into the CPU’s local system memory, removing the need for the CPU to copy the data to and from the system memory.
Figure 1 – The image on the left shows a traditional transfer relying on CPU and operating system data copying, resulting in excessive latency. The image on the right shows the kernel bypass approach using RDMA, which requires very little CPU overhead, resulting in high bandwidth signal transfer with very small latency.
Employing such a strategy speeds up the transfer by many orders of magnitude, as the data transfer becomes a dedicated hardware task that only briefly involves the CPU. Instead of copying data from one device to another, which is highly wasteful of resource, the CPU sets up a series of registers so that the DMA hardware knows where to copy the data from and where to send it to. When the transfer is complete, the DMA engine sets a flag in one of its control registers to let the CPU know that the data is ready to be processed. This method of kernel bypass using the processor’s DMA subsystem effectively synchronizes the data transfer with the CPU to keep latency to a minimum within an asynchronous environment.
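The register set-up and completion flag can be modelled in a few lines. The sketch below is a software stand-in only: the `DmaEngine` class and its register names are hypothetical, and a Python thread still consumes CPU time, so this illustrates the control flow rather than the performance benefit.

```python
import threading


class DmaEngine:
    """Toy stand-in for a DMA controller: registers in, completion flag out."""

    def __init__(self):
        self.src = None                # "source address" register
        self.dst = None                # "destination address" register
        self.length = 0                # "transfer length" register
        self.done = threading.Event()  # plays the role of the completion flag

    def start(self):
        """Kick off the transfer and return immediately to the caller."""
        self.done.clear()
        worker = threading.Thread(target=self._transfer)
        worker.start()
        return worker

    def _transfer(self):
        # The "hardware" moves the bytes; the caller is not involved in the copy.
        self.dst[: self.length] = self.src[: self.length]
        self.done.set()


def receive_frame(nic_buffer: bytes, system_memory: bytearray) -> int:
    """Program the engine, wait for the flag, then process the data."""
    dma = DmaEngine()
    dma.src = memoryview(nic_buffer)
    dma.dst = memoryview(system_memory)
    dma.length = len(nic_buffer)
    worker = dma.start()
    # The CPU would be free to do other work here while the transfer runs.
    dma.done.wait()
    worker.join()
    return sum(system_memory[: dma.length])  # stand-in for real media processing


memory = bytearray(8)
print(receive_frame(bytes([1, 2, 3, 4]), memory), memory)
```

The important point is the division of labour: the caller writes three values and later checks one flag, while the movement of the data itself happens elsewhere.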
A modern COTS server employs PCIe buses as a method of transferring high speed data from one device to another within the server. DMA engines are employed within the PCIe subsystem to transfer data to and from many different devices so that the CPU doesn’t have to do this. These devices not only include ethernet NICs and disk drives but can also include GPU graphics cards and math coprocessor cards. The PCIe controller working alongside the DMA controller makes sure that there are no data clashes on the PCIe buses so that data integrity is maintained and data throughput is as high as possible, hence keeping latency low.
Extending DMA To Networks
Although the DMA mechanism resides locally within a server architecture, it can be expanded to a much greater domain through RDMA (Remote Direct Memory Access). RDMA effectively extends the concept of DMA to exchange data between physically separate devices via the IP network.
RDMA facilitates the transfer of data from one device to another via the IP network such that the data is sent from the sender’s memory directly to the receiver’s memory via the RDMA protocol. In this context, when we speak of devices, we mean other servers or microservice software defined processes.
In traditional IT-COTS systems, this type of transfer would be CPU resource intensive, as the data would have to be physically copied from the sender’s memory to its ethernet NIC, then from the receiver’s ethernet NIC to its system memory for processing. The burden on the sender’s and receiver’s CPUs would be so great that the overall processing would be significantly delayed, making the latency at best unpredictable and at worst excessive.
The RDMA protocol is effectively abstracted from the general operation through the concept of APIs. The API software interfaces allow the controlling software to set up the source and destination end points for the data. If we extend the concept of “the data” to a signal flow, then it can be seen that RDMA forms the basis of a signal flow between the source and destination, whether this occurs locally within one physical server or across a network to multiple servers.
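The idea of controlling software setting up end points through an API can be sketched as follows. This is a single-process simulation with invented names (`Controller`, `Endpoint`, `register`, `connect`, `transfer`) and is not the API of any real RDMA library. In a real system the NICs would move the data between the registered memory regions, whereas here an ordinary buffer copy stands in for that step.

```python
from dataclasses import dataclass, field


@dataclass
class Endpoint:
    """A named buffer that a device has registered for remote access."""
    name: str
    buffer: bytearray


@dataclass
class Controller:
    """Hypothetical control API: registers end points and joins them together."""
    endpoints: dict = field(default_factory=dict)
    routes: dict = field(default_factory=dict)

    def register(self, name: str, size: int) -> Endpoint:
        endpoint = Endpoint(name, bytearray(size))
        self.endpoints[name] = endpoint
        return endpoint

    def connect(self, source: str, destination: str) -> None:
        if source not in self.endpoints or destination not in self.endpoints:
            raise KeyError("both end points must be registered first")
        src, dst = self.endpoints[source], self.endpoints[destination]
        if len(dst.buffer) < len(src.buffer):
            raise ValueError("destination buffer is too small")
        self.routes[source] = destination

    def transfer(self, source: str) -> Endpoint:
        """Stand-in for the RDMA write: source memory lands in destination memory."""
        src = self.endpoints[source]
        dst = self.endpoints[self.routes[source]]
        dst.buffer[: len(src.buffer)] = src.buffer
        return dst


# Hypothetical signal flow: a camera feed routed to a mixer input.
controller = Controller()
camera = controller.register("camera-1", 16)
mixer = controller.register("mixer-in-1", 16)
controller.connect("camera-1", "mixer-in-1")

camera.buffer[:5] = b"frame"
controller.transfer("camera-1")
print(bytes(mixer.buffer[:5]))
```

Note that neither end point's processing code takes part in the move. The source only writes into its own buffer and the destination only reads from its own buffer, which is the property that lets the same pattern describe a flow within one server or across a network.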
RDMA For Signal Flow
If we extrapolate the concept of data transfer to that of signal flow, it is a small step to think of RDMA as the transport for that signal flow. Each device, whether it is a physical server, virtual machine, or microservice, can be thought of as an end point in a data exchange. By employing RDMA, the server’s CPU no longer has to be associated directly with the transfer of data and can instead focus on processing the video and audio media streams directly.
The signal flow through RDMA requires the controller to establish the source and destination end points via an API call which will facilitate the video and audio media transfer. Upon completion, the destination device, virtual machine, or microservice will then be able to process the signal as if it had arrived as a synchronous video or audio signal.
There are many other variables that need to be considered when transferring large amounts of data, such as data link latency, bottlenecks, and packet loss, but employing strategies such as RDMA greatly improves video and audio signal flow through microservice and software defined architectures.