Building Software Defined Infrastructure: Ground To Cloud

New, efficient, and flexible workflows such as remote production and multi-site teams rely on IP to transport media between sites, and this brings its own challenges to flexible infrastructure design.

Transitioning to IP opens a whole world of opportunity for broadcasters, especially when we look at leveraging COTS infrastructures. The theory of scaling IP broadcast systems is well documented, but the practice can be a little more challenging.

Having the option of scaling a broadcast infrastructure so that we no longer must design for peak demand is especially appealing when considering the potential of expanding to the public cloud. Although it may be somewhat over-egging the point to say that the public cloud is an infinite resource, it’s certainly true to say that there is more resource in the public cloud than the average broadcaster would ever need.

Transporting Media

There are no SDI, AES, or even GPIO connections to and from the cloud. All we have is an IP connection. The layer-2 data link is often beyond our control and all we can rely on is that the connection will support IP. Another unfortunate fact is that the resource we use, as well as the link to it, is often shared and contested, and this is something new for broadcasters.

One of the reasons that broadcasters have built systems the way they have, with custom hardware, is that the signal flows have guaranteed resource allocation. A video processor will only process the video that is being presented to it, and a sound console will only need to process the audio going into it. This has guaranteed reliability with no video or audio distortion or breakup. The downside of these infrastructures is that they must be designed to meet the expected peak demand of the facility, and this leads to massive equipment procurement costs. A motivation for moving to IP is that we no longer need to design for peak demand, but this, at the same time, introduces the concept of sharing resource.

The idea of using hardware resource, such as a server, that can change the functionality it provides and adapt to different workflows, is a major win for broadcasters, and to truly leverage this potential we must embrace the concept of resource sharing. Data links between COTS resources are one area where we need to consider this further.

Data Links

By sharing a data link between multiple devices, we are able to take advantage of the economies of scale for that link. In traditional broadcast facilities, sending video and audio to and from different facilities would require communication links specific to broadcasting, and with this customization, the costs would escalate and the delivery time for new installations would often stretch into many months. Data links used to transport IP packets, on the other hand, use industry standard technology that telcos are familiar with and have infrastructures to support. This significantly reduces the cost and delivery times as there are many other suppliers in the market all fighting for the broadcaster’s business.

IP is the protocol that facilitates packet delivery for many different data types such as emails, banking transactions and control systems, as well as streaming video and audio. The underlying transport is separate from the IP protocol and can take many forms, including Ethernet over copper or fiber. This is an important distinction, as the IP packets exist independently of the underlying data link.

Streaming Media

Within ST 2110-type broadcast environments, the network employed is often of a very high quality. This reduces the possibility of congestion, which often results in packet loss, which in turn leads to video and audio distortion. Packet loss not only occurs through congestion but can also be a consequence of environmental issues, where electromagnetic interference or damaged connectors cause packets to be corrupted.

In the networks found in offices or industry, occasional IP packet loss is an accepted part of network operation, and TCP (Transmission Control Protocol) is employed to overcome this. The TCP specification is nearly as old as IP itself and resends lost or corrupted segments to guarantee data integrity at the receiver. However, a major drawback of TCP is that it introduces indeterminate and variable latency into the signal flow.

Figure 1 – In this TCP flow, if a packet is lost in transit, the receiver must send a request back to the sender asking for the packet to be resent. In this instance, the sender is not aware that the receiver hasn’t received segment 2 (packet 2), and the receiver does not know that packet 2 was ever sent, so it cannot request a resend. Consequently, the sender must wait for its timeout to expire before retransmitting. This causes unpredictable latency.

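The timeout behavior in Figure 1 can be sketched as a toy stop-and-wait model. All of the timing values below are hypothetical, chosen only to show how a single lost segment multiplies delivery time:

```python
# Toy model of the stop-and-wait retransmission shown in Figure 1.
# The timing values are illustrative, not taken from any real TCP stack.

RTT_MS = 10        # normal round-trip time per segment
RTO_MS = 200       # retransmission timeout the sender must wait out

def delivery_time_ms(segments, lost):
    """Total time to deliver `segments` segments when those in `lost`
    are each dropped once in transit."""
    total = 0
    for seq in range(1, segments + 1):
        if seq in lost:
            total += RTO_MS   # sender waits out the full timeout...
            total += RTT_MS   # ...then the retransmission succeeds
        else:
            total += RTT_MS
    return total

print(delivery_time_ms(5, lost=set()))   # no loss: 50 ms
print(delivery_time_ms(5, lost={2}))     # one lost segment: 250 ms
```

Losing a single segment here inflates total delivery time fivefold, and because losses are unpredictable, so is the latency.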

Buffers are used within real-time software applications to iron out this latency so that video motion stays fluid and audio doesn’t suffer from distortion. However, buffers further increase latency. This method generally works for media file transfer or OTT broadcasting to the home, where a little extra latency is not really an issue, but it is unacceptable for studios using ST 2110 or similar. Therefore, TCP should not generally be used in high quality broadcast studios.
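The buffering trade-off can be made concrete with a minimal play-out buffer sketch: packets arrive with network jitter but are released on a fixed schedule, a configurable delay after their nominal time. The packet timings below are invented for illustration:

```python
# Minimal play-out buffer sketch: packet i is scheduled for release at
# i * interval_ms + buffer_ms. A packet arriving after its release time
# causes a glitch. All timings here are hypothetical.

def playout_times(arrivals_ms, interval_ms, buffer_ms):
    """arrivals_ms[i] is when packet i arrives. Returns a list of
    (release_time, arrived_in_time) pairs."""
    schedule = []
    for i, arrival in enumerate(arrivals_ms):
        due = i * interval_ms + buffer_ms
        schedule.append((due, arrival <= due))
    return schedule

# 20 ms media packets arriving with jitter: a 40 ms buffer absorbs it,
# a 10 ms buffer does not.
arrivals = [0, 25, 38, 71, 80]
for due, on_time in playout_times(arrivals, 20, 40):
    print(due, "ok" if on_time else "late")
```

With `buffer_ms=40` every packet above makes its deadline; with `buffer_ms=10` packet 3 arrives late and would glitch. The buffer depth is therefore a direct trade between smoothness and added latency.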

This isn’t a case of never using TCP for studios, but great care should be taken if it is used due to the extra latency it can introduce. The variable and unpredictable nature of this latency should also be a serious concern for broadcasters, and this is one of the reasons high-end network grade switches and fiber connectivity are used in the broadcast studio. The higher the quality of the equipment and the greater the over-provision of network bandwidth, the less likely it is that IP packets will be lost.

Lossy Cloud Connectivity

Unless a broadcaster is going to go down the route of using CDNs to stream media to and from the cloud, then the data link from their facility must be considered lossy. The extent of the packet loss is determined by the service level agreement the broadcaster has with their ISP, but this only gets them to the edge of the cloud. The broadcaster really has no idea what the cloud service provider is doing with the network inside their domain.

There are proprietary solutions such as AWS CDI (Cloud Digital Interface), which allows uncompressed video streaming into that vendor’s infrastructure. But this has limitations if a broadcaster doesn’t want to be locked into one vendor. Another option is using one of the open-source streaming protocols such as RIST (Reliable Internet Stream Transport) or SRT (Secure Reliable Transport).

Both RIST and SRT use UDP (User Datagram Protocol) to connect the sender to the receiver and provide their own acknowledgement and retransmission mechanisms to resend lost packets. UDP is a fire-and-forget protocol that extends the IP datagram to provide more granular addressing, via port numbers, as well as some extra header information. The IP packet acts as a wrapper for the UDP datagram, and neither provides any form of resend to deal with packet loss.
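UDP’s fire-and-forget nature is easy to see with standard sockets: the sender transmits and moves on, with no handshake, acknowledgement, or resend. A minimal loopback sketch (hypothetical payload, any free port):

```python
import socket

# UDP adds port numbers and a small header on top of IP; there is no
# handshake, acknowledgement, or retransmission at this layer.

recv_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv_sock.bind(("127.0.0.1", 0))          # let the OS pick a free port
port = recv_sock.getsockname()[1]

send_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# sendto() returns immediately whether or not anything ever receives it.
send_sock.sendto(b"video payload", ("127.0.0.1", port))

data, addr = recv_sock.recvfrom(2048)
print(data)                               # b'video payload'
send_sock.close()
recv_sock.close()
```

On a loopback interface the datagram arrives, but across a lossy WAN nothing in this code would ever know if it didn’t, which is precisely the gap RIST and SRT fill.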

Both RIST and SRT operate by streaming UDP packets across a network and include some broadcast specific features such as programmable latency and FEC support. Both these protocols have stood the test of time and have gained great popularity. However, care must be taken when using them due to the concept of congestion control.
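The interplay between retransmission and programmable latency can be sketched as receiver-side logic: the receiver tracks sequence numbers, spots gaps, and only re-requests packets that can still arrive before their play-out deadline. This is a simplified illustration, not the real RIST or SRT wire format; the sequence numbers and latency budget are invented:

```python
# Simplified receiver-side retransmission logic in the style of RIST/SRT.
# Not the real wire format: numbering and timing are invented for
# illustration.

def nacks_to_send(received, highest_seq, now_ms, deadlines):
    """Return the sequence numbers worth re-requesting: every gap below
    the highest sequence seen, but only while that packet's play-out
    deadline (deadlines[seq], in ms) is still in the future."""
    missing = set(range(highest_seq + 1)) - set(received)
    return sorted(s for s in missing if deadlines[s] > now_ms)

# Packets every 20 ms with a 100 ms latency budget (hypothetical values).
deadlines = {s: s * 20 + 100 for s in range(8)}

print(nacks_to_send([0, 1, 3, 4, 6], 6, 90, deadlines))    # [2, 5]
print(nacks_to_send([0, 1, 3, 4, 6], 6, 150, deadlines))   # [5]
```

Note how packet 2 stops being worth requesting once its deadline has passed: a larger latency budget keeps more retransmissions useful, which is exactly the trade the protocols expose as a configurable setting.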

Congestion Collapse

In the early days of the internet (1986), researchers discovered that the useful data throughput between two buildings had dropped to a disproportionately low rate even though the link was running at full capacity. What they had found was the first instance of congestion collapse: multiple senders were transmitting packets simultaneously, the link became congested, and only a few of the packets got through to the receivers. The senders timed out because they didn’t receive acknowledgement packets and resent the same packets. The sending computers had synchronized their send rates so that the overall bit rate was at its maximum, but the useful data throughput was very small.

This congestion collapse highlighted a deficiency in the TCP implementations of the time, and to fix it Van Jacobson introduced congestion control. Essentially, senders start slowly, reduce their transmission rate when loss is detected, and back off retransmissions of lost or unacknowledged packets with an element of randomness. This stops multiple senders from synchronizing their packet retransmits, removing the possibility of congestion collapse. Unfortunately, one of the side effects of congestion control is that it has the potential to introduce variable and indeterminate latency, something broadcasters don’t want.
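The synchronization failure and its cure can be shown with a toy simulation. In this invented model, a shared link carries at most a fixed number of packets per time slot; if more senders transmit in the same slot, everything collides and is lost. Lock-step retransmission then never completes, while a randomized hold-back de-synchronizes the senders:

```python
import random

# Toy model of congestion collapse: n_senders share a link carrying at
# most `capacity` packets per slot. If more than `capacity` transmit in
# one slot, all of them collide and are lost. All numbers are invented.

def slots_until_all_sent(n_senders, capacity, jitter_slots, seed=1):
    rng = random.Random(seed)
    pending = {s: 0 for s in range(n_senders)}   # next slot each sender tries
    slot = 0
    while pending and slot < 10_000:
        trying = [s for s, t in pending.items() if t <= slot]
        if 0 < len(trying) <= capacity:
            for s in trying:                     # link copes: all delivered
                del pending[s]
        else:
            for s in trying:                     # congestion: all lost, back off
                pending[s] = slot + 1 + rng.randint(0, jitter_slots)
        slot += 1
    return slot if not pending else None         # None = collapsed, never finished

# Lock-step backoff (jitter 0) keeps the link fully busy but delivers
# nothing; a randomized hold-back lets the transfer complete.
print(slots_until_all_sent(10, 2, jitter_slots=0))
print(slots_until_all_sent(10, 2, jitter_slots=8))
```

The zero-jitter run mirrors the 1986 observation: the link is saturated with retransmits yet useful throughput is effectively zero.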

UDP based systems such as RIST and SRT allow the congestion control to be adjusted so that latency can be significantly reduced. The engineer configuring the system must understand the ramifications of adjusting and tweaking the congestion control algorithm, especially when a network is highly utilized. Both RIST and SRT are very reliable but understanding how their congestion control algorithms operate is essential.

Engineers should also be aware of the various regulatory “Fair Use” policies that exist around the world to limit the possibility of congestion collapse. These policies may have implications for the ISP the broadcaster is using, as they often mandate that it is the ISP’s responsibility to make sure their clients are not doing anything that may cause congestion collapse. The tools ISPs have for enforcing this are traffic shaping and bandwidth limiting.

Streaming media from the ground to the cloud is not as easy as it may first appear, as engineers must assume that the data link has some packet loss. And regardless of the type of streaming method employed, when there is packet loss there will either be added latency or distorted video and audio.
