Designing IP Broadcast Systems: Integrating Cloud Infrastructure

Connecting on-prem broadcast infrastructures to the public cloud creates a hybrid system that requires reliable, secure exchange and delivery of high-value media.

For as long as broadcasters have been making television, they’ve been balancing the needs of flexibility, reliability, and resilience. Cloud computing has the potential to offer all three of these, especially when on- and off-prem are combined into hybrid systems.

Scalability has the potential to deliver the promised land when transitioning to IP as we no longer need to worry about reaching the limits of the available resource due to the abundance of COTS equipment. But to achieve this, systems must be moved from relying on hardware processing and distribution, to software processing and distribution. Servers, even in the public cloud, physically exist and need to be planned like any other resource. The big difference between the cloud and traditional broadcast facilities is that there is significantly more resource available in the public cloud to the point where it’s unlikely that we would ever exhaust it.

It is possible to build broadcast infrastructures to respond to peak demand; after all, this is exactly what we’ve been doing for the past ninety years. However, the cost of doing so significantly outweighs the benefits received, especially given today’s ever-increasing competition for viewers. Fundamentally, for a system to be scalable in the COTS sense, it must consist of software systems that provide the signal processing and control, and it must remove the reliance on specific and custom hardware solutions.

It’s important to define what we mean by scalability as it means different things depending on whether we’re discussing on- or off-prem. Scaling on-prem requires a large COTS resource consisting of servers, storage, and network switches such that the design can be amortized over several functions. With this model, the broadcaster has the flexibility of moving software functionality, such as production switchers and sound consoles, between servers, but in doing so, they must be certain of where the upper limit of the hardware resource is. In other words, the on-prem infrastructure must be designed to meet peak demand.

It is possible to bring in extra servers, storage, and network equipment with adequate planning, but this takes time and defeats the object of scalability. Although this might sound like the broadcaster has taken one step forward and three back, it’s not as bad as it seems. Software defined workflows provide much more flexibility, even when working within the limits of a fixed hardware resource. The software components can be moved around the servers to optimize the efficient use of the hardware, and this is a significant step forward for any broadcaster.

Off-prem infrastructures imply that all the hardware infrastructure, and hence the software processing, signal routing, and control, are situated within a public cloud or a scalable private datacenter. We might assume that we don’t need to worry about the resource availability of the public cloud due to the massive datacenters that have been designed and built. Although there are some weaknesses in this assumption, it is fair to say that public cloud service providers have much more resource than broadcasters would ever need. This highly scalable resource frees broadcasters from the limited and constrained thinking associated with peak-demand infrastructures, letting them instead imagine what an infrastructure would look like if it weren’t restricted by hardware resource.

There are some challenges that are not so obvious or trivial when using public cloud providers. Firstly, moving synchronous, high-bandwidth, low-latency video and audio to and from cloud providers isn’t as easy as it may first seem. The only connection into the cloud is an IP link, which may be Ethernet or some other datalink layer, and this means the amount of control a broadcaster has over the connection is potentially compromised. Secondly, public cloud servers do not become instantly available when the user demands them; a reasonable amount of time for resource allocation and configuration must be allowed for.

There are no SDI connectors or GPIOs in the cloud. Yes, GPIOs have now found their way into the NMOS specification so they exist as software protocols, and video and audio can be sent over IP. But the only method of connectivity for moving signals to and from the cloud is over an IP link. There are solutions: video compression allows the bandwidth to be reduced, and protocols such as RIST and SRT will recover lost packets and reorder out-of-sequence packets while keeping latency low. However, it’s worth remembering that IP and the wider internet were never designed to deliver synchronous, low-latency media signals, so the connectivity must be thought about in greater detail.

In essence, all COTS systems are asynchronous in their operation. This applies to the whole infrastructure equipment including servers, storage, and network devices. Consequently, transferring video and audio signals across IP links needs deeper thought about the IP protocols used. If UDP is employed then latency is kept low, but the impact of packet dropout is increased, and if TCP is used then the impact of packet dropout is negated but latency becomes random and indeterminate. A compromise is to use one of the ARQ protocols such as SRT and RIST as they combine the packet reliability of TCP with the low latency of UDP, but with these, rate shaping should be employed if congestion control is minimized or switched off, otherwise the “greediness” of the unconstrained UDP streams could lead to network congestion.
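The rate shaping mentioned above can be illustrated with a token bucket, one common way of pacing a stream so that it never exceeds an agreed bit rate. The following is a minimal sketch and not part of SRT or RIST; the class name, field names, and the figures in the usage note are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Paces a sender to rate_bps, with bursts capped at burst_bytes (hypothetical helper)."""
    rate_bps: float       # target bit rate of the stream
    burst_bytes: float    # largest burst allowed onto the link
    tokens: float = 0.0   # bytes of sending credit currently available
    last: float = 0.0     # time (seconds) up to which credit has been accounted

    def delay_for(self, packet_bytes: int, now: float) -> float:
        """Return how many seconds the caller should wait before sending this packet."""
        if now > self.last:
            # Credit accrues with elapsed time, but never beyond the burst size.
            earned = (now - self.last) * self.rate_bps / 8
            self.tokens = min(self.burst_bytes, self.tokens + earned)
            self.last = now
        if self.tokens >= packet_bytes:
            self.tokens -= packet_bytes
            return max(0.0, self.last - now)
        # Not enough credit: schedule the send for when the shortfall has accrued.
        shortfall = packet_bytes - self.tokens
        self.tokens = 0.0
        self.last += shortfall * 8 / self.rate_bps
        return self.last - now
```

As a usage example, a bucket created with TokenBucket(rate_bps=8_000_000, burst_bytes=1500, tokens=1500) lets the first 1500-byte packet go immediately and asks the sender to hold back each following packet by 1.5 ms, which keeps the stream at 8 Mbit/s regardless of how quickly the encoder produces data.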

Although on-prem datacenters may appear to be limited by the capacity of the available hardware, if they are designed with agile methodologies then they can benefit from extremely large scalability. This is achieved by using cloud infrastructures as a bolt-on to the existing on-prem datacenter. By using technologies such as microservices, additional functionality can be spun up in the cloud to meet the peak demands. For example, if a new transcoder was required and the on-prem datacenter was running out of hardware, then a new transcoder running in a container in the cloud could be enabled and attached to the on-prem microservice architecture, thus adding the extra resource without the limitations of the on-prem hardware.
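The transcoder example can be expressed as a simple placement rule: use on-prem capacity while it lasts, then burst into the cloud. The sketch below is illustrative only; the Pool class, the place function, and the capacity figure are hypothetical, and a real orchestrator would also weigh link bandwidth, cost, and spin-up time.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Pool:
    """A set of servers that can host services (hypothetical model)."""
    name: str
    capacity: Optional[int]  # None means no planned upper limit
    running: List[str] = field(default_factory=list)

    def has_room(self) -> bool:
        return self.capacity is None or len(self.running) < self.capacity


def place(service: str, on_prem: Pool, cloud: Pool) -> str:
    """Prefer on-prem hardware; fall back to the cloud once it is full."""
    target = on_prem if on_prem.has_room() else cloud
    target.running.append(service)
    return target.name


on_prem = Pool("on-prem", capacity=2)
cloud = Pool("cloud", capacity=None)
placements = [place(f"transcoder-{i}", on_prem, cloud) for i in range(3)]
print(placements)  # ['on-prem', 'on-prem', 'cloud']
```

The third transcoder lands in the cloud because the on-prem pool has reached its assumed limit of two services.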

Figure 1 – Vertical scaling will soon run out of resource due to the physical limits on monolithic architectures. However, scaling horizontally, as found with cloud hybrid systems, allows broadcasters to continually add and remove resource to meet viewer demand at that time.

Microservices are one of the most powerful tools available to broadcasters for the very reason that they greatly improve resilience and scalability within an infrastructure. Automating the process of adding extra functionality, especially in the cloud, gives the impression of limitless resource availability. This may be the wish, but the reality can often be different due to the capacity and reliability of the IP connection between the broadcaster’s on-prem datacenter and their cloud provider, and the time it takes to spin up extra resource. Also, extra cloud resource may not be available as quickly as some of the service providers imply, leading to the broadcaster needing to keep servers on standby. Although this is easily achievable, it does add to the cost of the infrastructure as the pay-as-you-go model isn’t as linear as it may first appear.

Another consideration for employing hybrid and scalable cloud and on-prem infrastructures is that of security, especially when moving high value media into the cloud. As with all IT designs, security should be considered at the beginning of the design, not as an afterthought at the end of it.

Zero trust is a security philosophy that assumes all users and accesses to the infrastructure are potentially hostile. This leads to each and every transaction being verified against a secure database and by other methods such as two-factor authentication. Every time a user attempts to access a function or media file, the zero trust system will require them to be authenticated. Clearly a compromise needs to be found, otherwise the system will grind to a halt under its own verification overhead, but adopting zero trust will certainly help keep systems secure when creating hybrid solutions and resource extensions to the on-prem infrastructure.
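One possible compromise between verifying everything and keeping the overhead manageable is a short-lived signed token that is checked on every access without a database lookup each time. The sketch below uses Python's standard hmac module to show the idea; the token layout, the SECRET value, and the function names are assumptions made for illustration and do not describe any particular zero trust product.

```python
import hashlib
import hmac

# Hypothetical demo key; a real system would fetch this from a protected key store.
SECRET = b"demo-only-secret"


def issue(user: str, resource: str, expires_at: float) -> str:
    """Create a token granting one user access to one resource until expires_at."""
    payload = f"{user}|{resource}|{expires_at}"
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}|{sig}"


def verify(token: str, resource: str, now: float) -> bool:
    """Check signature, resource, and expiry on every single access."""
    try:
        user, res, exp, sig = token.split("|")
        expires_at = float(exp)
    except ValueError:
        return False  # malformed token
    payload = f"{user}|{res}|{exp}"
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through comparison timing.
    signature_ok = hmac.compare_digest(sig.encode(), expected.encode())
    return signature_ok and res == resource and now < expires_at
```

In this sketch a token issued for one media file is rejected when presented for another, when it has expired, or when any field has been altered, so each access is still verified even though the full authentication step only happens when the token is issued.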

Encrypting media transfers is a must, as hostile actors can sniff and record media streams. But encrypting the media is only half the story, as the private encryption keys must also be stored safely. Again, using zero trust helps keep private keys safe and, if employed correctly, will provide a forensic audit trail for later analysis should something go wrong.

Cloud infrastructures generally form hybrid systems at some point in a broadcast system. Unless the broadcaster is providing a 100% streaming service, there will need to be some form of cloud and on-prem interaction, and it is at this point that many of the challenges arise. For example, if all the video and audio processing is provided by a public cloud service, then the camera and microphone feeds must be streamed to the cloud servers and network. They will then need to be timed so that switching and mixing between camera feeds is smooth and continuous, and so that the audio-to-vision timing is maintained. Although cloud and on-prem hybrid systems may seem like the answer to all our dreams in terms of scalability, digging below the surface soon uncovers many of the challenges we thought we’d already solved in broadcasting.
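The timing problem described above can be reduced to a small piece of arithmetic: if the transit latency of each feed into the cloud is known, every feed can be delayed to match the slowest one. The following is a minimal sketch under that assumption; the function name, feed names, and latency figures are hypothetical, and a real system would derive timing from timestamps carried with the media rather than fixed numbers.

```python
from typing import Dict


def alignment_delays(latency_ms: Dict[str, float]) -> Dict[str, float]:
    """Return the extra delay each feed needs so that all feeds line up with the slowest."""
    if not latency_ms:
        return {}
    slowest = max(latency_ms.values())
    return {feed: slowest - latency for feed, latency in latency_ms.items()}


# Assumed measurements of how long each feed takes to reach the cloud mixer.
delays = alignment_delays({"cam1": 40.0, "cam2": 65.0, "mic1": 20.0})
print(delays)  # {'cam1': 25.0, 'cam2': 0.0, 'mic1': 45.0}
```

Here the microphone feed arrives 45 ms ahead of the slowest camera, so it must be buffered by that amount to keep audio and vision in step when cutting between cameras.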
