Cloud Broadcasting - Resilience

In the last article on Cloud Broadcasting we looked at the concepts of “Cloud Washed” and “Cloud Born” and the considerations vendors must address when delivering true cloud systems. In this article, we look more closely at resilience and cloud system uptime.

To get the best uptime from a cloud-based system, software should be based on the HTTP (Hypertext Transfer Protocol) client-server model, accessed through a web browser. One of the reasons web browsers have become so popular is that the application software lives on the server, which is under the control of the service provider, making software upgrades easier and more reliable.

Service providers have more control over the back-end part of the software, such as database servers, and the ability to spin up new instances and allocate resources to meet peak demand. Advances in web standards such as HTML5 and CSS provide richer graphics and better handling of user-interface controls.

Load Balancers

Cloud providers such as Amazon Web Services (AWS) take this model one step further and encourage the use of load balancers. A load balancer is a single point of entry for HTTP and TCP/IP traffic and works by distributing incoming requests between web servers. The load balancer keeps a record of TCP client-server connections so it knows where to send subsequent packets on each connection.
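To make the connection-tracking idea concrete, here is a minimal Python sketch, not a description of how AWS implements it. It assumes a simple round-robin policy for new connections and keys the connection table on the client's IP address and port only (a real balancer tracks the full TCP connection); the class name, server names and addresses are hypothetical.

```python
from itertools import cycle


class ConnectionTrackingBalancer:
    """Toy load balancer (hypothetical): new TCP connections are assigned to
    web servers in round-robin order, and every later packet on an existing
    connection goes to the server that connection was first assigned to."""

    def __init__(self, servers):
        self._next_server = cycle(servers)
        # Connection table keyed by the client's (IP address, port) pair.
        # Simplification: real balancers track the full connection tuple.
        self._connections = {}

    def route(self, client_ip, client_port):
        key = (client_ip, client_port)
        if key not in self._connections:
            # Unknown connection: pick the next server in the rotation.
            self._connections[key] = next(self._next_server)
        return self._connections[key]

    def close(self, client_ip, client_port):
        # Forget the connection once the client disconnects.
        self._connections.pop((client_ip, client_port), None)


lb = ConnectionTrackingBalancer(["web-1", "web-2"])
first = lb.route("203.0.113.10", 50000)   # new connection -> web-1
second = lb.route("203.0.113.11", 50001)  # new connection -> web-2
again = lb.route("203.0.113.10", 50000)   # existing connection -> web-1
```

Keeping the table is what lets a stateful exchange stay on one server while new clients are still spread across the pool.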

Load balancers provide another valuable function: they allow servers to be physically separated across locations, thus improving resilience. AWS achieves this through its High Availability (HA) infrastructure. Essentially, two instances are created behind a load balancer and each server is placed in a different availability zone (AZ), which AWS describes as a datacentre on a different flood plain from other datacentres.
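A minimal sketch of why splitting instances across zones helps, assuming instances are assigned to zones in rotation. `place_instances` and `surviving_instances` are hypothetical helpers for illustration, not AWS API calls; the point is that losing a whole zone still leaves at least one instance able to serve traffic.

```python
def place_instances(count, zones):
    """Spread `count` instances across availability zones in rotation
    (hypothetical placement rule for illustration)."""
    placement = {zone: [] for zone in zones}
    for i in range(count):
        zone = zones[i % len(zones)]
        placement[zone].append(f"instance-{i + 1}")
    return placement


def surviving_instances(placement, failed_zone):
    """Instances still able to serve traffic if one whole zone goes offline."""
    return [inst for zone, insts in placement.items()
            if zone != failed_zone for inst in insts]


# Two instances behind a load balancer, one per zone.
placement = place_instances(2, ["eu-central-1a", "eu-central-1b"])
```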

Different availability zones in regions.

AWS spreads its services across the globe in geographic regions, giving resilience and localization for improved network access. Each region is completely independent and consists of multiple AZs, and each zone can be thought of as a datacentre. Although they are physically separated, the zones within a region are connected by high-speed, low-latency networks.

Smooth Software Upgrades

The locations of datacentres are a closely guarded secret and are not generally known. A region consists of two or more AZs; at the time of writing, Virginia in the USA has four and Frankfurt in Germany has two. AZs are identified by names such as us-west-1a and us-west-1b for Northern California. Load balancers can split traffic equally between the zones in a region, and multiple servers can be enabled in each zone.
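The sketch below shows what an equal per-zone split means in practice, under the simplifying assumption that the balancer first alternates between zones and then rotates between the servers inside the chosen zone. `ZoneBalancer` and the server names are hypothetical.

```python
from collections import Counter
from itertools import cycle


class ZoneBalancer:
    """Hypothetical two-level balancer: alternate requests across zones,
    then rotate across the servers inside the chosen zone."""

    def __init__(self, servers_by_zone):
        self._zones = cycle(servers_by_zone)  # cycles over the zone names
        self._servers = {zone: cycle(servers)
                         for zone, servers in servers_by_zone.items()}

    def route(self):
        zone = next(self._zones)
        return zone, next(self._servers[zone])


balancer = ZoneBalancer({
    "us-west-1a": ["web-a1", "web-a2"],
    "us-west-1b": ["web-b1"],
})
# Count how many of 8 requests land on each (zone, server) pair.
counts = Counter(balancer.route() for _ in range(8))
```

With two servers in us-west-1a and one in us-west-1b, each zone still receives half the requests, so the single server in us-west-1b handles twice as many as either server in us-west-1a. Running the same number of servers in each zone keeps the per-server load even.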

Another advantage of load balancers is that they provide a smooth process for software upgrades without any downtime. Servers are no longer upgraded in the traditional way. Instead, once a software release is available, a new server is spun up with the appropriate operating system, the new software is installed on it, and an image of the whole system is captured. Amazon refers to this image as an AMI (Amazon Machine Image); creating a new server from this AMI produces an exact clone of the original.

Cloud Scaling

Suppose we have a service running one instance in eu-central-1a and another in eu-central-1b. A third server, built from the new AMI, could be spun up in eu-central-1a. Through the software dashboard, incoming traffic to the original server in eu-central-1a is disabled, and when it has finished processing its current jobs it can be switched off. The same procedure is repeated for eu-central-1b, and once both new servers are in service, the two original servers are deleted, completing the upgrade without any downtime. This procedure is called “rip and replace” in AWS terms.
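The rip-and-replace sequence can be sketched as a small simulation. The `Server` class, the `ami-old`/`ami-new` identifiers and the assumption that draining completes instantly are all hypothetical; the point is to check that at least two servers stay in service at every step.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # compare servers by identity, not by field values
class Server:
    name: str
    zone: str
    ami: str
    in_service: bool = True  # receiving traffic from the load balancer


def rip_and_replace(pool, new_ami):
    """Replace every server in `pool` with one built from `new_ami`, one zone
    at a time, recording how many servers are in service after each step.
    Assumption: a drained server finishes its current jobs immediately."""
    in_service_log = []
    for old in list(pool):
        # 1. Launch a replacement in the same zone from the new image.
        pool.append(Server(f"{old.name}-new", old.zone, new_ami))
        in_service_log.append(sum(s.in_service for s in pool))
        # 2. Disable incoming traffic to the old server (drain it).
        old.in_service = False
        in_service_log.append(sum(s.in_service for s in pool))
        # 3. Once its jobs are done, switch it off and delete it.
        pool.remove(old)
    return in_service_log


pool = [Server("web-1", "eu-central-1a", "ami-old"),
        Server("web-2", "eu-central-1b", "ami-old")]
log = rip_and_replace(pool, "ami-new")
```

The log alternates between three servers (new one launched) and two (old one drained), so capacity never falls below one server per zone during the upgrade.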

AMIs form the basis of scaling within AWS. When a new server is needed, the application software simply spins up a new instance from the current AMI and then switches it online, making it available for use. Once user demand subsides, the application software deletes one of the server instances, so the vendor pays only for the time each server was running.
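One way to picture the scaling decision is the hypothetical rule below: divide current demand by the load one server can handle, round up, and clamp the result between a minimum (here two, one per zone) and a cost ceiling. This is an illustrative sketch, not the AWS Auto Scaling API, and the capacity figures are assumptions.

```python
import math


def desired_instances(requests_per_sec, capacity_per_instance,
                      minimum=2, maximum=10):
    """Servers needed for the current demand (hypothetical rule), clamped so
    the service never drops below `minimum` (one per zone for resilience)
    or exceeds `maximum` (a cost ceiling)."""
    needed = math.ceil(requests_per_sec / capacity_per_instance)
    return max(minimum, min(maximum, needed))


def scaling_action(current, desired):
    """Describe what the automation would do to reach the desired count."""
    if desired > current:
        return f"launch {desired - current} instance(s) from the current AMI"
    if desired < current:
        return f"terminate {current - desired} instance(s)"
    return "no change"


# Demand rises to 450 requests/s with servers that each handle 100 requests/s.
print(scaling_action(2, desired_instances(450, 100)))
```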

Cloud Washed software cannot take advantage of this automation and would instead rely on a developer or engineer to detect peak demand, manually spin up new servers and enable them, and then remember to disable them once the peak has passed. Failing to do so will result in high cloud costs.

Load balancer providing resilience over different availability zones.

Cloud Born software is fully automated: it will detect peak demand, spin up new servers and switch them off again, all without any human intervention. Usually, advanced monitoring and alarm systems are integrated into the software to make systems engineers aware of any changes. The cost of allocating additional resources is directly proportional to the demand placed on the system by its clients. Assuming the correct costing model has been adopted, costs will be directly proportional to sales, with minimal overhead and setup costs.
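The cost argument can be illustrated with a few lines of arithmetic. The demand profile, per-server capacity and price of 10 cents per instance-hour below are invented for illustration; the comparison is between an automated system that follows demand and a manual one whose peak servers are left running.

```python
import math


def instances_needed(demand, capacity_per_instance, minimum=2):
    """Servers an automated (Cloud Born) system would run in each hour."""
    return [max(minimum, math.ceil(d / capacity_per_instance)) for d in demand]


def cost_cents(instance_counts, cents_per_instance_hour):
    """Pay only for the instance-hours actually used."""
    return sum(instance_counts) * cents_per_instance_hour


# Hypothetical six-hour demand profile (requests per second) with a short peak.
demand = [100, 100, 800, 800, 200, 100]
auto = instances_needed(demand, capacity_per_instance=100)
static = [max(auto)] * len(demand)  # peak servers never switched off

auto_cost = cost_cents(auto, cents_per_instance_hour=10)
static_cost = cost_cents(static, cents_per_instance_hour=10)
```

Here the automated system uses 24 instance-hours against 48 for the static peak, halving the bill for the same peak capacity.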

In-built Monitoring

An AMI can be used to launch instances in any zone within its region, so server instances can be launched quickly. However, if you need to move an AMI to another region, for example from Ohio to Singapore, the transfer could take a few hours. In effect, this discourages users from moving AMIs between regions.

Load balancers are relatively intelligent and can detect whether an attached instance is healthy. If a server starts to drop packets, perhaps due to overloading or a software bug, the load balancer will detect this and stop sending it traffic. It will continue to test the server and resume sending it traffic once it recovers.
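Load balancer health checks are commonly governed by two thresholds: how many consecutive failed checks mark a server unhealthy, and how many consecutive passes bring it back into service. The tracker below is a minimal sketch of that logic under assumed threshold values; AWS target groups expose comparable healthy and unhealthy threshold settings, but this hypothetical class is not their implementation.

```python
class HealthTracker:
    """Hypothetical health tracker: mark a server unhealthy after
    `unhealthy_threshold` consecutive failed checks, and healthy again after
    `healthy_threshold` consecutive passed checks."""

    def __init__(self, healthy_threshold=3, unhealthy_threshold=2):
        self.healthy = True
        self._healthy_threshold = healthy_threshold
        self._unhealthy_threshold = unhealthy_threshold
        self._streak = 0  # consecutive results that disagree with the state

    def record(self, check_passed):
        if check_passed == self.healthy:
            self._streak = 0  # result agrees with the current state
            return self.healthy
        self._streak += 1
        threshold = (self._healthy_threshold if check_passed
                     else self._unhealthy_threshold)
        if self._streak >= threshold:
            # Enough consecutive evidence: flip the state.
            self.healthy = check_passed
            self._streak = 0
        return self.healthy


tracker = HealthTracker(healthy_threshold=3, unhealthy_threshold=2)
# The balancer only sends traffic while tracker.healthy is True.
states = [tracker.record(r) for r in [True, False, False, True, True, True]]
```

Requiring a run of results in either direction stops a single dropped check from pulling a server out of service, and stops a briefly recovering server from being flooded with traffic too early.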

Load balancers and availability zones provide a simple, cheap method of improving resilience in cloud infrastructures. Cloud Born systems take advantage of this to meet peak user demand and improve performance without human intervention, thus reducing costs and improving response times. Cloud Washed solutions can still take advantage of these systems but will be slow and expensive because of the manual intervention required from expensive humans.

Other related articles posted on The Broadcast Bridge.

Cloud Broadcasting - Cloud Washed or Cloud Born?
