Designing IP Broadcast Systems: Ground To Cloud

Reducing the latency for video and audio streaming to the cloud has implications for congestion control, and care must be taken to avoid congestion collapse.

Streaming media to the cloud should be relatively straightforward; however, analyzing the detail tells a different story.

Computer networks lose IP packets. This isn’t something network architects strive for, but it is accepted as a method of operation because RFC 791 – Internet Protocol – does not make any provision for guaranteeing packet delivery. The three main causes of a dropped packet within the network are link collisions, switch buffer overflow, and interference in the link medium. Broadcasters tend to use direct connections between a switch/router port and a device such as a camera or sound console, so they don’t need to worry about collisions. Historically, this couldn’t always be taken for granted, as early ethernet standards such as the original IEEE 802.3 had multiple devices sharing the same passive cable and employed CSMA/CD to detect and recover from ethernet frame collisions caused by multiple devices transmitting at the same time.

Fundamentally, the RFC 791 specification achieves two objectives: it segments data and provides addressing. Any other features, such as guaranteeing packet delivery and sequencing, are left to protocols that build on IP, such as TCP (Transmission Control Protocol) and other ARQ (Automatic Repeat reQuest) methods. RFC 9293 – TCP – is the most recent TCP protocol specification (published in August 2022) and builds on the original RFC 793 published in 1981, together with subsequent additions including RFC 879, 2873, 6093, 6429, 6528, and 6691.

Although the details of TCP and ARQ have increased in complexity over the years, they achieve one fundamental goal: to guarantee delivery of IP packets. In other words, they build on RFC 791 (the IP protocol) to make sure any lost packets are detected and then resent. This may sound like a trivial operation, as resending a lost packet merely re-transmits a copy held in the transmission buffer, but the major question is: how does the sender know whether a packet has arrived or not?

ARQs (including TCP) can be thought of as a recorded-delivery type of system. If I have a document that needs to be sent from Paris to New York, and it will not all go in one envelope, then I first need to separate it into groups of pages. For example, if the document consisted of a thousand pages, I could separate them into one hundred piles of ten pages each. This is a method of segmentation, as the envelopes can be thought of as IP packets with a destination address on the front and a return address on the back. Putting them all in the mailbox at the same time presents two challenges: how do I know my colleague in New York has received them intact, and in what sequence should the pages be reassembled to retrieve the original document?

Adding the ARQ protocol solves these problems as it guarantees delivery by providing the sender with an acknowledgement message, and it numbers the IP packets so that they can be reconstructed in the correct sequence. Essentially, the ARQ protocol sends the first IP packet (envelope) and waits for an ACK (acknowledge) message from the receiver to confirm receipt. It then sends the next packet in the sequence and waits for the receiver to send back another ACK message. This continues with the sender waiting each time for an ACK message before it can send the next IP packet.
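
The idea can be sketched in a few lines of Python as a hypothetical illustration of the envelope analogy: the data is split into sequence-numbered packets, each packet is “sent” and acknowledged before the next, and the receiver reassembles them in order. The function names are invented for the example, and the transport and receiver are simulated in-process; real protocols carry this exchange over UDP/IP sockets.

```python
# A minimal sketch of segmentation plus stop-and-wait acknowledgement.
# The "network" is simulated by a direct function call.

def segment(document: bytes, payload_size: int):
    """Split the data into sequence-numbered packets (the envelopes)."""
    return [
        (seq, document[i:i + payload_size])
        for seq, i in enumerate(range(0, len(document), payload_size))
    ]

def receiver(packet, delivered):
    """Store the payload by sequence number and return an ACK."""
    seq, payload = packet
    delivered[seq] = payload
    return seq  # the ACK simply echoes the sequence number received

def send_stop_and_wait(document: bytes, payload_size: int = 10):
    delivered = {}
    for packet in segment(document, payload_size):
        ack = receiver(packet, delivered)   # deliver and wait for the ACK...
        assert ack == packet[0]             # ...before sending the next packet
    # Reassemble in sequence order to recover the original document.
    return b"".join(delivered[seq] for seq in sorted(delivered))

# A thousand "pages" split into one hundred packets of ten arrive intact.
assert send_stop_and_wait(b"x" * 1000) == b"x" * 1000
```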

Figure 1 – Dropped packets are not the only source of variable latency in ARQ protocols. Here, the receiver has delayed processing of the last packet, so the ACK message is sent late, which results in increased latency.

Digging into the detail a bit more highlights another interesting challenge: how does the receiver “know” the sender has sent a packet? If my colleague is sitting in New York and has no form of communication with me in Paris other than through the mail system, then they will quite happily sit and wait until they receive an envelope. But if the envelope I sent has been lost in transit, I will sit in Paris waiting indefinitely for my colleague in New York to send an ACK message before I can send the next envelope (IP packet). In essence, the protocol has exhibited a “lock-out” situation: I can’t do anything until I receive the ACK message, but my colleague won’t be sending one because the envelope I sent was lost in transit.

A relatively straightforward way of dealing with this is to set a timer when I send the first envelope. If the timer expires without me receiving an ACK message, then I assume my colleague in New York hasn’t received the envelope, so I resend the same envelope (this assumes I copied the contents prior to sending). If I receive the ACK message, then I can send the next envelope in the sequence. Hence the name Automatic Repeat reQuest.
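
A minimal sketch of that timer behavior is shown below, assuming an illustrative loss probability and retry limit (neither taken from any specification). When the simulated network fails to return an ACK, the sender resends the copy it kept in its transmission buffer.

```python
import random

LOSS_PROBABILITY = 0.2   # chance the simulated network drops a packet

def lossy_send(packet):
    """Return an ACK, or None if the packet was 'lost in transit'."""
    if random.random() < LOSS_PROBABILITY:
        return None          # no envelope arrives, so no ACK comes back
    return packet[0]         # the ACK echoes the sequence number

def send_with_retransmission(packets, max_retries=10):
    for packet in packets:
        for _ in range(max_retries):
            ack = lossy_send(packet)     # send and start the timer...
            if ack == packet[0]:
                break                    # ACK received before the timeout
            # Timer expired: loop round and resend the buffered copy.
        else:
            raise TimeoutError(f"packet {packet[0]} never acknowledged")

# One hundred ten-page "envelopes" all eventually get through.
send_with_retransmission([(seq, b"ten pages.") for seq in range(100)])
```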

This looks relatively straightforward, but it soon becomes apparent that not only have we created latency, the latency has also become variable and indeterminate, and this is due to the timer system.

Computer operating systems generally implement the IP and TCP software to allow programmers to focus on writing their applications. In Linux, the timer just described is the RTO (Retransmission Time Out). The really interesting part about the RTO is that it is not fixed: it starts low (around 200ms) and grows to multiple seconds for subsequent packet resends. This ramping down of the resend rate is specifically designed to stop the protocol flooding the network with IP packets and causing congestion. It’s referred to as congestion control.
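
The back-off described above can be illustrated with a few lines of code. The 200ms starting value mirrors the Linux minimum mentioned in the text, but the simple doubling rule and upper cap used here are assumptions for illustration, not a faithful model of the kernel’s RTO calculation.

```python
INITIAL_RTO_S = 0.2     # ~200 ms starting value
MAX_RTO_S = 120.0       # assumed upper bound so the timer cannot grow forever

def rto_schedule(retransmissions: int):
    """Return the successive timeout values used for repeated resends."""
    rto, schedule = INITIAL_RTO_S, []
    for _ in range(retransmissions):
        schedule.append(rto)
        rto = min(rto * 2, MAX_RTO_S)   # each failure doubles the wait
    return schedule

# Five consecutive losses already push the timer into whole seconds,
# which is how the effective data rate ramps down under congestion.
print(rto_schedule(5))   # [0.2, 0.4, 0.8, 1.6, 3.2]
```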

TCP has well-defined congestion control algorithms within the specification to negate the possibility of congestion collapse in the network – this is a real phenomenon and was first experienced in 1986 between Lawrence Berkeley Laboratory and UC Berkeley, before congestion control was understood and implemented in the TCP protocol. The link speed between the sites was 32kbps (state of the art for 1986), but the data throughput dropped to 40bps, a reduction of nearly a thousand times. The researchers discovered that multiple computers were in a lock-out state, rapidly re-sending lost packets, which flooded the network with retransmissions so that even fewer packets got through. To fix this, exponential back-off of the RTO was introduced along with congestion control algorithms such as slow start.

Congestion control is specified within TCP (RFC 9293) and its use is mandated. However, the same is not true of ARQs in general. TCP can be thought of as a well-defined ARQ with congestion control, but ARQs that are not TCP tend to be vendor specific, so their congestion control may or may not be used, or even specified.

Non-TCP ARQs have the massive advantage of being able to set their own RTO values, as they are not limited by any specification. In video streaming, a 200ms RTO is a huge value, representing four or five video frames, so it does not lend itself well to broadcast television methods. Reducing the RTO value has two effects: it increases data throughput and greatly decreases latency, both of which are hugely beneficial for broadcasters. However, the major downside is that if adequate congestion control is not employed through slow-start type resends (as found with TCP), there is a significant possibility of congestion collapse.
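
As a quick worked example of why a 200ms RTO is problematic for video, the sketch below converts RTO values into frame periods at common frame rates. The chosen RTO values and frame rates are illustrative only; a real tuning exercise would also account for round-trip time and the receiver’s buffering.

```python
def frames_of_latency(rto_ms: float, frame_rate_hz: float) -> float:
    """How many video frame periods fit inside one retransmission timeout."""
    frame_duration_ms = 1000.0 / frame_rate_hz
    return rto_ms / frame_duration_ms

for rto_ms in (200, 50, 10):
    print(f"RTO {rto_ms:>3} ms -> "
          f"{frames_of_latency(rto_ms, 25):.1f} frames at 25 fps, "
          f"{frames_of_latency(rto_ms, 50):.1f} frames at 50 fps")

# 200 ms spans five frames at 25 fps (ten at 50 fps), whereas a 10 ms
# RTO is only a fraction of a single frame period.
```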

SRT and RIST both employ versions of ARQ: they use UDP packets (fire-and-forget) to send the video and audio data and to relay the acknowledgement messages, so the retransmission behavior can be tuned for broadcast applications. They both make provision for congestion control to reduce the possibility of congestion collapse.

RFC 8085 – UDP Usage Guidelines – makes specific reference to congestion control and the minimum requirements for its inclusion in UDP-based protocols so that congestion collapse is avoided. A broadcaster generating video and audio streams and transmitting them through the internet, as with ground-to-cloud operations, is responsible for compliance with RFC 8085 (and other RFCs). Furthermore, they should make sure the congestion controls within their implementations of SRT, RIST, and other ARQ-type protocols are enabled and working effectively.
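
As a hypothetical illustration of that principle, rather than the actual congestion control used by SRT or RIST, the sketch below paces a UDP sender with a simple token bucket so that it never transmits unconstrained. The class name, target rate, burst size, and packet size are all assumptions made for the example.

```python
import time

class TokenBucket:
    """Pace outgoing packets to a target byte rate with a small burst allowance."""

    def __init__(self, rate_bytes_per_s: float, burst_bytes: float):
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.last = time.monotonic()

    def wait_for(self, packet_bytes: int):
        """Block until enough tokens have accumulated to send packet_bytes."""
        while True:
            now = time.monotonic()
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= packet_bytes:
                self.tokens -= packet_bytes
                return
            time.sleep((packet_bytes - self.tokens) / self.rate)

bucket = TokenBucket(rate_bytes_per_s=1_250_000, burst_bytes=10_000)  # ~10 Mbps
for _ in range(100):
    bucket.wait_for(1_316)   # e.g. seven 188-byte TS packets in one UDP payload
    # sock.sendto(payload, destination) would go here in a real sender
```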

Streaming media to the cloud may seem like a straightforward process; however, when connecting to an ISP, we must consider all the other users on the network, who may well influence our streaming methods. There’s much more to streaming media from the ground to the cloud than continuously transmitting unconstrained UDP packets.
