Network Traffic Engineering: RIST & SRT - The Success Of ARQ Based Protocols

IP networks are inherently unreliable. We kick off this series on IP Network Traffic Engineering with a look at how RIST and SRT give broadcast engineers user-configurable control over the latency-versus-reliability trade-off for real-time media streaming.

This article is part of ‘Network Traffic Engineering: Part 1’.
Download the entire content collection for free here.

RIST and SRT are both UDP-based transport protocols that have been designed specifically for the television broadcast industry to deliver live media over unreliable IP networks while keeping latency bounded and failure predictable.

The key point is that these terms describe determinate delivery: the aim is not necessarily the lowest possible latency, nor zero packet loss. Both protocols allow the user to balance latency against reliable packet delivery.

On unreliable IP networks, packet loss is inevitable, and we can deal with it in two ways: either ignore the packet loss and compromise data integrity, or have the sender retransmit the packet so that data integrity is maintained. Improving data integrity increases latency; conversely, if lower latency is required, we must compromise data integrity.

This balance assumes that the video and audio are being distributed over an unreliable IP network, but how unreliable can only be found during testing or operational use. The reliability of IP networks can be improved massively, as we see with ST 2110 infrastructures, but in doing so the cost of equipment delivery, the infrastructure complexity, and the engineering knowledge required all increase massively. It’s also probable that when we move outside the domain of the broadcast facility, the IP link will be owned by a Telco, which is effectively a black box whose operational detail is hidden from us.

No Guarantees

The assumption of unreliable IP networks isn’t just an anomaly of the system; it’s explicitly documented in RFC 791 (the IPv4 specification), which states: “The internet protocol does not provide a reliable communication facility. There are no acknowledgments either end-to-end or hop-by-hop. There is no error control for data, only a header checksum. There are no retransmissions. There is no flow control.”

In other words, IP fails to guarantee everything broadcast engineering demands from video, audio, and metadata delivery.

This is not an accident. The architects of IP designed the protocol to be a best-effort system that may well drop packets, deliver them out of sequence, or duplicate them. RFC 791 has no provision for packet retransmission, congestion control, or timing guarantees. In other words, IP networks are expected to misbehave, something that is counterintuitive for broadcast engineers who are accustomed to working with highly predictable circuit-switched networks such as SDI and AES.

However, we do know that IP networks maintain data integrity for many different applications, including banking, commerce and web page access. They do this using TCP (originally defined in RFC 793, with multiple later enhancements), as this protocol provides guaranteed packet delivery, ordered packets, flow control, and congestion avoidance. The price we pay for this is not just latency, but indeterminate latency. From a timing perspective, TCP does not deliver time-determinate systems, as the latency can vary significantly, especially during packet loss and when round-trip times (RTT) are compromised.

For many transactional data exchanges, such as those found in banking and web page access, variable latency isn’t much of an issue. It doesn’t really matter to me if it takes two or three seconds, or even one hundred milliseconds, for me to see my bank balance on an ATM screen. When streaming media, this is a totally different story. In essence, TCP was not designed to operate in a real-time system, but RIST and SRT were.

Determining Latency

RIST and SRT not only improve data integrity for unreliable IP networks, but they also keep latency determinate, and this is a major strength. In mainstream broadcast engineering we often hear of the need to keep latency as low as possible. SDI networks kept latency to the order of a few hundred nanoseconds, and when we say low latency, we are invariably comparing IP networks to this order of magnitude, which is usually unachievable.

The difference between low latency and determinate latency cannot be overstated. A determinate system that can maintain a fixed latency all the time is much better than a low latency system that can only guarantee low latency some of the time.

This is where average data rates can become confusing. It’s entirely possible for a TCP system to deliver an average of, say, 10Mbit/s, but the tails of the distribution tell you what’s really going on. Even peak jitter, which is often used to describe the events happening in the tails, doesn’t tell the whole story, as it rarely shows how frequently those events occur. For example, a 10Mbit/s stream with occasional 100ms jitter peaks can be more tolerable than the same stream exhibiting frequent 10ms jitter. In practice, rare large jitter events can often be absorbed by buffering, whereas smaller but frequent jitter variations are more likely to cause sustained buffer overruns at the receiver.
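As a rough illustration of why an average hides the tails, the following Python sketch compares two hypothetical streams of 1,000 packet delays that share the same mean. The delay values, the stream names and the `summarize` helper are assumptions made for this example and are not measurements from a real network.

```python
import statistics

def summarize(delays_ms):
    """Return (mean, 99th percentile, maximum) of a list of packet delays in ms."""
    ordered = sorted(delays_ms)
    # Nearest-rank 99th percentile: the delay that 99% of packets do not exceed.
    rank = max(1, -(-99 * len(ordered) // 100))  # integer ceil(0.99 * n)
    return statistics.mean(ordered), ordered[rank - 1], ordered[-1]

# Hypothetical stream A: mostly 10 ms, with rare 110 ms peaks (2% of packets).
rare_peaks = [10] * 980 + [110] * 20
# Hypothetical stream B: delay alternates between 7 ms and 17 ms on every packet.
frequent_jitter = [7, 17] * 500

for name, delays in (("rare peaks", rare_peaks), ("frequent jitter", frequent_jitter)):
    mean, p99, worst = summarize(delays)
    print(f"{name}: mean={mean:.1f} ms, p99={p99} ms, max={worst} ms")
```

Both streams report the same mean delay, yet their 99th percentile and maximum values differ widely, which is why a single average says little about how a receive buffer will actually behave.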

TCP can easily show 100% packet throughput with a healthy average, but what it doesn’t show is the latency, which is not only indeterminate, but highly variable.

Automatic Repeat Requests

RIST, SRT and TCP all use a method of packet resending called ARQ (Automatic Repeat reQuest), where packets lost in transit are requested again and retransmitted. This works well and TCP has stood the test of time, but it does suffer from one very important anomaly (at least where streaming media is concerned), and that is head-of-line-blocking. This is illustrated in Figure 1, where packet-4 is lost in transit. The loss of packet-4 is only detected after a delay, either when a timeout period expires or when the arrival of packets-5 and -6 reveals the gap in the sequence, and packet-4 is then resent. At this point, two events have occurred: firstly, detecting the loss and resending the packet have taken a variable amount of time, thus delivering variable latency, and secondly, packets-5 and -6 are waiting in the TCP receive buffer. This is head-of-line-blocking, as packets-5 and -6 cannot be presented to the media decoder until packet-4 has been received. In the meantime, subsequent packet transmissions (packets-7, -8, -9, etc.) are held up until packet-4 has been successfully delivered.

Figure 1 – The top diagram shows head-of-line-blocking on a TCP flow. When packet-4 is lost, packets-5 and -6 are held in the receive buffer. The bottom diagram shows RIST and SRT operating in a real-time configuration. Packets-5 and -6 continue to the decoder even if packet-4 is lost.

For transactional and short burst data exchanges, these events still occur and the increased and variable latency is the price the system pays for data integrity. This may be acceptable when accessing web pages, or online bank statements, but this variable latency is completely unacceptable for media streaming between broadcast facilities. Yes, the latency can be ironed out with larger receive buffers, but all that’s doing is just kicking the can down the road as head-of-line-blocking is still occurring.
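The effect of head-of-line-blocking on delivery timing can be sketched in a few lines of Python. This is a simplified model of the receiver side only, assuming strictly in-order delivery as TCP provides; the arrival times, including the 260ms arrival of the resent packet-4, are hypothetical values chosen for illustration.

```python
def in_order_release_times(arrivals):
    """Given {sequence_number: arrival_time_ms}, return the time at which each
    packet can be handed to the decoder when delivery must be strictly in order."""
    release = {}
    latest = 0
    for seq in sorted(arrivals):
        # A packet cannot be released before every earlier packet has arrived.
        latest = max(latest, arrivals[seq])
        release[seq] = latest
    return release

# Hypothetical timings: packets arrive 10 ms apart, except packet-4, which is
# lost and whose retransmission only arrives at 260 ms.
arrivals = {1: 20, 2: 30, 3: 40, 4: 260, 5: 60, 6: 70, 7: 80}
release = in_order_release_times(arrivals)

for seq in sorted(arrivals):
    print(f"packet-{seq}: arrived at {arrivals[seq]} ms, released at {release[seq]} ms")
```

In this model, packets-5, -6 and -7 arrive on time but are not released until 260ms, when packet-4 finally arrives. A larger receive buffer would hide the stall from the decoder, but it would not remove it.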

RIST and SRT, although using the ARQ method of packet resend, both reduce the possibility of head-of-line-blocking. In Figure 1, where packet-4 is lost, the receiver can decide whether to discard packet-4 or wait for it to be resent. Discarding packet-4 will certainly cause a picture or sound error, but the extent of the damage caused by a lost packet is codec dependent. For example, a lost packet in an I-frame-only stream, where every frame is encoded as an intra-coded frame, will have a minimal effect on one frame of video, but if the stream uses a long GOP, a lost packet can have a significantly negative impact on the video and audio viewing experience, because the error propagates to the frames predicted from the damaged one.

The interesting aspect of RIST and SRT is that the decision on how long to wait, if at all, for the lost packet-4 to be resent is user configurable. The engineer or person configuring the system can trade latency for data integrity. If they can accept slightly more packet loss, as with I-frame-only streams, then the latency can be reduced, but if the stream uses a long GOP then the latency can be increased to give lost packets more time to be resent. Not only is this user configurable, but the latency setting is fixed, and hence the network becomes determinate in both time and data integrity.
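The trade-off can be sketched as a receiver that plays every packet at a fixed delay after it was sent and skips any packet that has not arrived by then. The Python below is a minimal model under stated assumptions, not the RIST or SRT implementation: the `playout` function, the send and arrival times, and the 100ms arrival of the resent packet-4 are all hypothetical.

```python
def playout(arrivals, send_times, latency_ms):
    """Return {seq: playout_time_ms or None}. Each packet is played exactly
    latency_ms after it was sent; a packet still missing at that moment is skipped."""
    schedule = {}
    for seq, sent in send_times.items():
        deadline = sent + latency_ms
        arrived = arrivals.get(seq)
        on_time = arrived is not None and arrived <= deadline
        schedule[seq] = deadline if on_time else None
    return schedule

# Hypothetical timings: packets are sent every 10 ms and normally arrive 20 ms later.
send_times = {seq: 10 * (seq - 1) for seq in range(1, 8)}
arrivals = {seq: sent + 20 for seq, sent in send_times.items()}
arrivals[4] = 100  # packet-4 was lost; its retransmission arrives at 100 ms

low_latency = playout(arrivals, send_times, latency_ms=50)
high_latency = playout(arrivals, send_times, latency_ms=100)

print("50 ms latency: ", low_latency)
print("100 ms latency:", high_latency)
```

With the 50ms setting, packet-4 misses its deadline and is skipped, while packets-5 and -6 still play on schedule. With the 100ms setting, the resent packet-4 arrives in time and nothing is lost. In both cases the delay through the receiver is constant, which is the determinate behavior described above.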

Congestion Collapse

In the early 1980s when the internet was in its infancy, the anomaly of congestion collapse was first witnessed. This occurs when a network gets so overloaded that sending more data results in less useful data arriving. This is because lost packets trigger lots of TCP retransmissions that fill the network with retries of already sent data, resulting in high average data rates, but low usable data throughput. Congestion collapse is a well-documented and researched phenomenon and TCP has been adapted over the years to prevent it from occurring. It achieves this by slowing down its send rate when packets are lost. Backing off in this way allows the network to recover, as spare bandwidth capacity appears and packet loss decreases, even though the remedy significantly increases latency.
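The back-off behavior can be illustrated with an additive-increase, multiplicative-decrease (AIMD) rule, which is the general principle behind TCP congestion avoidance. The sketch below is heavily simplified and omits slow start, timeouts and fast recovery found in real TCP implementations; the function name, parameters and loss pattern are assumptions for this example.

```python
def aimd(loss_rounds, rounds, start=10.0, increase=1.0, decrease=0.5):
    """Return the congestion window (in segments) after each round trip."""
    window = start
    history = []
    for rnd in range(rounds):
        if rnd in loss_rounds:
            # Loss is treated as a sign of congestion: cut the send rate sharply.
            window = max(1.0, window * decrease)
        else:
            # No loss: probe gently for spare capacity.
            window += increase
        history.append(window)
    return history

# Hypothetical pattern: packet loss is detected in round trips 3 and 7.
history = aimd(loss_rounds={3, 7}, rounds=10)
print(history)
```

Each loss halves the window, and it takes several round trips to climb back. This is what relieves the network, and it is also why throughput and latency over TCP vary so much on a lossy link.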

RIST and SRT take a different approach. They operate over UDP and focus on bounding latency by selectively retransmitting only the packets that might still arrive in time, but they do not automatically slow down resends like TCP. Consequently, there is a technical possibility that if the network and protocols are not correctly configured, they could contribute to congestion collapse. The engineer configuring any ARQ type protocol must be aware of this; otherwise, in a congestion collapse condition, they may well see very high data rates, but the pictures will break up and the audio will distort.
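To get a feel for how resends add to the load on a link, the sketch below estimates the average bandwidth of a stream plus its retransmissions. It assumes every transmission, including each resend, is lost independently with the same probability, which is a simplification: on a congested link the extra traffic is itself likely to raise the loss rate. The `arq_load` function and the figures used are illustrative and are not taken from the RIST or SRT specifications.

```python
def arq_load(stream_mbps, loss, max_retries):
    """Estimate (average link load in Mbit/s, residual loss probability) for a
    stream where each lost packet may be resent up to max_retries times."""
    # A packet is sent once, resent with probability `loss`, resent a second
    # time with probability `loss ** 2`, and so on up to the retry limit.
    expected_sends = sum(loss ** k for k in range(max_retries + 1))
    # The packet is finally lost only if every attempt fails.
    residual_loss = loss ** (max_retries + 1)
    return stream_mbps * expected_sends, residual_loss

for loss in (0.02, 0.3):
    load, residual = arq_load(stream_mbps=10.0, loss=loss, max_retries=2)
    print(f"loss={loss:.0%}: load={load:.3f} Mbit/s, residual loss={residual:.6f}")
```

Under these assumptions, a 10Mbit/s stream at 2% loss needs only a little headroom, whereas at 30% loss the resends add several Mbit/s to a link that is already struggling.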

RIST and SRT have contributed greatly to providing time-determinate IP networks with high data integrity, and taking a deep dive into how TCP operates helps explain why they are so useful.
