In this article, George Kroon, research broadcast engineer, takes a look at how Negative ARQ protocols similar to those used for internet streaming and contribution can be improved specifically for broadcast television.
Recent years have seen the broadcast world move from traditional transmission media such as SDI to Internet Protocol (IP) for transporting video. IP brings the scalability, ease of management and automation, and security implementations already found in computer networks, but it also poses several challenges, namely the loss, reordering, corruption and delay of packetized video traffic.
The broadcast industry's answer to tackling these issues has been the introduction of Negative-Acknowledgement-based ARQ (NAK-ARQ) protocols, the two most popular of which are SRT and RIST. Seeking to maintain the low-latency functionality of UDP, whilst adding TCP-like error detection and correction, NAK-ARQ operates by only requesting a video packet re-transmission from the sender in the event that a packet has not arrived at the receiver within an allotted time period. Packets are also sequentially numbered, allowing a receiver to buffer and re-sequence them in the event that high levels of network jitter have caused reordering.
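The receiver-side behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not how SRT or RIST implement it: the class name `NakReceiver` and its methods are hypothetical, and the timer that normally delays a NAK until the allotted period has expired is left out, so gaps are reported immediately.

```python
from dataclasses import dataclass, field


@dataclass
class NakReceiver:
    """Toy receiver: buffers packets by sequence number and reports gaps."""
    next_seq: int = 0                           # next sequence number to release
    buffer: dict = field(default_factory=dict)  # seq -> payload

    def on_packet(self, seq: int, payload: bytes) -> list:
        """Store a packet and return the sequence numbers still missing."""
        if seq >= self.next_seq:                # ignore packets already released
            self.buffer[seq] = payload
        highest = max(self.buffer, default=self.next_seq - 1)
        # Every number below the highest one seen that is not buffered is a
        # gap; a real receiver would NAK it only after a time-out.
        return [s for s in range(self.next_seq, highest) if s not in self.buffer]

    def release(self) -> list:
        """Hand the longest in-order run of packets to the decoder."""
        out = []
        while self.next_seq in self.buffer:
            out.append(self.buffer.pop(self.next_seq))
            self.next_seq += 1
        return out
```

Feeding this receiver packets 0, 2 and then 1 reports packet 1 as missing after the second arrival, and `release()` then returns the three payloads in their original order.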
This article looks at ways to secure the best received video quality across extremely adverse network environments by considering not only how NAK-ARQ tackles UDP's shortcomings, but how it can be improved when delivering video over impaired networks.
Areas of Concern
Each packet is protected equally. At first, this might not seem like such a bad thing, but in the most common interframe codecs used today (e.g. H.264, H.265) not every frame is of equal importance, so it must be conceded that some packets matter more to the visual quality of the decoded video than others. Premium bandwidth is therefore routinely wasted on protecting less significant data.
The term "frame importance" here refers only to the key feature of these codecs: they dispense with any picture data that has not changed between frames. Within a predefined number of frames, or GoP (Group of Pictures), the opening frame contains the full scene image and is known as an I-frame (intra-frame), and the next I-frame marks the start of the following GoP. The intermediate frames are made up of predictive frames (P-frames), which contain only new or changed information, and bi-directional frames (B-frames), which look ahead and behind to fill in any missing information in the sequence.
Such "interframe" codecs contain frames of varying data quantities and of varying importance to the visual quality of the decoded video. Therefore, an ideal protocol would apply corresponding amounts of resource to the protection and retransmission of video packets based on their importance.
On a similar note, generalized encoders employ an algorithm called TLPD (Too-Late-Packet-Drop), which uses propagation delay feedback to destroy packets at the sender side that are unlikely to arrive at the receiver in time to be decoded. The decision to destroy a given packet is based on the age of the timestamp in the packet. It is once again agnostic to the video payload each packet carries and depends purely on the order in which the protocol headers were created.
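As a rough illustration of that age-based decision, the sketch below drops any queued packet whose age plus the measured one-way delay exceeds the receiver's latency budget. The names `QueuedPacket`, `too_late_drop`, `latency_budget_ms` and `one_way_delay_ms` are assumptions made for this example and are not taken from any particular protocol implementation.

```python
from dataclasses import dataclass


@dataclass
class QueuedPacket:
    seq: int
    created_ms: int   # sender-side timestamp written into the protocol header


def too_late_drop(queue, now_ms, latency_budget_ms, one_way_delay_ms):
    """Keep only packets that can still arrive inside the latency budget.

    The decision looks at the header timestamp alone; the payload, and
    therefore the frame type it carries, plays no part.
    """
    kept = []
    for pkt in queue:
        age_ms = now_ms - pkt.created_ms
        if age_ms + one_way_delay_ms <= latency_budget_ms:
            kept.append(pkt)
    return kept
```

With a 120 ms budget and a 40 ms one-way delay, a packet stamped 100 ms ago is destroyed while one stamped 20 ms ago survives, whatever either of them contains.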
Retransmission success is directly related to the assigned retransmission bandwidth. Usually, a certain bandwidth is allocated for retransmission, for example 25% of the stream bitrate. This means that if all packets are corrupted or lost, up to one quarter of them can be recovered. If the retransmission bandwidth is heavily restricted, however, many lost packets cannot be resent in time and the decoded image degrades.
Generally, there is no inherent FEC (Forward Error Correction) functionality. Traditional FEC must be added alongside the NAK-ARQ, which means extra bitrate on top of the protocol. Consequently, the bandwidth available to the video signal is further limited.
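The arithmetic behind the 25% example can be written out as a small helper. It is a simplification under stated assumptions: retransmitted packets are treated as always arriving, and the NAK messages themselves are assumed to cost nothing. The function name `recoverable_fraction` is hypothetical.

```python
def recoverable_fraction(loss_rate: float, retransmit_overhead: float) -> float:
    """Fraction of lost packets that the retransmission budget can resend.

    loss_rate           -- share of packets lost or corrupted (0.0 to 1.0)
    retransmit_overhead -- retransmission bandwidth as a share of the
                           stream bitrate (0.25 for the 25% example)
    """
    if loss_rate <= 0:
        return 1.0                      # nothing was lost, nothing to recover
    return min(1.0, retransmit_overhead / loss_rate)
```

With a 25% allocation, total loss gives `recoverable_fraction(1.0, 0.25) == 0.25`, matching the one-quarter figure above, while a 10% loss rate is fully covered.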
Firstly, an intelligent retransmission system could tier the prioritization of I, P, and B frames, so that when TLPD has to destroy packets it destroys the ones that genuinely have the lowest priority.
Expanding on that, a packet containing a P-frame is more vital to the decoded visual quality than the preceding B-frames in that GoP. One way to take advantage of this is through the DTS (Decode Time Stamp) from the Video Packetized Elementary Stream. The DTS already allows the decoder application to reorder video frames into the order of importance needed to decode a GoP, so it makes sense for this order to be maintained and protected in the transport domain, see figure 2.
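A tiered drop policy along these lines might look like the sketch below. It assumes each queued packet has already been tagged with the type of frame it carries, which a payload-agnostic protocol does not provide today; `VideoPacket`, `FRAME_PRIORITY` and `drop_lowest_priority` are hypothetical names, and the simple I > P > B ranking is the one suggested in the text rather than a measured metric.

```python
from dataclasses import dataclass

# Assumed ranking: a lower number means more important to the decode.
FRAME_PRIORITY = {"I": 0, "P": 1, "B": 2}


@dataclass
class VideoPacket:
    seq: int
    frame_type: str   # "I", "P" or "B", assumed to be tagged by the sender


def drop_lowest_priority(queue, capacity):
    """Trim the queue to `capacity`, discarding B before P before I packets."""
    excess = len(queue) - capacity
    if excess <= 0:
        return list(queue)
    # Least important tier first; within a tier the oldest packet goes first.
    victims = sorted(
        queue, key=lambda p: (-FRAME_PRIORITY[p.frame_type], p.seq)
    )[:excess]
    doomed = {p.seq for p in victims}
    return [p for p in queue if p.seq not in doomed]
```

Given a queue holding one I, two B and one P packet and room for only two, both B packets are discarded and the I and P packets are kept in their original order.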
Secondly, we could use inter-layer, video quality loss-impact-estimation so that unequal protection may be added to packets which are truly the most important. However, this analysis is not simply limited to whether or not they are I, P, or B frames, but instead considers macroblock attributes for more granular metrics. This includes the number of partitions within a given macroblock (corresponding to detail complexity), the frame with the greatest area relied upon as a decode-reference for the longest period of time in the GoP, and a set of macroblocks' ability to provide interpolation recovery for temporally or spatially adjacent data that has been lost.
Finally, a checksum-based list-decoding approach could be added to reduce retransmission requirements in a noisy environment where packets are being corrupted. The list-decoding approach uses the existing packet checksum to isolate corrupt bits and then tests each candidate bit flip to find the option most likely to result in a successful video decode. Whilst this is effective for correcting multiple corrupted bits per packet, it does not account for lost or reordered packets, meaning that it complements a sequence-oriented Negative ARQ protocol very well.
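The core idea of testing bit flips against the checksum can be shown with a deliberately reduced sketch. It makes several assumptions that go beyond the text: the packet is protected by a CRC-32 (computed here with Python's `zlib.crc32`), only a single flipped bit is searched for, and a checksum match is accepted as success without ranking candidates by how well they decode. The function name `repair_single_bit` is hypothetical.

```python
import zlib


def repair_single_bit(payload: bytes, expected_crc: int):
    """Return a payload whose CRC-32 matches, trying every one-bit flip.

    Returns None when no single flip restores the checksum, in which case
    the caller would fall back to requesting a retransmission.
    """
    if zlib.crc32(payload) == expected_crc:
        return payload                          # nothing to repair
    candidate = bytearray(payload)
    for bit in range(len(candidate) * 8):
        byte_index, mask = bit // 8, 1 << (bit % 8)
        candidate[byte_index] ^= mask           # flip one bit
        if zlib.crc32(candidate) == expected_crc:
            return bytes(candidate)
        candidate[byte_index] ^= mask           # undo and try the next bit
    return None
```

Extending the search to several flipped bits grows the candidate list quickly, which is where choosing the candidate most likely to decode, as described above, becomes necessary.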
The incorporation of I-P-B frame intelligent retransmission systems alongside the more refined inter-layer, loss-impact-estimation method would, in theory, drastically improve the viewer's immersive experience when receiving interframe codecs over highly impaired networks, through unequal packet protection and destroying the lowest priority packets first in TLPD.
Furthermore, there is strong evidence that a checksum-based list-decoding approach would be beneficial, reducing retransmission requirements in noisy networks where packets are being corrupted, without the bitrate increase that FEC usually incurs.
We have only just begun our journey with NAK-ARQ protocols, and it is clear from this analysis that there is much more we can do to improve their efficiency.