This Protocol Comparison Matrix illustrates the tradeoffs that define compressed IP video transport protocols.
Many protocol specifications are used today to reliably transport compressed video over IP networks and the internet. Newer transport specifications, such as RIST with ARQ, are still being developed and tested.
This is Part 1 of a 2-Part series based on several White Papers by Ciro A. Noronha, Ph.D., of Cobalt Digital, and Juliana W. Noronha of UC Davis. Dr. Noronha presented his latest White Paper on RIST details at the 2019 NAB Show Monday BEIT session "A Performance Measurement Study of Reliable ISTP" in the Professional Media Networking Track. That session is the topic of Part 2. This first part explains the latest compressed video-over-IP technology specifications, reviews what is in the field today, and provides the background needed to understand Part 2.
Video transport protocols are built on top of two basic network delivery protocols: User Datagram Protocol (UDP) and Transmission Control Protocol (TCP). UDP is a 'raw' network service where packets are delivered as fast as possible but may be dropped. UDP supports one-to-many delivery using multicast. TCP, on the other hand, provides flow control and guarantees delivery, but at the cost of unbounded delivery delay. Moreover, TCP is unicast-only, so a TCP sender must replicate the stream for every recipient.
Raw UDP transmits the video in the payload of UDP packets. It has zero protocol latency and no packet loss recovery. Multicast is supported and packet replication happens in the network. It is the lowest common denominator and can be found in professional decoders, IP set-top boxes, and software decoders. There are several UDP-based protocols.
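As a rough illustration of raw UDP transport, the sketch below splits a buffer into datagram payloads and sends them to a multicast group. It assumes the video is an MPEG transport stream (188-byte packets, commonly grouped seven per datagram), which the text above does not specify. The function names, group address and port are hypothetical, and no pacing is applied, whereas a real sender would pace datagrams at the stream bit rate.

```python
import socket
from typing import Iterator

TS_PACKET = 188          # assumed: MPEG transport stream packet size
TS_PER_DATAGRAM = 7      # 7 * 188 = 1316 bytes, fits a 1500-byte Ethernet MTU


def datagrams(ts: bytes) -> Iterator[bytes]:
    """Split a transport stream buffer into UDP payloads."""
    size = TS_PACKET * TS_PER_DATAGRAM
    for offset in range(0, len(ts), size):
        yield ts[offset:offset + size]


def send_udp(ts: bytes, group: str, port: int, ttl: int = 1) -> int:
    """Send the buffer as raw UDP datagrams; returns the datagram count.

    There is no acknowledgement and no retransmission: a dropped
    datagram is simply gone, as described for raw UDP above.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Limit how many router hops multicast datagrams may cross.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
        sent = 0
        for payload in datagrams(ts):
            sock.sendto(payload, (group, port))
            sent += 1
        return sent
    finally:
        sock.close()
```

A call such as `send_udp(buffer, "239.1.1.1", 5000)` would send to a hypothetical multicast group; replication to multiple receivers then happens in the network, not in the sender.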
Real-time Transport Protocol (RTP) is a UDP-based protocol. RTP is a thin layer on top of UDP that adds timestamps and sequence numbers. It is supported mostly by professional IRDs and some software decoders.
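To make the "thin layer" concrete, here is a minimal sketch of the 12-byte fixed RTP header defined in RFC 3550, showing where the sequence number and timestamp sit. The class and function names are illustrative only, and header extensions and padding are deliberately not handled.

```python
import struct
from typing import NamedTuple, Tuple

RTP_VERSION = 2
RTP_FIXED_HEADER_LEN = 12


class RtpHeader(NamedTuple):
    payload_type: int   # 7 bits, e.g. 33 for an MPEG transport stream
    sequence: int       # 16 bits, increments by one per packet
    timestamp: int      # 32 bits, in units of the media clock
    ssrc: int           # 32 bits, identifies the stream source
    marker: bool = False


def pack_rtp(header: RtpHeader, payload: bytes) -> bytes:
    """Prepend a fixed RTP header (no CSRC list, no extension, no padding)."""
    byte0 = RTP_VERSION << 6
    byte1 = (int(header.marker) << 7) | (header.payload_type & 0x7F)
    fixed = struct.pack(
        "!BBHII",
        byte0,
        byte1,
        header.sequence & 0xFFFF,
        header.timestamp & 0xFFFFFFFF,
        header.ssrc & 0xFFFFFFFF,
    )
    return fixed + payload


def unpack_rtp(packet: bytes) -> Tuple[RtpHeader, bytes]:
    """Parse the fixed header and return it with the payload."""
    if len(packet) < RTP_FIXED_HEADER_LEN:
        raise ValueError("packet shorter than the RTP fixed header")
    byte0, byte1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    if byte0 >> 6 != RTP_VERSION:
        raise ValueError("not an RTP version 2 packet")
    csrc_count = byte0 & 0x0F
    payload_start = RTP_FIXED_HEADER_LEN + 4 * csrc_count  # skip CSRC entries
    header = RtpHeader(byte1 & 0x7F, seq, ts, ssrc, bool(byte1 >> 7))
    return header, packet[payload_start:]
```

The sequence number lets a receiver detect loss and re-ordering, and the timestamp lets it reconstruct timing; RTP by itself recovers nothing.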
'RTP Plus SMPTE 2022 FEC' is another UDP-based protocol variant. Video is sent using standard RTP with additional Forward Error Correction (FEC) packets also sent using RTP. If there is packet loss, the receiver may be able to rebuild the lost packets from the received packets and the FEC packets. The FEC protocol allows a certain amount of tuning of overhead, latency and recovery capabilities.
'RTP Plus SMPTE 2022 FEC' supports multicast and packet re-ordering, but adds significant bandwidth overhead (typically 25%, depending on settings). Decoder support is mostly limited to professional IRDs. Use on the internet is risky because it depends on adequate ISP capacity, low congestion and other relatively uncontrollable factors.
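The recovery idea can be sketched with plain XOR parity, which is the principle behind this style of FEC. The code below is a simplified illustration, not the SMPTE 2022 packet format: it assumes equal-length packets, protects payload bytes only, and omits the FEC header. The function names are hypothetical. The overhead helper assumes media packets arranged in a matrix with one FEC packet per column and, optionally, one per row.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets: List[bytes]) -> bytes:
    """Build one parity packet protecting a group of media packets."""
    return reduce(xor_bytes, packets)


def recover(received: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild at most one lost packet (marked None) in a protected group."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("one parity packet can rebuild only one lost packet")
    if not missing:
        return list(received)
    present = [p for p in received if p is not None]
    # XOR of the parity with every surviving packet yields the lost one.
    rebuilt = reduce(xor_bytes, present, parity)
    repaired = list(received)
    repaired[missing[0]] = rebuilt
    return repaired


def fec_overhead(columns: int, rows: int, row_fec: bool = False) -> float:
    """Extra packets as a fraction of media packets for a columns x rows matrix."""
    media_packets = columns * rows
    fec_packets = columns + (rows if row_fec else 0)
    return fec_packets / media_packets
```

Under these assumptions, a 4 x 4 matrix with column FEC only gives `fec_overhead(4, 4) == 0.25`, in line with the typical overhead quoted above. A larger matrix lowers the overhead but increases latency, because the receiver must wait for the whole group before it can repair a loss.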
Real Time Streaming Protocol (RTSP) began as a control protocol used in video servers because of its 'VCR-like' control user interface for start, pause, and stop. The RTSP control interface is implemented over TCP. Actual streaming uses RTP, sending video and audio elementary streams on separate ports. There is typically no packet loss recovery at the RTP level.
Secure Reliable Transport (SRT) was originally a proprietary protocol developed by Haivision and later released as open source. The protocol is based on UDT (UDP-based Data Transfer Protocol), which was designed for high-speed file transfer over UDP. Packets received correctly are acknowledged, similarly to TCP. The receiver also sends an explicit negative acknowledgement (NACK) for dropped packets, so the sender retransmits packets that are either NACKed or left unacknowledged. SRT is available in numerous professional encoders and IRDs, and in the VLC software player.
The RIST latency-reliability tradeoff is fully configurable by the choice of the buffer and number of times a packet can be retried.
Reliable Internet Stream Transport (RIST) is a specification published by the Video Services Forum intended for low-latency video contribution/distribution. The first public RIST demonstration by the participating companies occurred in September 2018 during the IBC trade show.
Lost packets are recovered using a variant of Selective Retransmission called ARQ (Automatic Repeat reQuest). Media transmission is done using standard RTP/UDP. Packets received correctly are not acknowledged, so there is no flow control. The receiver requests retransmission of lost packets using standard RTCP messages. RIST is also designed to be firewall-friendly.
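The receiver side of this scheme can be sketched as gap detection on RTP sequence numbers. The class below is an illustrative assumption, not code from the RIST specification: it only decides which sequence numbers to request, and the encoding of that request into an RTCP message is not shown.

```python
from typing import List, Optional

SEQ_MOD = 1 << 16  # RTP sequence numbers are 16 bits and wrap around


class GapDetector:
    """Tracks the next expected sequence number and reports gaps."""

    def __init__(self) -> None:
        self.expected: Optional[int] = None

    def on_packet(self, seq: int) -> List[int]:
        """Return the sequence numbers to request when `seq` arrives."""
        if self.expected is None:
            self.expected = (seq + 1) % SEQ_MOD
            return []
        ahead = (seq - self.expected) % SEQ_MOD
        if ahead >= SEQ_MOD // 2:
            # A late or retransmitted packet: it fills an old gap,
            # so nothing new is missing.
            return []
        missing = [(self.expected + i) % SEQ_MOD for i in range(ahead)]
        self.expected = (seq + 1) % SEQ_MOD
        return missing
```

Nothing is sent back when packets arrive in order, which matches the absence of positive acknowledgements described above. Only a gap triggers a request.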
Latency of the protocol can be fine-tuned for network conditions. Using RTP as the base protocol ensures compatibility with non-RIST devices. RIST is supported in numerous encoders, decoders, and gateways from multiple vendors, with multi-vendor interoperability, and in the open-source VLC software decoder.
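The tradeoff between buffer size and retries can be estimated with simple arithmetic. The sketch below rests on two assumptions that are not from the specification: each retransmission attempt costs roughly one round-trip time, and packet losses are independent. The function names are hypothetical.

```python
def max_retries(buffer_ms: float, rtt_ms: float) -> int:
    """Retransmission attempts that fit in the receive buffer,
    assuming each attempt takes about one round-trip time."""
    if rtt_ms <= 0:
        raise ValueError("round-trip time must be positive")
    return int(buffer_ms // rtt_ms)


def residual_loss(loss_rate: float, retries: int) -> float:
    """Probability that a packet is still missing after the original
    transmission and every retry are lost, assuming independent losses."""
    return loss_rate ** (retries + 1)
```

For example, a 200 ms buffer on a link with a 50 ms round trip leaves room for about four attempts. With 1% independent loss, two retries would already bring the residual loss to roughly one packet in a million. Real networks often lose packets in bursts, so these figures are optimistic.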
The latest information about RIST developments and testing is the focus of Part 2 of this report.
TCP is connection-oriented. A client explicitly connects to a server and data transmission can go either direction or both ways. TCP does not support multicast.
TCP uses acknowledgments and retransmission to ensure all bytes are received, no matter how long it takes. TCP also provides flow-control. The receiving side acknowledges the data when it is ready to receive more. Flow control is also used to avoid network congestion.
The simplest use of TCP is to create a connection between the encoder and the device consuming the stream. The encoder pushes the data through the connection assuming the end-to-end bandwidth is sufficient and stable. Protocols using a raw TCP connection include RTSP Tunnel Mode and Real Time Messaging Protocol (RTMP).
It should be noted that the ability to flow-control an encoder is typically limited to a restricted range. Buffering at the encoder can absorb some of the TCP flow control, but a TCP connection will slow down to a trickle in the face of even moderate packet loss, and the encoder will either have to drop down to a very low-quality mode or even stop altogether.
RTMP is a proprietary protocol designed by Macromedia for its Flash player. Macromedia was later acquired by Adobe, which published the protocol specification. It is used primarily by Flash players to retrieve content from servers. RTMP also has an option for the client to encode a stream and publish it to the server. The protocol is becoming obsolete because it is media-specific: it only supports H.264 video and AAC audio, and it is limited to the Flash container format.
RTMP is the de facto standard for publishing live streams on the internet, but it is being replaced by HLS. Latency depends on what processing is done in the server and is typically on the order of several seconds or more. It is resilient to packet loss because it uses TCP. Scalability is handled at the server.
RTSP Tunnel Mode
Basic RTSP is only suitable for local managed networks. There is no packet loss recovery on RTP, and UDP ports are dynamically negotiated, which is not firewall-friendly. RTSP has a mode where the RTP data is tunneled over the TCP control connection, with the same resiliency as TCP. RTSP encoders are often used in surveillance cameras because of the VCR-like user interface. It is supported in some software decoders and some professional IRDs.
Apple and Android are HLS native devices.
HTTP Live Streaming (HLS)
HLS is a protocol designed by Apple to provide streaming using a standard web server. The video stream is divided into “chunks” of a few seconds each. The decoder downloads the chunks as files from the web server with standard HTTP transactions, using a playlist.
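The playlist is a plain text file that lists the chunks in order. Below is a minimal sketch of how an encoder might write a media playlist using tags defined in the HLS specification (RFC 8216). The function name and segment file names are hypothetical, and features such as encryption and multiple bit rates are left out.

```python
import math
from typing import List, Tuple


def media_playlist(
    segments: List[Tuple[str, float]],
    first_sequence: int = 0,
    ended: bool = False,
) -> str:
    """Build an HLS media playlist from (uri, duration_seconds) pairs."""
    if not segments:
        raise ValueError("a playlist needs at least one segment")
    # The target duration must be at least as long as every segment.
    target = max(math.ceil(duration) for _, duration in segments)
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",  # version 3 allows fractional durations
        f"#EXT-X-TARGETDURATION:{target}",
        f"#EXT-X-MEDIA-SEQUENCE:{first_sequence}",
    ]
    for uri, duration in segments:
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(uri)
    if ended:
        # Marks a finished stream; a live playlist omits this tag
        # and is re-fetched by the player as new chunks appear.
        lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"
```

For a live stream the encoder would rewrite this file each time a chunk completes, dropping the oldest entry and incrementing the media sequence number. The player polls the playlist and downloads each new chunk with an ordinary HTTP GET.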
HLS supports adaptive streaming (multiple bit rates). The encoder can publish to a local (built-in) web server or to a remote server using HTTP PUT/POST, similar to MPEG-DASH.
HLS has very high latency, three to four times the chunk duration, typically from 2 to 30 seconds. It is also highly robust, and it can be scaled using external web servers. There is no TCP flow-control issue on the encoder side when publishing locally. HLS has native support on all Apple and Android devices and is supported in several IP set-top boxes and some professional IRDs.
Part 2 of this report will discuss the needs of broadcasters for cost-effective, low-latency contribution links and the Video Services Forum (VSF) TR-06-1 Reliable Internet Stream Transport protocol specification.