Optimizing Encoding & Contribution For Live OTT

Optimizing contribution for OTT feeds is more complex than for traditional broadcasting because the feeds must work over internet protocols and networks. Although TCP/IP provides a solid base for transmitting contribution circuits, it can introduce latency.

This article is part of 'The Big Guide To OTT - The Book'

OTT has dramatically expanded the range of delivery outlets for content and continues to do so. This has had a direct effect on content production, enabling almost any organization or person to create and distribute live content, which has increased the volume of content being created. Whether this content is for entertainment, news, education or communications, video is our preferred communication medium and people generally turn to this whenever possible.

While “high-quality” live broadcasting is normally reserved for the highest budget productions, technology developments mean that similarly low-latency, high-resolution and professionally produced content can be created on much lower budgets than ever before. A critical enabler is the video encoding and streaming technology used at the point of content capture, and how it allows lower-cost connectivity to be used.

The Live OTT Content Producer

OTT consumption has had an impact on all parts of the Media & Entertainment industry, from household name broadcasters to small businesses and individuals. From premium subscriber content on Netflix or DAZN, to home-made videos, OTT is an audience-reach enabler.

In many ways, the OTT consumption shift has primarily been beneficial for “niche” content that has a smaller monetization opportunity but is still highly valued by its supporters. Sports clubs, sports leagues, e-sports businesses, gaming companies, theatre and arts venues, academic institutions, membership groups, local governments, charities, businesses and more can take advantage of their growing ability to produce high-quality video and monetize their own content.

To convert fans and followers into paying members or subscribers, there are two key questions from a content perspective. First, how do you ensure content production is high quality and delivers the best viewing experience, especially for live content? Second, how do you ensure the content delivery is high quality, again especially for live content?

Much of The World of OTT series focuses on the challenges of Live OTT content delivery to match the low latency, broadcast quality viewing we have come to expect as consumers. Larger organizations typically have the resources to produce live content using dedicated, high-performance, high-bandwidth infrastructure, whether fiber or satellite, and top-quality production studio facilities. Smaller organizations instead rely on internet-based contribution networks. Additionally, COVID-19 has forced more remote production which has caused internet-grade, cloud-based, multi-tenant technologies to be heavily tested for even the highest level of live TV content we see on our big screens.

Figure 1 – Live OTT Workflow, with emphasis on Remote Production and Video Contribution.

The technology has proven itself to be prime-time-ready. One live production event in 2020 that proved this was the USA’s NFL Draft which 55 million people watched. This huge event in the US sports calendar relied on internet connectivity to players at home while universities and clubs engaged from numerous locations. On a much smaller scale, numerous live events are now produced with remotely connected production teams. Low latency delivery over IP networks, including the internet, has been absolutely essential to support this, and it continues to drive higher quality content production, largely in support of OTT content delivery.

Live OTT Quality

“We must start as we mean to go on” is a phrase that is commonly used in everyday life. It is also relevant for Live OTT video contribution.

Once a camera is streaming images over a contribution network to a physical or virtual studio, the contribution encoding technology is having an effect. This materializes in three primary ways. The first is the quality of production, such as camera angles, commentary or graphics. The second is the video latency for the end viewer, as the video moves into distribution networks and aims to at least equal social media platforms for the speed of information delivery (latency added in the contribution network cannot be removed later in the distribution network). The third is the quality of image for the end viewer, which is affected by the quality of the IP stream.

To deliver the best possible viewing experience, the goal of the “first-mile” content delivery chain is to start out with the lowest possible latency and the highest possible picture quality. Ideally, the latency should not exceed 150ms (about 9 frames at 60 frames per second).
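The frame arithmetic behind that figure can be checked with a minimal sketch. The function name and the list of frame rates are illustrative assumptions, not part of any product or standard.

```python
def latency_in_frames(latency_ms: float, fps: float) -> float:
    """Convert a latency in milliseconds into the equivalent number of video frames."""
    return latency_ms / 1000.0 * fps


# Hypothetical frame rates chosen for illustration.
for fps in (25, 30, 50, 60):
    frames = latency_in_frames(150, fps)
    print(f"150 ms at {fps} fps is {frames:.2f} frames")
```

The same 150 ms budget covers fewer frames at lower frame rates, so a target expressed in frames needs to state the frame rate it assumes.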

With this input the downstream viewing experience can be as good as possible. Without it the viewing experience can be negatively impacted which is not what paying or high-expectation OTT consumers want.

Contribution Protocols

Several “internet-ready” protocols have been developed to meet the Live OTT contribution requirement, providing functionality that existing transport protocols like RTMP and OTT delivery protocols like HLS could not.

SRT (Secure Reliable Transport), invented by Haivision and made open-source for the ongoing development of remote production solutions for the video industry, is designed to encrypt and stream real-time content over the internet or networks with variable performance. It can transport a high-quality video stream from a camera and video encoder over an IP network at very low latency. SRT is interoperable with other broadcast production elements.

RIST (Reliable Internet Stream Transport), developed by the Video Services Forum, is also designed for low latency contribution delivery over the public internet. RIST takes an RTP (Real-time Transport Protocol) stream input, makes small changes to the labelling, transmits and then decodes on a FIFO basis at the other end of the network to place packets in their correct order. A core design principle of RIST is interoperability between manufacturers.

Other efforts in this space include LCEVC (Low Complexity Enhancement Video Coding), managed by MPEG and ISO, which provides a software-defined codec enhancement that can be applied to a video stream as it passes to an end consumer device. This improves the quality of the image being delivered but is not designed to achieve low latency over unmanaged contribution networks, so it is not the focus of this article.

Stream Latency

From a latency perspective, the low-latency protocols deliver live content at latency levels comparable to raw UDP but with reliability approaching that of TCP. They avoid UDP's packet loss disadvantages by extending traditional network error recovery practices, using a configurable latency buffer to improve packet loss recovery in poorer network conditions.

To achieve the best possible latency over variable networks, these protocols can be configured using parameters such as the latency buffer, which must work within the constraints of the total round-trip time and the rate of packet loss. These parameters can be managed differently according to the content type, such as dynamic live sports, a concert or a one-person interview.
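As a rough illustration of how round-trip time and packet loss might drive the latency buffer setting, the sketch below scales the buffer as a multiple of the measured round-trip time and raises that multiple as loss increases. The multipliers, loss thresholds and the 120 ms floor are illustrative assumptions, not values taken from this article or from any specification, and the function name is hypothetical.

```python
def suggested_latency_ms(rtt_ms: float, loss_pct: float, floor_ms: float = 120.0) -> float:
    """Suggest a latency buffer from round-trip time and packet loss.

    Assumed rule of thumb: each retransmission costs roughly one round trip,
    so lossier links get a larger multiple of the RTT.
    """
    if loss_pct <= 1:
        multiplier = 3
    elif loss_pct <= 3:
        multiplier = 4
    elif loss_pct <= 7:
        multiplier = 5
    else:
        multiplier = 6
    # Never go below the assumed floor, even on very short links.
    return max(floor_ms, multiplier * rtt_ms)


print(suggested_latency_ms(rtt_ms=20, loss_pct=0.5))   # short, clean link
print(suggested_latency_ms(rtt_ms=100, loss_pct=5.0))  # long, lossy link
```

In practice the starting value would be tuned against real link measurements and the tolerance of the content type for delay.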

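They then combine this buffering with automatic repeat requests (ARQ): the receiver asks the sender to retransmit missing packets, and the buffer gives those retransmissions time to arrive before the packets are needed. This overcomes packet loss and jitter to a degree that depends on network conditions.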
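The interaction between ARQ and the latency buffer can be illustrated with a deliberately simplified timing model. It is a sketch under stated assumptions, not how SRT or RIST are implemented: a gap is assumed to be noticed when the next packet arrives, and each repeat request plus resend is assumed to cost exactly one round trip. All names and numbers are hypothetical.

```python
def simulate_arq(num_packets, lost_transmissions, rtt_ms, buffer_ms, interval_ms=1.0):
    """Classify each packet as "on_time", "recovered" or "dropped".

    lost_transmissions maps a sequence number to the number of consecutive
    transmissions of that packet that are lost (1 means only the original).
    """
    statuses = []
    for seq in range(num_packets):
        losses = lost_transmissions.get(seq, 0)
        if losses == 0:
            statuses.append("on_time")
            continue
        # Assumed timing: one packet interval to notice the gap, then one
        # round trip for every repeat request and retransmission.
        recovery_delay_ms = interval_ms + losses * rtt_ms
        if recovery_delay_ms <= buffer_ms:
            statuses.append("recovered")
        else:
            statuses.append("dropped")
    return statuses


# Packet 2 loses its first transmission; packet 3 loses two in a row.
losses = {2: 1, 3: 2}
print(simulate_arq(5, losses, rtt_ms=40, buffer_ms=60))
print(simulate_arq(5, losses, rtt_ms=40, buffer_ms=160))
```

With a buffer shorter than two round trips, the packet that needs a second retransmission misses its deadline. Enlarging the buffer recovers it at the price of extra end-to-end latency.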

Stream Quality

SRT and RIST are content agnostic from a camera to a centralized mixer and distribution encoder. This provides maximum flexibility to the production teams for delivering content from a venue into a studio environment. This contrasts with RTMP which, as originally specified, cannot transport content encoded in HEVC or AV1 formats, nor handle multiple language tracks. This flexibility is also important for the delivery of a live stream from a studio environment to the Origin for onward ABR encoding.

At the same time, SRT and RIST allow a high-quality codec to be used for production, which helps to maintain quality throughout the video delivery chain as the stream is re-encoded or transcoded for delivery to consumers. For example, if the original contribution stream is carried over SRT or RIST using HEVC, it will retain a higher picture quality, even when feeding an H.264 ABR cascade, than if a lower-quality H.264 stream had been used for contribution.

The Performance-Cost Trade-off

Latency, Quality and Cost are the three points of trade-off for live contribution encoding.

At one end of the cost spectrum (see Figure 2) is video contribution with low latency and high quality. This video will stream at the highest bitrates and require the most network bandwidth. It is the highest cost option and is therefore normally used for premium televised content. It quite often utilizes private network connectivity and hardware-based encoding for maximum performance and reliability. Hardware encoders are also required to process codecs, like HEVC and JPEG-2000, which enable higher quality downstream delivery. Low latency protocols like SRT and RIST are not normally used in this scenario.

The second scenario, low quality with low latency, uses lower-quality codecs that will ultimately deliver lower-quality images to the consumer. These contribution streams can use lower-bandwidth network connections, although if packet loss is not managed then poor network reliability could degrade the streams even further.

Figure 2 – Multiple Scenario trade-offs for Performance and Cost.

SRT and RIST enable the third scenario. This scenario uses encoders that deliver low latency and high quality, at a similar encoding price point to the “high quality live” scenario, but significantly reduces the networking costs because the protocols are designed to traverse the internet. This technology shift is what is enabling very good quality live production from many more venues than ever before, bringing content to viewers in more ways than ever before. Given that encoders are generally purchased once, while network bandwidth is consumed on a recurring basis, the extra investment in encoders makes sense for this scenario.
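The one-off versus recurring cost argument can be made concrete with a small calculation. Every figure below is a hypothetical placeholder chosen only to show the shape of the comparison; none is taken from this article or from real pricing.

```python
def cumulative_cost(encoder_cost: float, monthly_network_cost: float, months: int) -> float:
    """Total spend: a one-off encoder purchase plus recurring network charges."""
    return encoder_cost + monthly_network_cost * months


# (encoder purchase, monthly network cost), all values hypothetical.
SCENARIOS = {
    "high quality, private network": (10_000, 2_000),
    "low quality, internet": (2_000, 200),
    "high quality, SRT/RIST over internet": (10_000, 300),
}

MONTHS = 24
totals = {
    name: cumulative_cost(encoder, network, MONTHS)
    for name, (encoder, network) in SCENARIOS.items()
}

for name, total in totals.items():
    print(f"{name}: {total:,.0f} over {MONTHS} months")
```

Under these assumed numbers the recurring network charge dominates the premium scenario over time, which is why moving the same class of encoder onto internet connectivity changes the total so sharply.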

Low Latency And Remote Production

As with live video contribution, remote production also benefits from low latency. Production quality is improved as production teams, including commentators and directors, react quickly to what is happening in real-time and collaborate from remote locations. This is almost taken for granted with on-site production, but even small latency issues add up to larger problems in a remote production environment, from mis-aligned commentary, to delays with contribution feeds themselves, to the need to add latency into the live stream to accommodate contribution delays. The weakest link in the latency contribution chain will affect the rest of the production.

Broadcast-grade productions need bi-directional feeds with latency of less than 0.5 seconds. Using the same encoding functionality as the contribution feeds, remote production teams can work together with low latency over variable and lower-quality network connections. For production teams based at home or other remote locations, this makes a serious difference.

Security

Original content production for D2C OTT services, especially paid-for services, needs to be protected, and so do the raw contribution feeds used for producing live content. The SRT protocol supports 128/256-bit AES encryption from the initial link-up through to the receiver at a trusted partner.

At the same time, security management can be cumbersome and easily interrupt a live video feed if not managed correctly. To prevent unnecessary security impacts, SRT was designed with “caller”, “listener” and “rendezvous” modes for secure, automated sharing behind firewalls. Other video protocols, such as RTMP, are not designed to support secure two-way links. Therefore, manual network port management is required to use them in live video over complex networks (especially secure corporate networks) which is an unnecessary obstacle.

With SRT Caller and Listener modes, you can set the receiving device (typically a decoder) behind the firewall to caller mode so that it tells the encoder (in listener mode) to send the stream to a specific IP address. SRT also has the advantage of supporting bi-directional streaming, meaning an SRT Caller behind a firewall can be used for both sending and receiving streams. Using this approach prevents unwanted streams from passing through a firewall.
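A caller and listener pairing of this kind is often expressed as a pair of SRT URIs. The sketch below builds such URIs with the Python standard library only. The query parameter names (mode, latency, passphrase, pbkeylen) mirror those accepted by the open-source srt-live-transmit tool, but names and units vary between implementations (FFmpeg, for example, expects latency in microseconds), so treat them as assumptions to check against your own equipment. The host name, port and passphrase are hypothetical.

```python
from urllib.parse import urlencode

VALID_MODES = {"caller", "listener", "rendezvous"}
VALID_KEY_LENGTHS = {16, 24, 32}  # key length in bytes: AES-128, AES-192, AES-256


def srt_uri(host, port, mode, latency_ms=None, passphrase=None, pbkeylen=None):
    """Build an SRT URI string; an empty host means "listen on all interfaces"."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown SRT mode: {mode}")
    params = {"mode": mode}
    if latency_ms is not None:
        params["latency"] = latency_ms
    if passphrase is not None:
        # Assumed constraint: SRT passphrases are 10 to 79 characters long.
        if not 10 <= len(passphrase) <= 79:
            raise ValueError("passphrase must be 10 to 79 characters")
        params["passphrase"] = passphrase
    if pbkeylen is not None:
        if pbkeylen not in VALID_KEY_LENGTHS:
            raise ValueError("pbkeylen must be 16, 24 or 32")
        params["pbkeylen"] = pbkeylen
    return f"srt://{host}:{port}?{urlencode(params)}"


SECRET = "correct-horse-battery"  # hypothetical shared passphrase

# Encoder at the venue waits for connections (listener mode).
print(srt_uri("", 9000, "listener", passphrase=SECRET, pbkeylen=32))

# Decoder behind the firewall initiates the connection (caller mode).
print(srt_uri("encoder.example.com", 9000, "caller",
              latency_ms=200, passphrase=SECRET, pbkeylen=32))
```

Because the device behind the firewall makes the outbound connection, no inbound port needs to be opened on that firewall for the stream to flow.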

The Way Forwards

For live productions over the internet that need high-quality production and delivery, technologies like SRT, RIST and LCEVC provide the opportunity to save cost by simplifying the contribution network and reducing the quality standard it must meet. SRT and RIST are specifically targeted at overcoming latency challenges. These protocols, however, do not alter the need for high-performance encoders that will assure feed reliability. The double benefit of using SRT and RIST is that the same protocols that provide low latency over variable-performance networks are also enabling remote production of a higher quality.

Overall, smaller D2C Content Providers now have the opportunity to monetize their content more effectively by increasing the quality. And larger Content Providers can expand the range of highly professional content they produce alongside their main linear feeds, at lower cost. It’s good news for consumers who want to see the content, and it’s good news for the revenue and profit accounts of the D2C Content Providers.
