OTT Content Origination

Content Origination is in the midst of significant transformation, like all parts of the OTT video ecosystem. As OTT grows and new efficiencies are pursued, Origination must play its part as a fundamental element of the delivery chain. But Origination is not just about smooth and efficient content delivery. It’s also about providing key features to the OTT service.

This article is part of 'The Big Guide To OTT - The Book'

Content Origination is the point at which Live, Linear and VOD content are prepared for final delivery and streamed into the delivery networks. It is where the push systems of broadcast playout and VOD asset publication meet the pull system of streaming to multiple device types according to the required bit-rate and format.

Today, this push-pull line can be blurred. Linear OTT often still treats the Content Origination platform as part of the push system, passing all formats and bit-rates through to the CDNs regardless of whether they have been requested. In many CDN environments this is required to “warm up the caches”, so that content is already in place when it is requested. This achieves the lowest possible latency with minimal requests back to the Origination platform.

For VOD, the push system ends as content is moved into Central Storage, ready to be streamed on-demand. That said, some CDN environments will replicate an entire VOD library in their own storage, pushing content deeper into the delivery network.

Content Origination Functions

Content Origination combines a set of functions that prepare content for OTT delivery. The diagram below shows the primary functions.

Figure 1: Primary functions of OTT Content Origination.

The first step is encoding and transcoding, which ensures the live streams and VOD files are at the correct bit-rates for OTT delivery. Generally, an ABR (adaptive bit-rate) group is prepared according to a set of pre-defined profiles, designed to handle the variations in network conditions and the range of devices that can request content. The choice of codec is an important consideration for the whole Origination Platform, given that newer codecs offer improved efficiency but at the cost of higher levels of processing (e.g., HEVC is more intensive than H.264). Recently, the concept of encoding on-demand has grown, as technology leaders have leveraged cloud compute to adjust encoding workloads according to consumer demand, resulting in more efficient use of resources.
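As a rough illustration of how an ABR group might be used, the sketch below defines a small set of profiles and picks the highest rendition that fits a device and a measured throughput. The profile names, bit-rates and the 0.8 safety factor are illustrative assumptions, not values taken from any particular platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    name: str          # hypothetical label for the rendition
    height: int        # vertical resolution in pixels
    bitrate_kbps: int  # target video bit-rate


# Illustrative ABR group; real ladders are tuned per service and codec.
ABR_GROUP = [
    Profile("low", 360, 800),
    Profile("medium", 720, 2500),
    Profile("high", 1080, 5000),
]


def select_profile(throughput_kbps: float, max_height: int, safety: float = 0.8) -> Profile:
    """Pick the highest-bit-rate profile that fits the device and the network."""
    budget = throughput_kbps * safety  # leave headroom for throughput variation
    candidates = [
        p for p in ABR_GROUP
        if p.height <= max_height and p.bitrate_kbps <= budget
    ]
    if not candidates:
        # Fall back to the lowest bit-rate rather than failing playback.
        return min(ABR_GROUP, key=lambda p: p.bitrate_kbps)
    return max(candidates, key=lambda p: p.bitrate_kbps)
```

For example, `select_profile(4000, 1080)` returns the "medium" profile, because the 5000 kbps rendition exceeds the 3200 kbps budget.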

Once content is encoded/transcoded there can be multiple workflows depending on whether the content is live or VOD, whether timeshifted viewing is available to the viewer, and whether or not low latency is required.

The core functions are therefore used in different ways according to the workflow. In general:

Live Recording and/or File Ingest – to provide Live TV catch-up services, now a basic feature of OTT services, the live stream is recorded to create a back-up for content held in the CDN. This ranges from short-term Live Pause to long-term CloudDVR. For VOD, files need to be ingested into storage. These functions integrate closely with the OTT Central Storage and synchronize with the Content Management Systems in order to confirm their availability for viewer consumption. The “and/or” distinction reflects that some platforms are unified for both Live and VOD services while others are separate. The choice generally depends on how the operational requirements of the Live and VOD services, including scalability, differ.
Storage Management – not only is content recorded and ingested into Storage, but the Storage must be managed. As metadata is updated, as content ages, and as content is deleted, something must manage where that content is and synchronize with the Content Management System. This content life-cycle management is often a software function in a module within the Content Origination platform.
(JIT) Packaging – this function is required in order to deliver content to different device operating systems, like Android (DASH), Apple (HLS) and Microsoft (MSS). Some OTT systems package every piece of content regardless of demand – below this is referred to as Legacy Linear OTT. Just-in-Time (JIT) Packaging is best practice for large VOD libraries where it is inefficient to package every piece of content before storing it.
(JIT) Encryption – once packaged, content is encrypted for secure delivery over the internet to authorized viewers. Each package type has a respective encryption method (e.g. HLS uses FairPlay), although some package types use multiple encryption methods (e.g. DASH can use Widevine and PlayReady). Encryption follows packaging, so if content is packaged and then stored, it is generally also encrypted. JIT-Encryption accompanies JIT-Packaging for a more efficient storage model for large VOD libraries.
Low Latency Processing – this can be isolated as a specific function of the Origination platform. The compute resources can be set up to act on specific streams or specific pieces of content to deliver in Low Latency formats, which involves reducing GOP sizes and managing many more connection requests as smaller segments are delivered. The decision to do this is made further upstream in the Content Management Systems, but is then executed at Origination.
File and/or Live Streaming – once one or more of these content processing steps are complete, the content is streamed, pulled by requests from the CDN(s). Typically, the “origin server” has been an internet-facing web server specifically designed for delivering streams, which passes through streams from the Packager. Today these are software functions rather than discrete servers, although depending on workflow scalability and operational models it can make sense to deploy the software in dedicated hardware environments.
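The JIT Packaging and Encryption functions described above can be sketched as a pull-driven step: one format-neutral copy is stored, and a packaged segment is produced only when a format is first requested. Everything here is a simplified assumption for illustration. The `MEZZANINE` store, the asset names and the `jit_package` function are hypothetical, and the byte prefix merely stands in for real container wrapping and DRM encryption.

```python
from functools import lru_cache

# Hypothetical mezzanine store: one format-neutral copy per (asset, segment).
MEZZANINE = {
    ("movie-1", 0): b"segment-0-media",
    ("movie-1", 1): b"segment-1-media",
}

SUPPORTED_FORMATS = {"hls", "dash", "mss"}


@lru_cache(maxsize=1024)
def jit_package(asset_id: str, segment: int, fmt: str) -> bytes:
    """Package (and nominally encrypt) one segment only when it is requested."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    media = MEZZANINE[(asset_id, segment)]
    # Placeholder for real container wrapping and DRM encryption.
    return fmt.encode() + b":" + media
```

A repeated request for the same asset, segment and format is served from the small in-memory cache, so packaging work is only done on the first pull; `jit_package.cache_info()` reports the hits and misses.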

Evolution Of Content Origination

Today there are 2 primary models for Content Origination, with 2 new models on the horizon.

The 1st generation Broadcast OTT model (Figure 2) was introduced when Linear TV channels began to be streamed OTT. It is still widely in use today because it is a simple way to provide OTT content for linear broadcasters. This model takes an output from a broadcast channel which is then encoded, packaged and encrypted for OTT delivery. It is also recorded for catch-up TV. In this model, VOD utilizes a similar workflow to package file-based content before storage, to simplify operations and onward delivery to Streaming.

As shown in the grey Outputs boxes, this model produces a set of outputs which are consistent across the delivery chain. There are two drawbacks of this model: 1) it requires an unnecessary amount of central storage (shown with a value of “X”) and 2) it creates a pre-formatted VOD library that is inflexible to changes in format type that routinely occur as new consumer devices come to market.

Figure 2: 1st generation Broadcast OTT – Linear OTT with Catch-up VOD.

The “JIT” model (Figure 3) is increasingly used today. Pioneered by large VOD businesses like Cable TV operators, JIT addresses the two main weaknesses of the 1st generation model. First, it provides flexibility in the fast-changing consumer device world by storing content libraries in a mezzanine format and then packaging and encrypting on-demand. This means that if a new format is required, the OTT operator does not need to reprocess all VOD assets into the new format, a task that could require weeks of processing. Second, it significantly reduces the size of the central storage. For example, storing every asset in HLS, DASH and MSS triples the number of files stored compared with a JIT model. As broadcaster OTT libraries grow into the multi-PB range this makes a big difference, not just for storage costs but also for streaming performance.
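The storage comparison can be made concrete with some simple arithmetic. The sketch below assumes, as a simplification, that each packaged copy is roughly the same size as the mezzanine copy (container overhead is ignored), and it uses a hypothetical 1 PB library.

```python
def central_storage_tb(mezzanine_tb: float, package_formats: int, jit: bool) -> float:
    """Approximate central storage in TB, ignoring container overhead."""
    # A JIT model stores one mezzanine copy; a pre-packaged model stores one copy per format.
    copies = 1 if jit else package_formats
    return mezzanine_tb * copies


library_tb = 1000.0  # hypothetical 1 PB library
pre_packaged = central_storage_tb(library_tb, package_formats=3, jit=False)
just_in_time = central_storage_tb(library_tb, package_formats=3, jit=True)
print(f"pre-packaged: {pre_packaged:.0f} TB, JIT: {just_in_time:.0f} TB")
```

Under these assumptions the pre-packaged library needs 3000 TB against 1000 TB for the JIT model, and the gap widens with every additional package format.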

Figure 3: 2nd generation Broadcast OTT – JIT Model for Live and large-scale VOD.

The 3rd generation model can be called “Common Format & Encryption” (Figure 4). This is enabled by CMAF (Common Media Application Format) and CENC (Common Encryption), which are the basis of today’s DASH and HLS low latency formats and are moving the industry towards a truly common format. From a storage and streaming efficiency perspective, this model no longer requires JIT packaging or encryption of the media segments. Instead, the single format and encryption type means that the stream or file is prepared, then stored and streamed. This simplifies processing for packaging and encryption. It also greatly improves cache efficiency through the use of common media segments, retains the same storage efficiency as the JIT model, and reduces the complexity of egress for the Streaming component of the platform, which previously egressed multiple package types.
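The cache-efficiency benefit of common media segments can be illustrated by counting the distinct media objects a cache would hold for the same set of viewer requests. This is a simplified sketch: the request tuples are hypothetical, and manifests (which still differ between HLS and DASH) are left out.

```python
def cache_objects(requests, common_format: bool) -> int:
    """Count the distinct media objects a CDN cache would hold for these requests."""
    keys = set()
    for fmt, asset, segment in requests:
        # With a common format the media segment is identical for every player type,
        # so the package format drops out of the cache key.
        key = (asset, segment) if common_format else (fmt, asset, segment)
        keys.add(key)
    return len(keys)


# Hypothetical requests: HLS and DASH players watching the same two segments.
requests = [
    ("hls", "channel-1", 0),
    ("dash", "channel-1", 0),
    ("hls", "channel-1", 1),
    ("dash", "channel-1", 1),
]
```

With these requests, per-format packaging leaves four objects in the cache, while a common format leaves two, so each cached segment serves both player types.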

This model is deployable today, but it is likely to be limited to a low percentage of use cases until CMAF is adopted end-to-end (including players, devices, etc.). Leaders in this space expect the 3rd generation model to become the standard over the next 2-3 years, while working alongside the 1st and 2nd generation models for many years. In the meantime, the benefits of VOD library flexibility and storage efficiency can still be achieved through the JIT model.

Figure 4: 3rd generation Broadcast OTT – Common Format & Encryption.

The 4th generation model is emerging now, more in discussion than in deployment, aiming to leverage artificial intelligence, machine learning and the most modern codecs. We can call this the “Consumption-Aware” model (Figure 5). In this model a new level of consumer personalization is achieved as the system applies the video codec and the bit-rate profile according to the specific piece of content and the type of device requesting it.

This model creates the most efficient Origination platform based on intelligence being applied before video is processed, and it enables even more efficient storage by leveraging the benefits of newer codecs where possible and creating more optimal ABR profiles. It is another step towards the perfect pull-system that seeks to optimize content delivery at scale.

This model is not about individual consumer customization. Decisions must be made about groups of customers and groups of content. This model will use metadata about customers’ devices and each piece of content, plus artificial intelligence to analyze consumer behavior and the level of demand for particular content.

An example of this in action on a live stream could be the decision to stream a graphically rich program in a higher-quality codec like HEVC or AV1. This would optimize the infrastructure being used, perhaps on a pay-per-use basis, and allow premium content to have premium viewing experiences. On a VOD library, AI-driven processing could be used to streamline or enrich codecs and bit-rates available in the library based on consumer demand. Codec licensing could become part of the variable factors used to optimize customer satisfaction, OTT throughput and total cost.
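To make the idea of group-level decisions more concrete, the sketch below uses a plain rule-based policy as a stand-in for the AI-driven analysis described above. The device groups, the complexity and demand scores, and the thresholds are all hypothetical; a real system would derive them from device metadata and consumption data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentGroup:
    name: str
    complexity: float  # 0.0 (static) to 1.0 (graphically rich), hypothetical score
    demand: float      # 0.0 (rarely watched) to 1.0 (very popular), hypothetical score


# Hypothetical device groups and the codecs each can decode.
DEVICE_CODECS = {
    "legacy-stb": ["h264"],
    "smart-tv": ["h264", "hevc"],
    "new-mobile": ["h264", "hevc", "av1"],
}

# Codecs ordered from least to most efficient (and most compute-intensive).
CODEC_ORDER = ["h264", "hevc", "av1"]


def choose_codec(content: ContentGroup, device_group: str) -> str:
    """Choose a codec for a group of content and a group of devices."""
    supported = DEVICE_CODECS[device_group]
    # Only spend the extra encoding effort where it is likely to pay off.
    premium = content.complexity >= 0.7 and content.demand >= 0.5
    if not premium:
        return "h264"  # every hypothetical device group above supports H.264
    return max(supported, key=CODEC_ORDER.index)
```

Under this policy a popular, graphically rich program would be offered in HEVC to the "smart-tv" group and in AV1 to the "new-mobile" group, while low-demand content stays in H.264 for everyone.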

Figure 5: 4th generation Broadcast OTT – Consumption Aware.

Content Origination is the execution point for delivering required content formats to consumers. While it is not a part of the OTT infrastructure that needs to dramatically expand in capacity as audiences grow (unlike storage and edge caching – see other articles in The World of OTT series), it is the first part of the OTT delivery system where the dual objectives of customer personalization and delivery efficiency are simultaneously addressed. This is why the Content Origination platform directly enables OTT operators to achieve the vision for OTT: the delivery of highly personalized viewing experiences, at scale.
