The venerable field of audio/visual (AV) packaging is undergoing a renaissance in the streaming age, driven by convergence between broadcast and broadband, demand for greater flexibility, and delivery in multiple versions over wider geographical areas requiring different languages and varying rights.
This revival is reflected in various surveys, including one from London-based Omdia predicting that revenue from video packaging will reach $4.55 billion in 2025, growing at a Compound Annual Growth Rate (CAGR) of 2% between now and then. If that is real-terms growth, after accounting for the current rampant inflation, it represents a considerable revival for a field that looked moribund only a few years ago.
All such predictions need a health warning, because packaging touches a variety of components that may or may not have been accounted for in the research. Strictly, packaging is the process of preparing video for delivery in the protocol required by the given transmission medium, including all the video content, audio versions and captions, as well as relevant specifications such as resolutions and bit rates, so that it can all be unpacked and played back correctly by the receiving device. This applies whether the service is traditional linear broadcast, where it would be delivered as an MPEG Transport Stream, or delivered over the internet via HTTP streaming protocols. In either case the content could be live or VOD.
But packaging is sometimes confused with encoding/transcoding, and even with encryption and the application of DRM (Digital Rights Management) for rights management, which can depend on the playback device and its geographical location. Indeed, encoding is sometimes considered part of the packaging process, although it is usually separated for market sizing purposes. In streaming, encoding and packaging have moved closer together, with packaging sometimes performed as part of the transcoding process, adding to the confusion. In other cases, encryption and DRM are applied first and the resulting content is then packaged in a separate step.
The trend, though, is towards performing encryption, DRM and packaging together, which works better with dynamic and Just In Time (JIT) packaging. These are becoming increasingly prevalent as more Online Video Platforms (OVPs) and Content Delivery Networks (CDNs) support them. They are superseding traditional static packaging, which is performed at ingest and worked well enough when all content was linear broadcast in a standard format that rarely needed revision. Now, however, static packaging has become inefficient and constrains service agility, because new formats arise more frequently and each one would require re-transcoding and repackaging the entire content catalogue to accommodate emerging platforms or new device types.
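To make the idea of deciding format and DRM together at request time concrete, here is a minimal Python sketch. The platform names, the platform-to-(format, DRM) table and the fallback profile are all hypothetical; the pairings reflect common practice (FairPlay with HLS on Apple devices, Widevine or PlayReady with DASH elsewhere) but are not any particular packager's configuration, and detecting the platform from the request is assumed to happen upstream.

```python
from dataclasses import dataclass

# Hypothetical platform -> (streaming format, DRM system) table. The pairings
# reflect common practice but are illustrative assumptions only.
PLATFORM_PROFILES = {
    "ios": ("HLS", "FairPlay"),
    "safari": ("HLS", "FairPlay"),
    "android": ("DASH", "Widevine"),
    "chrome": ("DASH", "Widevine"),
    "edge": ("DASH", "PlayReady"),
}

# Assumed fallback for platforms not listed in the table.
DEFAULT_PROFILE = ("DASH", "Widevine")


@dataclass(frozen=True)
class PackagingDecision:
    fmt: str  # output streaming format, e.g. "HLS" or "DASH"
    drm: str  # DRM system whose protection data is added at packaging time


def choose_packaging(platform: str) -> PackagingDecision:
    """Decide format and DRM together, per request, from the client platform.

    Platform detection (e.g. from the User-Agent header) is assumed to have
    happened upstream; this function only applies the lookup.
    """
    fmt, drm = PLATFORM_PROFILES.get(platform.strip().lower(), DEFAULT_PROFILE)
    return PackagingDecision(fmt, drm)
```

Because the decision is a cheap lookup made per request, adding support for a new device type means editing the table rather than repackaging the stored catalogue.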
Just In Time (JIT) packaging reduces launch lead times by selecting appropriate stream packaging and DRM formats automatically.
Dynamic packaging emerged to support multiple new formats on the fly: just a single source is stored, and content is delivered in the requested output format by generating it only when the request is received. This is essentially the same as JIT and can include adding encryption and DRM. It reduces costs because the heavyweight component, the content payload, is stored just once, while the lighter per-format components, such as manifests and DRM information, consume far less space. Given that there are two major Adaptive Bit Rate Streaming (ABRS) formats, DASH and HLS, dynamic packaging cuts storage requirements by at least half, avoiding the need to generate and store a separate copy of the content for each. Instead, the output for each format is created on the fly.
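The "at least half" figure follows from simple arithmetic: if the payload is packaged statically for N formats it is stored N times, whereas dynamic packaging stores it once, a saving of 1 - 1/N. The Python sketch below assumes manifests and DRM metadata are negligible next to the payload; the function names are illustrative.

```python
def catalogue_storage(payload_gb: float, n_formats: int, dynamic: bool) -> float:
    """Approximate storage for a catalogue whose media payload is `payload_gb`.

    Static packaging keeps one fully packaged copy per streaming format;
    dynamic/JIT packaging keeps a single format-agnostic copy. Manifests and
    DRM metadata are treated as negligible (a simplifying assumption).
    """
    if n_formats < 1:
        raise ValueError("need at least one output format")
    copies = 1 if dynamic else n_formats
    return payload_gb * copies


def storage_saving(n_formats: int) -> float:
    """Fraction of storage saved by dynamic packaging, i.e. 1 - 1/n_formats."""
    static = catalogue_storage(1.0, n_formats, dynamic=False)
    dynamic = catalogue_storage(1.0, n_formats, dynamic=True)
    return 1 - dynamic / static
```

With DASH and HLS (N = 2), `catalogue_storage(1000, 2, dynamic=False)` gives 2000 GB against 1000 GB when packaged dynamically, a 50% saving; adding a third format such as MSS (N = 3) raises the saving to about 67%.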
The key to JIT packaging, then, is the ability to store the multiple target renditions, at different resolutions and bit rates, in a format-agnostic state as fragmented MP4s (fMP4s). This contrasts with legacy packaging, where those renditions are stored in streaming protocol packages such as HLS, DASH, or sometimes still Microsoft Smooth Streaming (MSS). When a device requests content from a given source, the JIT packager generates the appropriate stream in the requested format, assembling the fMP4 fragments accordingly.
JIT packaging not only saves resources and accelerates delivery of service upgrades and new features; it also underpins the more fundamental added value promised by streaming, including personalization for individuals and conformance with varying regional regulations or rules, including content rights. Some form of it is also required for advanced advertising with targeting and dynamic insertion.
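As an illustration of generating output only when a format is requested, the Python sketch below builds either an HLS master playlist or a minimal DASH MPD from the same list of stored renditions. The `Rendition` fields, file naming scheme and example bit rates are hypothetical; the HLS media playlists and segment assembly are not shown, and the MPD is deliberately minimal rather than fully conformant (for instance it omits segment index information).

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    """One format-agnostic fMP4 rendition held in storage (fields are illustrative)."""
    name: str       # hypothetical path stem, e.g. "1080p"
    bandwidth: int  # peak bit rate in bits per second
    width: int
    height: int
    codecs: str     # RFC 6381 codec string, e.g. "avc1.640028"


def hls_master(renditions: list[Rendition]) -> str:
    """Build an HLS master playlist pointing at one media playlist per rendition."""
    lines = ["#EXTM3U"]
    for r in renditions:
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={r.bandwidth},"
            f'RESOLUTION={r.width}x{r.height},CODECS="{r.codecs}"'
        )
        lines.append(f"{r.name}/index.m3u8")
    return "\n".join(lines) + "\n"


def dash_mpd(renditions: list[Rendition], duration_s: int) -> str:
    """Build a minimal on-demand DASH MPD referencing the same stored renditions."""
    mpd = ET.Element("MPD", {
        "xmlns": "urn:mpeg:dash:schema:mpd:2011",
        "type": "static",
        "profiles": "urn:mpeg:dash:profile:isoff-on-demand:2011",
        "minBufferTime": "PT2S",
        "mediaPresentationDuration": f"PT{duration_s}S",
    })
    period = ET.SubElement(mpd, "Period")
    aset = ET.SubElement(period, "AdaptationSet", {"mimeType": "video/mp4"})
    for r in renditions:
        rep = ET.SubElement(aset, "Representation", {
            "id": r.name,
            "bandwidth": str(r.bandwidth),
            "width": str(r.width),
            "height": str(r.height),
            "codecs": r.codecs,
        })
        ET.SubElement(rep, "BaseURL").text = f"{r.name}.mp4"
    return ET.tostring(mpd, encoding="unicode")


def package_on_request(fmt: str, renditions: list[Rendition], duration_s: int) -> str:
    """JIT step: the manifest is generated only when a request names a format."""
    if fmt == "HLS":
        return hls_master(renditions)
    if fmt == "DASH":
        return dash_mpd(renditions, duration_s)
    raise ValueError(f"unsupported format: {fmt}")
```

Nothing format-specific is stored: both manifests are derived on demand from the same rendition list, which is what lets a new output format be added without repackaging the catalogue.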
Dynamic Advert Insertion (DAI) is an increasingly coveted source of new revenue for content owners, broadcasters and video service providers. It can be done on either the client side or the server side. Server-Side Ad Insertion (SSAI) has significant advantages in overall service quality and ad engagement. It tends to improve the viewing experience by enabling TV-style playback of ads inserted on the fly, which can in principle be addressed to individual viewers and can take account of factors such as season, time of day, weather and the surrounding content being viewed.
In essence, implementing all the logic on the server side makes it much easier to deliver the homogeneous, professional experience viewers were accustomed to with traditional linear TV programming and spot ads. Under SSAI, for example, the video resolution and bit rate of the ads can be matched to those of the surrounding live or recorded content, so that there is no disconcerting drop in quality between programme and ads, as there often is with client-side ad insertion.
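A minimal Python sketch of the server-side stitching described above: pick the ad rendition whose bit rate best matches the programme, then splice its segments into a VOD HLS media playlist, marking each switch with `EXT-X-DISCONTINUITY`. The bit-rate matching rule, the function names and the segment URIs are assumptions for illustration; ad decisioning (for example via an ad server) and live-stream handling are omitted.

```python
import math

Segment = tuple[str, float]  # (URI, duration in seconds)


def pick_ad_rendition(content_bandwidth: int, ad_bandwidths: list[int]) -> int:
    """Match the ad to the programme: take the highest ad bit rate that does not
    exceed the content's, falling back to the lowest available ad bit rate."""
    eligible = [b for b in ad_bandwidths if b <= content_bandwidth]
    return max(eligible) if eligible else min(ad_bandwidths)


def splice_ad(content: list[Segment], ad: list[Segment], break_after: int) -> str:
    """Return a VOD HLS media playlist with `ad` stitched in after `break_after`
    content segments. EXT-X-DISCONTINUITY marks each switch between programme
    and ad, whose timestamps and encoding parameters may differ."""
    target = math.ceil(max(d for _, d in content + ad))
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{target}",
        "#EXT-X-MEDIA-SEQUENCE:0",
        "#EXT-X-PLAYLIST-TYPE:VOD",
    ]

    def emit(segments: list[Segment]) -> None:
        for uri, duration in segments:
            lines.append(f"#EXTINF:{duration:.3f},")
            lines.append(uri)

    emit(content[:break_after])
    lines.append("#EXT-X-DISCONTINUITY")
    emit(ad)
    lines.append("#EXT-X-DISCONTINUITY")
    emit(content[break_after:])
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"
```

Because the player receives one continuous playlist, it never has to switch to a separate ad player, which is where the quality mismatch of client-side insertion typically comes from.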
JIT packaging also opens the door to effective preparation of content for multiple markets, through the creation of multiple language versions with different audio tracks or subtitles. These are usually packaged in fMP4 containers for selection on the fly by target devices, just as for the different bit rates and resolutions. For multiple languages, the initial step is often to duplicate the video and create pairs associating that same video source with an audio track in each of the target languages. The multiple video files are then recombined into one, with an audio track for each language, reducing the total amount of storage required as well as network bandwidth. This means just one video file needs to be delivered, with the language switched on the fly. This is often accomplished with FFmpeg, a free and open-source suite of libraries and programs for handling video, audio, and other multimedia files and streams.
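The recombination step can be expressed as a single FFmpeg remux. The Python sketch below only builds the command line (it does not run FFmpeg); it goes straight from one video file plus per-language audio files to a single multi-track output, a shortcut compared with the duplicate-then-recombine workflow described above. The file names are hypothetical, and stream copying (`-c copy`) assumes the audio is already in an MP4-compatible codec such as AAC.

```python
import shlex


def build_multiaudio_cmd(video_path: str, audio_tracks: list[tuple[str, str]],
                         output_path: str) -> list[str]:
    """Build an FFmpeg command that keeps one video stream and adds one audio
    stream per language, tagging each with an ISO 639-2 language code.

    audio_tracks: (file path, language code) pairs, e.g. ("audio_fr.m4a", "fra").
    Streams are copied, not re-encoded.
    """
    cmd = ["ffmpeg", "-i", video_path]
    for path, _ in audio_tracks:
        cmd += ["-i", path]
    # Input 0 supplies the single shared video stream.
    cmd += ["-map", "0:v:0"]
    # Inputs 1..N each supply one audio stream, in the order given.
    for input_index in range(1, len(audio_tracks) + 1):
        cmd += ["-map", f"{input_index}:a:0"]
    # Output audio stream i is tagged with the language of the i-th track.
    for stream_index, (_, lang) in enumerate(audio_tracks):
        cmd += [f"-metadata:s:a:{stream_index}", f"language={lang}"]
    cmd += ["-c", "copy", output_path]
    return cmd


if __name__ == "__main__":
    cmd = build_multiaudio_cmd(
        "programme.mp4",
        [("audio_en.m4a", "eng"), ("audio_fr.m4a", "fra")],
        "programme_multi.mp4",
    )
    print(shlex.join(cmd))
```

For the example above the printed command is `ffmpeg -i programme.mp4 -i audio_en.m4a -i audio_fr.m4a -map 0:v:0 -map 1:a:0 -map 2:a:0 -metadata:s:a:0 language=eng -metadata:s:a:1 language=fra -c copy programme_multi.mp4`; on a machine with FFmpeg installed the list could be passed to `subprocess.run(cmd, check=True)`.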
For many broadcasters in particular, another pressing issue has been convergence between traditional linear TV and delivery over the internet via broadband networks. For some years this was done by running the two in parallel using standards such as HbbTV, which harmonise the two for delivery to set-top boxes, connected TVs and other internet-connected devices. But the long-term direction has been towards migrating all delivery to IP, while still retaining the traditional over-the-air transmission media, namely satellite and digital terrestrial.
The DVB has unified streaming and broadcast packaging under DVB-I.
This led to development of the DVB-I initiative, which aims to bring satellite and digital terrestrial delivery natively onto IP while raising internet services to the quality standards of linear broadcast. An important milestone for this project was reached in February 2022 with the announcement of the DVB-NIP (Native IP) specification, following a year of intense technical work by at least 13 DVB member companies across the media delivery value chain. The key point of the new system, designed for DVB-S2X or DVB-T2 broadcast bearers, is that for the first time it avoids the need for an underlying MPEG-2 Transport Stream, which had been the basis of DVB broadcast systems for almost their entire history.
Service providers of all kinds can, for the first time, use the same broadcast signal both for professional applications, such as CDN caching, and for consumer applications, such as direct-to-home (DTH) delivery to native-IP TV sets, or simply broadcasting to IP devices via an in-home gateway. This brings major cost and efficiency savings for operators and broadcasters, which can consolidate all services onto a single unified IP-based headend to reach all target devices.
Packaging is an important component of DVB-NIP, which reuses other approved DVB standards for this purpose, notably DVB-DASH and DVB-AVC. The latter covers the technical components for representing and synchronizing audio and video content in DVB services, whether delivered over MPEG-2 Transport Streams or IP networks.
With the move towards JIT packaging and convergence around IP headends, the packaging field is in a state of flux, but it is on course to support new business and revenue models in a world of hybrid delivery, combining traditional broadcast media for mass distribution of content where appropriate with broadband for delivery of more personalized services.