Standards: Part 10 - Embedding And Multiplexing Streams

Audio-visual content is constructed from several different media types. The simplest case is a single video stream and a single audio stream synchronized together, but additional complexity is commonplace and demands careful synchronization with accurate timing control.

This article is part of our growing series on Broadcast Standards.
The first 26 articles are now available in Broadcast Standards – The Book.

There are several different kinds of streams involved:

Elementary (sometimes called essence) streams.
Program streams.
Transport streams.

These are described as streams, which is short for bitstreams. Each is a sequence of bits, assembled into bytes and then combined into packets for transmission over the network. Elementary streams are combined in a nested fashion to create program streams, which are in turn combined to make transport streams for delivery.

Timing & Synchronization

The timing and synchronization of audio, video and optional metadata tracks is governed by the systems layer. This is fundamental. Timing and synchronization matter at three levels:

At the lowest level, audio and video need to be precisely synchronized. Video is locked on a frame-by-frame basis. Audio is synchronized at the sample level. It must be correct to within a few milliseconds to preserve lip-sync.
The program stream is referenced to the start time so that text assets and sub-titles can be delivered at the correct frame times. The accuracy should be better than half a second.
When programs are broadcast on-air, the time of broadcast is important so programs can be transmitted as described in the EPG. This is often just an approximation. Programs may be broadcast up to 10 minutes early or late and sometimes not at all.
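
The relationship between sample-level audio timing and frame-level video timing can be illustrated with a little arithmetic. This is a minimal sketch rather than anything taken from a standard: the 48 kHz sample rate and the frame rates used below are assumed example values, and the function names are hypothetical.

```python
from fractions import Fraction


def samples_per_frame(sample_rate_hz: int, frame_rate: Fraction) -> Fraction:
    """Exact number of audio samples spanned by one video frame."""
    return Fraction(sample_rate_hz) / frame_rate


def av_offset_ms(audio_sample_index: int, video_frame_index: int,
                 sample_rate_hz: int, frame_rate: Fraction) -> float:
    """Audio position minus video position, in milliseconds.

    A positive result means the audio position is later than the video position.
    """
    audio_time = Fraction(audio_sample_index, sample_rate_hz)
    video_time = Fraction(video_frame_index) / frame_rate
    return float((audio_time - video_time) * 1000)
```

With these assumed values, a 25 fps frame spans exactly 1920 samples, while a 30000/1001 fps frame spans 1601.6 samples, so audio and video boundaries only line up every five frames. Exact fractions are used so that rounding does not accumulate over a long program.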

The relevant standards organizations are:

IEEE Precision Time Protocols. Used as a basis for most other standards.
AES standards for audio carriage.
SMPTE standards for timing and professional media over IP.
ISO MPEG standards for audio/visual content.
DVB broadcast transmission specifications.
ETSI detailed specs for DVB use.

These are the relevant specifications for timing:

Standard	Description
IEEE 1588-2002	Precision Time Protocol (PTPv1).
IEEE 1588-2008	Precision Time Protocol (PTPv2).
AES3	Integral clock timing is embedded within the essence data.
AES11-2020	Synchronization of digital audio equipment in studio operations.
AES-R11-2009	Methods for the measurement of audio-video synchronization error.
AES67	Uses IEEE 1588-2008 - PTPv2.
ST 12-1	SMPTE Timecode.
ST 309	SMPTE timing - Date values.
ST 2059-1	Generation and Alignment of Interface Signals to the SMPTE Epoch.
ST 2059-2	SMPTE Profile for use of IEEE 1588-2008 - PTPv2. Read this together with part 1 to understand the entire concept.
ST 2059-10	Engineering guideline - Introduction to the ST 2059 Synchronization System.
RP 2059-15	Recommended practice for using the YANG Data Model for ST 2059-2 PTP Device Monitoring in Professional Broadcast Applications.
ST 2110-10	SMPTE Professional Media over Managed IP Networks: System Timing and Definitions. This uses IEEE 1588-2008 - PTPv2.
ISO/IEC 11172-1	MPEG-1 Systems layer provides timing and synchronization and describes how to multiplex elementary streams together into a single system stream.
ISO/IEC 13818-1	MPEG-2 Systems layer adds to and enhances MPEG-1 and defines the program stream and transport stream formats.
ISO/IEC 14496-1	MPEG-4 Systems layer adds to and enhances MPEG-2.
DVB	The Time Date Table (TDT) and Time Offset Table (TOT) are embedded inside the broadcast signal.
EN 300 468 v1.3.1	ETSI Specification for Service Information (SI) in DVB systems describes the TDT and TOT data embedded in the DVB transmissions.
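
As an illustration of the TDT entry in the table above, the sketch below decodes the 5-byte UTC time field carried in the TDT and TOT. It assumes the layout given in EN 300 468: a 16-bit Modified Julian Date followed by six BCD digits for hours, minutes and seconds. The function name is hypothetical and no section parsing or CRC checking is attempted.

```python
from datetime import datetime, timedelta

MJD_EPOCH = datetime(1858, 11, 17)  # day zero of the Modified Julian Date


def _bcd(byte: int) -> int:
    """Convert one packed BCD byte (two decimal digits) to an integer."""
    return (byte >> 4) * 10 + (byte & 0x0F)


def decode_dvb_utc(field: bytes) -> datetime:
    """Decode a 5-byte DVB UTC time field into a naive UTC datetime."""
    if len(field) != 5:
        raise ValueError("UTC time field must be exactly 5 bytes")
    mjd = int.from_bytes(field[:2], "big")          # 16-bit day count
    hours, minutes, seconds = (_bcd(b) for b in field[2:])
    return MJD_EPOCH + timedelta(days=mjd, hours=hours,
                                 minutes=minutes, seconds=seconds)
```

For example, the bytes C0 79 12 45 00 decode to 13 October 1993 at 12:45:00 UTC.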

Elementary Streams

MPEG describes elementary streams as the output of an encoder. They only carry a single type of media:

Audio
Video
Sub-title text
URLs
Metadata
Control signals

Elementary streams are a sequence of bits. The codec specifications only describe how they should be decoded. This allows encoder developers to improve the compression algorithms. Thus, coding efficiency improves over time without revising the standard. The bitstreams are sliced into packets for transmission or storage. The term 'track' is used in place of 'stream' when the content is stored in a file.
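
Slicing a bitstream into packets can be pictured with a very small sketch. The function name and the payload size are hypothetical, and real packetizers also add headers and timestamps, which are omitted here.

```python
def packetize(bitstream: bytes, payload_size: int) -> list[bytes]:
    """Slice a bitstream into consecutive payloads of at most payload_size bytes."""
    if payload_size <= 0:
        raise ValueError("payload_size must be positive")
    return [bitstream[i:i + payload_size]
            for i in range(0, len(bitstream), payload_size)]
```

Joining the payloads back together in order reproduces the original bitstream, which is the property a receiver relies on.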

Multiple Audio Channels

Audio complexity increases as more channels are introduced. Wikipedia describes 21 different surround-sound arrangements. The most common formats are:

Format	Details	Notation	Channels
Mono	A single channel on its own.	1.0	1
Stereo	Separate left and right sound-mixes.	2.0	2
Quadraphonic	A legacy format creating a surround effect. Sounds are placed left and right, front and back.	4.0	4
Surround-sound	Sounds are placed left, right and middle at the front with additional left and right rear speakers. Plus, a non-directional single low-frequency channel.	5.1	6
Early Dolby Atmos™️	Enhances the 5.1 arrangement to add four more channels at ceiling height. Two on each side.	5.1.4	10
Dolby Atmos™️ (high performance)	An advanced configuration places loudspeakers around the audience and in the ceiling above them. There are 11 full-range speakers, one low-frequency woofer and 8 overhead (height) speakers.	11.1.8	20
NHK Super Hi-Vision	The sound-system for the 8K demonstrations in 2018 used multiple 7.1 surround systems (floor, wall and ceiling) with extra low-frequency support.	22.2	24
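
The Channels column in the table follows directly from the Notation column: the dot-separated fields are simply added together. A minimal sketch, with a hypothetical function name:

```python
def channel_count(notation: str) -> int:
    """Total channels implied by a layout notation such as '5.1' or '5.1.4'."""
    return sum(int(part) for part in notation.split("."))
```

So 5.1.4 gives 5 + 1 + 4 = 10 channels and 22.2 gives 24.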

Microsoft define an informal specification for channel names that identifies 18 discrete sound sources within a multi-channel environment. It is described as KSAUDIO_CHANNEL_CONFIG and is also useful for non-Windows applications. Other naming conventions may apply in different environments. These individual sound sources are mixed down via a matrix when fewer channels are required. Mixing tools for spatially positioning events within the soundscape become more sophisticated as the number of channels increases.
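
A matrix mix-down can be sketched as a multiplication of one multi-channel sample frame by a table of gains. The coefficients below are an assumption: they follow a commonly used convention of attenuating the centre and surround channels by about 3 dB and discarding the low-frequency channel. They are not taken from the Microsoft channel definitions, and the channel order (L, R, C, LFE, Ls, Rs) is also assumed.

```python
import math

ATT = 1 / math.sqrt(2)  # roughly -3 dB (assumed convention)

# Rows are output channels (left, right).
# Columns are input channels in the assumed order L, R, C, LFE, Ls, Rs.
DOWNMIX_5_1_TO_STEREO = [
    [1.0, 0.0, ATT, 0.0, ATT, 0.0],
    [0.0, 1.0, ATT, 0.0, 0.0, ATT],
]


def downmix(frame, matrix):
    """Mix one frame of input samples down to len(matrix) output samples."""
    if any(len(row) != len(frame) for row in matrix):
        raise ValueError("matrix width must match the number of input channels")
    return [sum(gain * sample for gain, sample in zip(row, frame))
            for row in matrix]
```

A production mixer would also guard against clipping when several channels sum to more than full scale, which this sketch does not attempt.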

Multiple Video Channels

Usually, only a single video channel is required. Emerging virtual-reality applications with stereoscopic vision require two. DVDs support multiple video angles selectable at playback. Some orchestral concerts offer different views for each section of the orchestra. These must be perfectly synchronized with each other and with the audio.

Program Streams

Embedding combines several streams of media to create a single higher-level stream. In its simplest form, packets of audio, video and timed-text are interleaved alternately. Audio requires much less space than video and intermittent text requires even less. The interleaving may not always be in a regular pattern. Surround-sound audio will be even more complex as there are several more audio streams to include.
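
Interleaving can be sketched as a merge of time-ordered packet sequences. This is an illustration only: the Packet type, its field names and the millisecond timestamps are hypothetical, and a real multiplexer also has to respect decoder buffer models.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Packet(NamedTuple):
    timestamp_ms: int   # presentation time of the payload (assumed unit)
    kind: str           # "video", "audio" or "text"
    payload: bytes


def interleave(*streams: Iterable[Packet]) -> Iterator[Packet]:
    """Merge already time-ordered streams into one time-ordered sequence."""
    return heapq.merge(*streams, key=lambda packet: packet.timestamp_ms)
```

Because audio packets are smaller and more frequent than video packets, and text packets are occasional, the merged sequence is naturally irregular rather than strictly alternating.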

All of the streams need to arrive at the same time so that downstream devices can maintain synchronization with as little buffering as possible.

This example shows how a program stream is created from audio, video and timed text elementary streams.

Synchronization is important and is described in delivery specifications:

The DPP requires that sound and vision synchronization markers at the start of the program are within 5 milliseconds.
Netflix mandates that separated audio files must match the duration of their companion video files to within 1 second.
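
Tolerances like these are easy to check automatically during quality control. The sketch below is hypothetical: the constant and function names are invented for illustration, and treating the limit as inclusive is an assumption.

```python
DPP_SYNC_TOLERANCE_S = 0.005          # 5 milliseconds, from the DPP figure above
NETFLIX_DURATION_TOLERANCE_S = 1.0    # 1 second, from the Netflix figure above


def within_tolerance(a_seconds: float, b_seconds: float,
                     tolerance_s: float) -> bool:
    """True when two times or durations differ by no more than the tolerance."""
    return abs(a_seconds - b_seconds) <= tolerance_s
```

For example, sync markers at 10.000 s and 10.004 s pass the 5 millisecond check, while markers at 10.000 s and 10.006 s fail it.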

Program streams are suitable for delivery via streaming services but need to be combined with several others for on-air broadcasting.

SDI Audio Embedding

SMPTE ST 2110-2x standards accommodate SDI conforming to the ST 292M standard as a source format. SDI has the capacity to carry up to 16 channels of audio depending on the sample size and frequency.

The SDI format is derived from classic analogue TV services, having a space where the video is blanked. Each horizontal line has a space at the start for ancillary data (HANC). Lines are reserved at the top and bottom of the frame for more ancillary data (VANC).

Digital audio is stored in the HANC space and is extracted for conversion to MPEG or ST 2110 compatible formats.

Constructing A Transport Stream

The individual elementary streams are combined to make a program stream. This represents a single TV channel in a broadcast scenario. Several program streams are combined with additional engineering and EPG metadata to construct a transport stream.

Here is the nested structure:
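
The nesting can also be modelled as a simple data structure. This is a sketch that follows the terminology used in this article; the class names, field names and bitrates are hypothetical and do not reflect the actual MPEG syntax.

```python
from dataclasses import dataclass, field


@dataclass
class ElementaryStream:
    kind: str            # for example "video", "audio" or "subtitles"
    bitrate_kbps: int


@dataclass
class ProgramStream:
    name: str            # one TV channel in a broadcast scenario
    elementary_streams: list[ElementaryStream] = field(default_factory=list)

    @property
    def bitrate_kbps(self) -> int:
        return sum(es.bitrate_kbps for es in self.elementary_streams)


@dataclass
class TransportStream:
    programs: list[ProgramStream] = field(default_factory=list)
    metadata_kbps: int = 0   # engineering and EPG metadata

    @property
    def bitrate_kbps(self) -> int:
        return self.metadata_kbps + sum(p.bitrate_kbps for p in self.programs)
```

Each level simply sums the level below it, with the transport stream adding its own metadata overhead on top.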

Over The Air (OTA) DVB Transport Streams

Terrestrial or satellite broadcasts combine multiple separate program streams (channels) into a transport stream. Terrestrial broadcasters call this a multiplex while satellite broadcasters describe it as a transponder. It might also be described as a bouquet. This is based on the Digital Video Broadcasting (DVB) standards.

The available frequency bands limit digital terrestrial television (DTT) broadcasts to a half-dozen multiplexes. This is sufficient to deliver more than a hundred channels. The commercial digital satellite (DSat) service carries more transponders and delivers approximately 300 channels. Any space that is too small to squeeze in another TV channel is used to carry radio broadcasts and data services.

Cable broadcasts were historically delivered using similar DVB standardized transports. They are rapidly migrating to a service model in which content is streamed over broadband IP.

Here is an example transport stream spanning 24 hours:

Some channels are only available for part of the day but the EPG hides this and displays them as separate channels even though they use the same slot in the multiplex.

Simple Multiplexing

Program streams that are combined into a transport stream have no knowledge of each other or how much bandwidth they are consuming. A simple multiplexing scheme bets on them not exceeding 100% of the bandwidth when combined. If several channels have a burst of activity simultaneously, the complexity increases and the available bandwidth is insufficient to carry the load. This leads to massive data loss and the picture breaks up. This is described as 'blowing up' or 'busting' the mux.
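
The failure mode of a simple multiplex can be shown by adding up the demand of each channel over time. In this sketch the function name, the per-interval demand figures and the capacity are all hypothetical.

```python
def overloaded_intervals(channel_demands_mbps, capacity_mbps):
    """Indices of time intervals where combined demand exceeds the capacity.

    channel_demands_mbps holds one list per channel, each with one demand
    value per time interval.
    """
    return [index
            for index, demands in enumerate(zip(*channel_demands_mbps))
            if sum(demands) > capacity_mbps]
```

With three channels demanding 4, 5 and 6 Mbps in quiet intervals and 9 Mbps each during a shared burst, a 20 Mbps multiplex copes with the quiet intervals (15 Mbps) but is overloaded during the burst (27 Mbps).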

Statistical Multiplexing

The available bandwidth can be used more effectively when several channels are transmitted together. Hardware compression in the head-end applies statistical multiplexing to avoid overloading the available capacity.

Statistical multiplexing allows individual channels to burst and use more than their average bandwidth at the expense of other channels. The Stat-mux adjusts the compression ratios to maintain a constant bitrate even when several channels are bursting simultaneously.
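
One way to picture this is as a fixed total bitrate shared out in proportion to how demanding each channel currently is. This is a simplified sketch and not how any particular stat-mux product works: the complexity scores, the guaranteed floor and the function name are all assumptions.

```python
def allocate_bitrates(complexities, total_kbps, floor_kbps=0):
    """Share a constant total bitrate between channels by relative complexity.

    Every channel receives floor_kbps, and the remaining capacity is divided
    in proportion to each channel's complexity score.
    """
    count = len(complexities)
    if count == 0:
        return []
    spare = total_kbps - floor_kbps * count
    if spare < 0:
        raise ValueError("total bitrate cannot cover the guaranteed floors")
    weight_sum = sum(complexities)
    if weight_sum == 0:
        # No channel is demanding anything special, so share equally.
        return [total_kbps / count] * count
    return [floor_kbps + spare * c / weight_sum for c in complexities]
```

The allocations always add up to the same total, so a burst on one channel is paid for by a higher compression ratio on the others rather than by exceeding the capacity of the multiplex.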

These are important caveats to bear in mind:

Statistical multiplexing for broadcasting can only be used with uncompressed (raw) source material. Consequently, this requires a slightly higher network capacity for delivery of content to the head-end.
Some channels are stored and re-broadcast one hour later. The chances of simultaneous bursting are small.
When the same content is simulcast on several channels (for example BBC News), a statistical multiplexing approach is very helpful.

Conclusion

Timing and synchronization are critically important when creating program streams for delivery over IP or combining into broadcast transmissions.

Ensuring that broadcast bouquets of channels are deliverable within the available bandwidth requires careful management of the compression with a statistical multiplexor.

