Standards: Delivery - Embedding & Multiplexing Streams

Elementary streams carrying audio, video and metadata are combined into program streams, which are then multiplexed together into transport streams for broadcast. Here’s how it all works as well as what you need to plan for.

Stream Construction

Combining multiple synchronized audio visual streams is fundamental to broadcast and there are a number of standards to consider.

Audio visual content is constructed from several different kinds of media. The simplest multi-element case is a single video stream and a single audio stream synchronized together. More complex combinations require careful synchronization with accurate timing control.

On-air head-end systems take several channels of uncompressed material, compress each into elementary streams and then multiplex them into a transport stream. Live material is piped directly from the studio gallery, and previously stored files are injected as if they were live feeds.

There are several different kinds of stream involved:

  • Elementary (sometimes called essence) streams.
  • Program streams.
  • Transport streams.

These are all described as streams, which is short for bitstreams. Each is a sequence of bits, assembled into bytes and then combined into packets for transmission over the network. Elementary streams are nested inside program streams, which are in turn nested inside transport streams for delivery.

Timing & Synchronization

The timing and synchronization of audio, video and optional metadata tracks is governed by the systems layer. It matters at three levels:

  • At the lowest level, audio and video need to be precisely synchronized. Video is locked on a frame-by-frame basis. Audio is synchronized at the sample level. It must be correct to within a few milliseconds to preserve lip-sync.
  • The program stream is referenced to the start time so that text assets and sub-titles can be delivered at the correct frame times. The accuracy should be better than half a second.
  • When programs are broadcast on air, the time of broadcast is important so programs can be transmitted as described in the EPG. This is often just an approximation. Programs may be broadcast up to 10 minutes early or late and sometimes not at all.
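The sample-level locking described in the first bullet comes down to simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the 48 kHz sample rate and the 25 and 30000/1001 frame rates are assumed common values rather than figures taken from this article.

```python
from fractions import Fraction

def samples_per_frame(sample_rate_hz: int, frame_rate: Fraction) -> Fraction:
    """Exact number of audio samples that span one video frame."""
    return Fraction(sample_rate_hz) / frame_rate

def av_offset_ms(video_frame: int, audio_sample: int,
                 sample_rate_hz: int, frame_rate: Fraction) -> float:
    """Audio time minus video time in milliseconds (positive means audio is late)."""
    video_time = Fraction(video_frame) / frame_rate
    audio_time = Fraction(audio_sample, sample_rate_hz)
    return float((audio_time - video_time) * 1000)

# Assumed example rates, not values from the article.
FPS_25 = Fraction(25)
FPS_29_97 = Fraction(30000, 1001)

print(samples_per_frame(48000, FPS_25))     # 1920
print(samples_per_frame(48000, FPS_29_97))  # 8008/5, i.e. 1601.6 samples

# Frame 100 at 25 fps starts at 4.000 s; sample 192240 at 48 kHz is at 4.005 s.
print(av_offset_ms(100, 192240, 48000, FPS_25))  # 5.0
```

Exact fractions are used because at 30000/1001 frames per second a frame does not contain a whole number of samples, so rounding per frame would accumulate drift.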

The relevant standards organizations are:

  • IEEE Precision Time Protocols. Used as a basis for most other standards.
  • AES standards for audio carriage.
  • SMPTE standards for timing and professional media over IP.
  • ISO MPEG standards for audio/visual content.
  • DVB broadcast transmission specifications.
  • ETSI detailed specs for DVB use.

These standards are relevant to timing control:

Standard | Vintage | Description
AES3 | 2009 | Integral clock timing is embedded within the essence data.
AES11 | 2020 | Synchronization of digital audio equipment in studio operations.
AES67 | - | Uses IEEE 1588-2008 – PTPv2.
AES-R11 | 2009 | Methods for the measurement of audio-video synchronization error.
DVB | - | The Time Date Table (TDT) and Time Offset Table (TOT) are embedded inside the broadcast signal.
EN 300 468 v1.3.1 | 1997 | ETSI Specification for Service Information (SI) in DVB systems describes the TDT and TOT data embedded in the DVB transmissions.
IEEE 1588 | 2002 | Precision Time Protocol (PTPv1).
IEEE 1588 | 2008 | Precision Time Protocol (PTPv2).
IEEE 1588 | 2019 | Precision Time Protocol (PTPv2.1).
ISO 11172-1 | 1993 | MPEG-1 Systems layer provides timing and synchronization and describes how to multiplex elementary streams together to create a transport stream.
ISO 13818-1 | 2018 | MPEG-2 Systems layer adds to and enhances MPEG-1.
ISO 14496-1 | 2010 | MPEG-4 Systems layer adds to and enhances MPEG-2.
RP 2059-15 | 2022 | SMPTE Recommended practice for using the YANG Data Model for ST 2059-2 PTP Device Monitoring in Professional Broadcast Applications.
ST 12-1 | 2014 | SMPTE Timecode.
ST 309 | 2012 | SMPTE timing – Date values.
ST 2059-1 | 2021 | Generation and Alignment of Interface Signals to the SMPTE Epoch.
ST 2059-2 | 2021 | SMPTE Profile for use of IEEE 1588-2008 – PTPv2. Read this together with part 1 to understand the entire concept.
ST 2059-10 | 2022 | Engineering guideline – Introduction to the ST 2059 Synchronization System.
ST 2110-10 | 2022 | SMPTE Professional Media over Managed IP Networks – System Timing and Definitions. This uses IEEE 1588-2008 – PTPv2.
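As one concrete illustration of systems-layer timing, MPEG-2 Systems expresses presentation timestamps (PTS) as 33-bit counts of a 90 kHz clock. The article does not go into this mechanism, so treat the following as a minimal sketch with hypothetical helper names, not a description of any particular multiplexer.

```python
PTS_CLOCK_HZ = 90_000   # MPEG-2 Systems timestamp resolution
PTS_MODULUS = 1 << 33   # PTS is carried in a 33-bit field and wraps around

def seconds_to_pts(seconds: float) -> int:
    """Convert a time in seconds to a wrapped 90 kHz tick count."""
    return round(seconds * PTS_CLOCK_HZ) % PTS_MODULUS

def frame_duration_ticks(fps_num: int, fps_den: int = 1) -> int:
    """Ticks per video frame for a frame rate of fps_num / fps_den."""
    return PTS_CLOCK_HZ * fps_den // fps_num

def pts_delta(later: int, earlier: int) -> int:
    """Tick difference between two timestamps, allowing for one wrap."""
    return (later - earlier) % PTS_MODULUS

print(frame_duration_ticks(25))            # 3600
print(frame_duration_ticks(30000, 1001))   # 3003
print(PTS_MODULUS / PTS_CLOCK_HZ / 3600)   # wrap period in hours, roughly 26.5
```

Because the counter wraps after roughly 26.5 hours, any comparison of timestamps on a continuously running channel has to be done modulo 2^33, which is what `pts_delta` does.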

Elementary Streams

MPEG describes elementary streams as the output of an encoder. Each carries only a single type of media:

  • Audio.
  • Video.
  • Sub-title text.
  • URLs.
  • Metadata.
  • Control signals.

Elementary streams are a sequence of bits and the codec specifications only describe how they should be decoded. This allows encoder developers to improve the compression algorithms. Thus, coding efficiency improves over time without revising the standard. The bitstreams are sliced into packets for transmission or storage. The term ‘track’ is used in place of ‘stream’ when the content is stored in a file.
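To make the slicing step concrete, here is a minimal sketch that cuts a byte sequence into fixed-size packets and reassembles it. The 4-byte header layout (stream id, sequence number, payload length) is a hypothetical simplification invented for this example. It is not the MPEG packet syntax, although the resulting 188-byte packet size matches that of an MPEG-2 transport stream packet.

```python
import struct

PAYLOAD_SIZE = 184                # assumed payload bytes per packet
HEADER = struct.Struct(">BHB")    # hypothetical: stream id, sequence, payload length

def packetize(stream_id: int, data: bytes, payload_size: int = PAYLOAD_SIZE) -> list:
    """Slice a bitstream into fixed-size packets, padding the final one."""
    packets = []
    for seq, start in enumerate(range(0, len(data), payload_size)):
        chunk = data[start:start + payload_size]
        header = HEADER.pack(stream_id, seq & 0xFFFF, len(chunk))
        packets.append(header + chunk.ljust(payload_size, b"\xff"))
    return packets

def depacketize(packets: list) -> bytes:
    """Rebuild the original bitstream, dropping the padding."""
    out = bytearray()
    for packet in packets:
        _stream_id, _seq, length = HEADER.unpack_from(packet)
        out += packet[HEADER.size:HEADER.size + length]
    return bytes(out)

essence = bytes(range(256)) * 2            # 512 bytes of stand-in essence data
packets = packetize(stream_id=1, data=essence)
print(len(packets), len(packets[0]))       # 3 188
print(depacketize(packets) == essence)     # True
```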

Multiple Audio Channels

Audio essence data can be carried in a variety of different ways. When multiple channels are required, they can be multiplexed together into a single stream or delivered independently, one stream at a time.

The choice of container format may constrain the possible configurations available.

NHK SuperHiVision requires 24 separate synchronized channels of audio. THX requires a similar number of channels, as does the highest specification of the Dolby Atmos system.

The Dolby Atmos mastering format is designed to handle up to 128 separate sound source streams plus metadata for mixing down into the target surround configuration.

Multiple Video Channels

Usually, only a single video channel is required. Emerging virtual-reality applications with stereoscopic vision require two. DVDs support multiple video angles selectable at playback. Some orchestral concerts offer different views for each section of the orchestra. These must be perfectly synchronized with each other and with the audio.

Program Streams

Embedding combines several streams of media to create a single higher-level stream. In its simplest form, packets of audio, video and timed text are interleaved. Audio requires much less space than video, and intermittent text requires even less, so the interleaving will not always follow a regular pattern. Surround sound is more complex still because there are several more audio streams to include.

Packets from all of the streams need to arrive at about the same time so that downstream devices can maintain synchronization with minimal buffering.
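One simple way to picture the interleaving is as a merge of per-stream packet queues ordered by timestamp. The sketch below assumes each elementary stream is already in time order. The `Packet` type, the millisecond timestamps and the stream names are hypothetical, and a real multiplexer also has to account for decoder buffer models, which this ignores.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple

class Packet(NamedTuple):
    pts: int        # presentation time in milliseconds (assumed unit)
    stream: str     # which elementary stream the packet belongs to
    payload: bytes

def interleave(*streams: Iterable[Packet]) -> Iterator[Packet]:
    """Merge time-ordered elementary streams into one time-ordered sequence."""
    return heapq.merge(*streams, key=lambda packet: packet.pts)

video = [Packet(t, "video", b"V") for t in (0, 40, 80)]
audio = [Packet(t, "audio", b"A") for t in (0, 20, 40, 60, 80)]
text = [Packet(50, "text", b"T")]

for packet in interleave(video, audio, text):
    print(packet.pts, packet.stream)
```

The output pattern is irregular, with one text packet appearing among many audio and video packets, because each stream produces packets at its own rate.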

[Figure: a program stream created from audio, video and timed text elementary streams.]

Synchronization is important and is described in a variety of delivery specifications. The DPP requires that sound and vision synchronization markers at the start of the program are within 5 milliseconds. Netflix mandates that separated audio files must match the duration of their companion video files to within 1 second.

Program streams are suitable for delivery via streaming services but need to be combined with several others for on-air broadcasting.

Constructing A Transport Stream

The individual elementary streams are combined to make a program stream. This represents a single TV channel in a broadcast scenario. Several program streams are combined with additional engineering and EPG metadata to construct a transport stream.

[Figure: the nested structure of elementary streams within program streams within a transport stream.]

[Figure: an example transport stream spanning 24 hours.]
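The nesting can also be modelled as a small data structure. This is only a sketch of the containment relationship described above: the class names, channel names and bitrates are all hypothetical, and it says nothing about the actual packet syntax.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ElementaryStream:
    kind: str            # "video", "audio", "text", ...
    bitrate_kbps: int

@dataclass
class ProgramStream:
    name: str            # one TV channel
    elementary: List[ElementaryStream] = field(default_factory=list)

    def bitrate_kbps(self) -> int:
        return sum(es.bitrate_kbps for es in self.elementary)

@dataclass
class TransportStream:
    name: str
    programs: List[ProgramStream] = field(default_factory=list)
    metadata_kbps: int = 0   # engineering and EPG metadata

    def bitrate_kbps(self) -> int:
        return self.metadata_kbps + sum(p.bitrate_kbps() for p in self.programs)

# Made-up figures purely for illustration.
channel_a = ProgramStream("Channel A", [ElementaryStream("video", 4000),
                                        ElementaryStream("audio", 192),
                                        ElementaryStream("text", 8)])
channel_b = ProgramStream("Channel B", [ElementaryStream("video", 3000),
                                        ElementaryStream("audio", 192)])
mux = TransportStream("Mux 1", [channel_a, channel_b], metadata_kbps=300)
print(channel_a.bitrate_kbps(), mux.bitrate_kbps())   # 4200 7692
```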

Over The Air (OTA) DVB Transport Streams

Terrestrial or satellite broadcasts combine multiple separate program streams (channels) into a transport stream. Terrestrial broadcasters call this a multiplex while satellite broadcasters describe it as a transponder. It might also be described as a bouquet. This is based on the Digital Video Broadcasting (DVB) standards.

The available frequency bands limit the DTT broadcasts to a half-dozen multiplexes. This is sufficient to deliver more than a hundred channels. The commercial DSat (Digital Satellite) service carries more transponders and delivers approximately 300 channels. Any space that is too small to squeeze in another TV channel is used to carry radio broadcasts and data services.

Cable broadcasts were historically delivered using similar DVB standardized transports. They are rapidly migrating to a streamed service model delivered over broadband IP.

Some channels are only available for part of the day, but the EPG hides this and displays them as separate channels even though they use the same slot in the multiplex.

Simple Multiplexing

Program streams that are combined into a transport stream have no knowledge of each other or how much bandwidth they are consuming. A simple multiplexing scheme bets on them not exceeding 100% of the bandwidth when combined. If several channels have a burst of activity simultaneously, the complexity increases and the available bandwidth is insufficient to carry the load. This leads to massive data loss and the picture breaks up. This is described as ‘blowing up’ or ‘busting the mux’.
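The failure mode is easy to demonstrate with a few numbers. In this sketch each channel has a list of bitrate demands per time slot, and the function reports the slots where the combined demand exceeds the multiplex capacity. The function name, the demand figures and the 12,000 kbps capacity are invented for illustration.

```python
def overloaded_slots(demands_kbps, capacity_kbps):
    """Return the time slots where combined demand exceeds the mux capacity."""
    overloaded = []
    for slot, per_channel in enumerate(zip(*demands_kbps)):
        if sum(per_channel) > capacity_kbps:
            overloaded.append(slot)
    return overloaded

# Hypothetical demand per time slot for three channels, in kbps.
channel_a = [3000, 3000, 6000, 3000]
channel_b = [4000, 4000, 5000, 4000]
channel_c = [3000, 5000, 4000, 3000]

# Totals per slot are 10000, 12000, 15000 and 10000 kbps.
print(overloaded_slots([channel_a, channel_b, channel_c], 12000))   # [2]
```

Slot 1 exactly fills the multiplex and survives. Slot 2, where all three channels burst together, is the one that busts the mux.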

Statistical Multiplexing

The available bandwidth can be used more effectively when several channels are transmitted together. Hardware compression in the head-end applies statistical multiplexing to avoid overloading the available capacity.

Statistical multiplexing allows individual channels to burst and use more than their average bandwidth at the expense of other channels. The Stat-mux adjusts the compression ratios to maintain a constant bitrate even when several channels are bursting simultaneously.
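A heavily simplified way to think about the allocation is as a proportional share of a fixed total. The sketch below assumes each encoder reports a relative complexity score and that every channel is guaranteed a minimum bitrate. Both are assumptions made for illustration, and real stat-mux controllers use more elaborate feedback loops than this.

```python
def statmux_allocate(complexities, total_kbps, floor_kbps=0):
    """Share a fixed total bitrate among channels in proportion to complexity."""
    count = len(complexities)
    spare = total_kbps - floor_kbps * count
    if spare < 0:
        raise ValueError("floor bitrates alone exceed the multiplex capacity")
    weight = sum(complexities)
    if weight == 0:
        return [total_kbps / count] * count     # nothing is demanding: share evenly
    return [floor_kbps + spare * c / weight for c in complexities]

# One channel is twice as demanding as the other two.
print(statmux_allocate([1, 1, 2], total_kbps=12000, floor_kbps=1000))
# [3250.0, 3250.0, 5500.0]

# Every channel bursts at once: the shares do not change and the total stays fixed.
print(statmux_allocate([4, 4, 8], total_kbps=12000, floor_kbps=1000))
# [3250.0, 3250.0, 5500.0]
```

When everything bursts together, the total cannot grow, so each channel is compressed harder instead. That is the trade-off the stat-mux makes to hold a constant bitrate.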

These are important caveats to bear in mind:

  • Statistical multiplexing for broadcasting can only be used with uncompressed (raw) source material. Consequently, this requires a slightly higher network capacity for delivery of content to the head-end.
  • Some channels are stored and re-broadcast one hour later on a +1 channel. The chances of simultaneous bursting are small.
  • If the same content is simulcast on several channels delivered in the same multiplex (for example BBC News and BBC 1), a statistical multiplexing approach is less helpful because all instances of the channel will burst at the same time.
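The last two caveats are about how strongly the bursts on different channels coincide. The sketch below compares the peak combined demand of a simulcast pair with that of a time-shifted pair carrying the same content. The demand figures are invented, and wrapping the delayed series around the end of the list is a simplification made to keep the example short.

```python
def delayed(series, slots):
    """The same demand pattern shifted later by `slots`, wrapping around."""
    slots %= len(series)
    return series[-slots:] + series[:-slots]

def peak_combined(a, b):
    """Highest combined demand of two channels over all time slots."""
    return max(x + y for x, y in zip(a, b))

demand_mbps = [2, 2, 8, 2, 2, 3]      # hypothetical demand with one burst

simulcast = peak_combined(demand_mbps, demand_mbps)
time_shifted = peak_combined(demand_mbps, delayed(demand_mbps, 3))
print(simulcast, time_shifted)        # 16 11
```

With identical content the bursts line up and the peak doubles, leaving the stat-mux nothing to trade. With a time shift the bursts fall in different slots and the peak is much lower.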

Deploying Multiplexed Content

Timing and synchronization are critically important when creating program streams for delivery over IP or combining into broadcast transmissions.

Ensuring that broadcast bouquets of channels are deliverable within the available bandwidth requires careful management of the compression with a statistical multiplexor.

Broadcast content is gradually being displaced by streaming directly to viewers via an IP network. The streaming server is presented with a file which it then delivers to the client players. That content might be in a static file already prepared beforehand or a virtual file arriving as an external live feed.

The underlying technology solutions have been around for a long time. More recent innovation has improved performance by alleviating bottlenecks.
