Standards: Part 10 - Embedding And Multiplexing Streams
Audio-visual content is constructed from several different media types. The simplest case is a single video stream and a single audio stream synchronized together, but additional complexity is commonplace. All of it requires careful synchronization with accurate timing control.
This article is part of our growing series on Standards.
There is an overview of all 26 articles in Part 1 - An Introduction To Standards.
There are several different kinds of streams involved:
- Elementary (sometimes called essence) streams.
- Program streams.
- Transport streams.
These are all described as streams, which is short for bitstreams: sequences of bits, assembled into bytes and then combined into packets for transmission over the network. Elementary streams are combined in a nested fashion to create program streams, which are in turn combined to make transport streams for delivery.
Timing & Synchronization
The timing and synchronization of audio, video and optional metadata tracks is governed by the systems layer. This is fundamental. Timing and synchronization are relevant at several levels:
- At the lowest level, audio and video need to be precisely synchronized. Video is locked on a frame-by-frame basis and audio is synchronized at the sample level. Lip-sync must be correct to within a few milliseconds (see the sketch after this list).
- The program stream is referenced to the start time so that text assets and sub-titles can be delivered at the correct frame times. The accuracy should be better than half a second.
- When programs are broadcast on-air, the time of broadcast is important so programs can be transmitted as described in the EPG. This is often just an approximation. Programs may be broadcast up to 10 minutes early or late and sometimes not at all.
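To make the lip-sync tolerance concrete, here is a minimal sketch of an audio/video offset check using MPEG presentation timestamps (PTS), which tick at 90 kHz. The 5 millisecond tolerance is an illustrative figure (the DPP requirement mentioned later in this article), not a value mandated by the systems layer itself.

```python
# A minimal sketch: checking audio/video lip-sync from MPEG presentation
# timestamps (PTS). PTS values tick at 90 kHz; the 5 ms tolerance below
# is illustrative, not mandated by the systems layer.

PTS_CLOCK_HZ = 90_000          # MPEG systems-layer PTS tick rate
TOLERANCE_MS = 5.0             # illustrative lip-sync tolerance

def av_offset_ms(video_pts: int, audio_pts: int) -> float:
    """Signed offset between a video frame and its accompanying audio,
    in milliseconds (positive = audio late)."""
    return (video_pts - audio_pts) * 1000.0 / PTS_CLOCK_HZ

def in_sync(video_pts: int, audio_pts: int) -> bool:
    return abs(av_offset_ms(video_pts, audio_pts)) <= TOLERANCE_MS

# Example: audio lagging the video by 270 ticks = 3 ms, still in sync.
print(av_offset_ms(900_270, 900_000))   # 3.0
print(in_sync(900_270, 900_000))        # True
```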
The relevant standards organizations are:
- IEEE Precision Time Protocols. Used as a basis for most other standards.
- AES standards for audio carriage.
- SMPTE standards for timing and professional media over IP.
- ISO MPEG standards for audio/visual content.
- DVB broadcast transmission specifications.
- ETSI detailed specs for DVB use.
These are the relevant specifications for timing:
Standard | Description |
---|---|
IEEE1588-2002 | Precision Time Protocol (PTPv1). |
IEEE1588-2008 | Precision Time Protocol (PTPv2). |
AES3 | Integral clock timing is embedded within the essence data. |
AES11-2020 | Synchronization of digital audio equipment in studio operations. |
AES-R11-2009 | Methods for the measurement of audio-video synchronization error. |
AES67 | Uses IEEE 1588-2008 - PTPv2. |
ST 12-1 | SMPTE Timecode. |
ST 309 | SMPTE timing - Date values. |
ST 2059-1 | Generation and Alignment of Interface Signals to the SMPTE Epoch. |
ST 2059-2 | SMPTE Profile for use of IEEE 1588-2008 - PTPv2. Read this together with part 1 to understand the entire concept. |
ST 2059-10 | Engineering guideline - Introduction to the ST 2059 Synchronization System. |
RP 2059-15 | Recommended practice for using the YANG Data Model for ST 2059-2 PTP Device Monitoring in Professional Broadcast Applications. |
ST 2110-10 | SMPTE Professional Media over Managed IP Networks: System Timing and Definitions. This uses IEEE 1588-2008 - PTPv2. |
ISO 11172-1 | MPEG-1 Systems layer provides timing and synchronization and describes how to multiplex elementary streams together into a single system stream. |
ISO 13818-1 | MPEG-2 Systems layer adds to and enhances MPEG-1. |
ISO 14496-1 | MPEG-4 Systems layer adds to and enhances MPEG-2. |
DVB | The Time Date Table (TDT) and Time Offset Table (TOT) are embedded inside the broadcast signal. |
EN 300 468 v1.3.1 | ETSI Specification for Service Information (SI) in DVB systems describes the TDT and TOT data embedded in the DVB transmissions. |
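A core idea behind ST 2059-1 is that any periodic signal can be treated as if it had been running continuously since the SMPTE Epoch, which shares its origin with the PTP epoch, so the signal's current phase follows directly from the PTP time. Below is a minimal sketch of that calculation; leap-second (TAI/UTC) handling and drop-frame timecode jam rules are deliberately ignored, and the example time is made up.

```python
from fractions import Fraction

# A minimal sketch of the ST 2059-1 idea: derive a signal's phase by
# assuming it has run continuously since the SMPTE Epoch (same origin
# as the PTP epoch). Leap-second and drop-frame handling are omitted.

def frame_phase(ptp_seconds: Fraction, frame_rate: Fraction) -> Fraction:
    """Fraction of the current frame period elapsed at the given PTP time."""
    frames_since_epoch = ptp_seconds * frame_rate
    return frames_since_epoch - int(frames_since_epoch)

# Example: 29.97 Hz (30000/1001) video, ten million seconds after the epoch.
t = Fraction(10_000_000)
rate = Fraction(30000, 1001)
print(float(frame_phase(t, rate)))   # position within the current frame period
```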
Elementary Streams
MPEG describes elementary streams as the output of an encoder. They only carry a single type of media:
- Audio
- Video
- Sub-title text
- URLs
- Metadata
- Control signals
An elementary stream is a sequence of bits. The codec specifications only describe how it should be decoded, which leaves encoder developers free to improve their compression algorithms; coding efficiency therefore improves over time without revising the standard. The bitstream is sliced into packets for transmission or storage. The term 'track' is used in place of 'stream' when the content is stored in a file.
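As an illustration of that slicing, here is a minimal sketch that packetizes an elementary stream into fixed-size payload chunks with a simple header. The three-byte header used here is invented for clarity; real PES packet headers (defined in ISO 13818-1) carry stream identifiers, lengths and optional timestamps.

```python
# A minimal sketch of slicing an elementary stream into packets.
# The 3-byte header (stream id + payload length) is invented for
# illustration; real PES headers in ISO 13818-1 are richer.

def packetize(elementary_stream: bytes, stream_id: int,
              payload_size: int = 184) -> list[bytes]:
    packets = []
    for offset in range(0, len(elementary_stream), payload_size):
        payload = elementary_stream[offset:offset + payload_size]
        header = bytes([stream_id, len(payload) >> 8, len(payload) & 0xFF])
        packets.append(header + payload)
    return packets

video_es = bytes(1000)                  # stand-in for encoder output
print(len(packetize(video_es, 0xE0)))   # 6 packets; 0xE0 is a PES video stream id
```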
Multiple Audio Channels
Audio complexity increases as more channels are introduced. Wikipedia describes 21 different surround-sound arrangements. The most common formats are:
Format | Details | Notation | Channels |
---|---|---|---|
Mono | A single channel on its own. | 1.0 | 1 |
Stereo | Separate left and right sound-mixes. | 2.0 | 2 |
Quadraphonic | A legacy format creating a surround effect. Sounds are placed left and right, front and back. | 4.0 | 4 |
Surround-sound | Sounds are placed left, right and middle at the front with additional left and right rear speakers. Plus, a non-directional single low-frequency channel. | 5.1 | 6 |
Early Dolby Atmos™️ | Enhances the 5.1 arrangement to add four more channels at ceiling height. Two on each side. | 5.1.4 | 10 |
Dolby Atmos™️ (high performance) | An advanced configuration places loudspeakers around the audience and in the ceiling above them. There are 11 full range speakers, one low-frequency woofer and 8 high-level speakers. | 11.1.8 | 20 |
NHK Super Hi-Vision | The sound-system for the 8K demonstrations in 2018 used multiple 7.1 surround systems (floor, wall and ceiling) with extra low-frequency support. | 22.2 | 24 |
Microsoft define an informal specification for channel names that identifies 18 discrete sound sources within a multi-channel environment. It is described as KSAUDIO_CHANNEL_CONFIG and is also useful for non-Windows applications. Other naming conventions may apply in different environments. These individual sound sources are mixed down via a matrix when fewer channels are required. Mixing tools for spatially positioning events within the soundscape become more sophisticated as the number of channels increases.
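To illustrate the mix-down idea, here is a minimal sketch of a 5.1-to-stereo matrix. The -3 dB coefficients for the centre and surround channels are typical defaults rather than mandated values; real productions set their coefficients according to the relevant delivery specification.

```python
import math

# A minimal sketch of a matrix mix-down from 5.1 to stereo. The -3 dB
# (1/sqrt(2)) coefficients for centre and surround are typical defaults,
# not mandated values. The LFE channel is commonly dropped in a stereo
# fold-down, as it is here.

ATT = 1 / math.sqrt(2)   # roughly -3 dB

# One sample per channel: L, R, C, LFE, Ls, Rs
def downmix_5_1_to_stereo(l, r, c, lfe, ls, rs):
    left = l + ATT * c + ATT * ls
    right = r + ATT * c + ATT * rs
    return left, right

print(downmix_5_1_to_stereo(0.2, 0.1, 0.5, 0.3, 0.05, 0.05))
```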
Multiple Video Channels
Usually, only a single video channel is required. Emerging virtual-reality applications with stereoscopic-vision require two. DVDs support multiple video angles selectable at playback. Some orchestral concerts offer different views for each section of the orchestra. These must be perfectly synchronized with each other and the audio.
Program Streams
Embedding combines several streams of media to create a single higher-level stream. In its simplest form, packets of audio, video and timed-text are interleaved alternately. Audio requires much less space than video and intermittent text requires even less. The interleaving may not always be in a regular pattern. Surround-sound audio will be even more complex as there are several more audio streams to include.
All of the streams need to arrive together so that downstream devices can maintain synchronization with a minimum of buffering.
This example shows how a program stream is created from audio, video and timed text elementary streams.
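The interleaving can also be sketched in code: the toy multiplexer below simply emits whichever elementary stream packet has the earliest timestamp next. A real systems-layer multiplexer (ISO 13818-1) also manages decoder buffer models and clock references; the packet names and timestamps here are made up.

```python
import heapq

# A toy interleaver: merge timestamped packets from audio, video and
# timed-text elementary streams into one program-ordered sequence.
# Real MPEG multiplexing also manages decoder buffer models and clock
# references; this only shows the interleaving idea.

video = [(0, "V0"), (40, "V1"), (80, "V2")]                 # (timestamp ms, packet)
audio = [(0, "A0"), (21, "A1"), (43, "A2"), (64, "A3")]
text  = [(0, "T0")]                                         # occasional subtitle packet

program_stream = list(heapq.merge(video, audio, text))
print([pkt for _, pkt in program_stream])
# ['A0', 'T0', 'V0', 'A1', 'V1', 'A2', 'A3', 'V2']
```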
Synchronization is important and is described in delivery specifications:
- The DPP requires that sound and vision synchronization markers at the start of the program are within 5 milliseconds.
- Netflix mandates that separated audio files must match the duration of their companion video files to within 1 second.
Program streams are suitable for delivery via streaming services but need to be combined with several others for on-air broadcasting.
SDI Audio Embedding
SMPTE ST 2110-2x standards accommodate SDI conforming to the ST 292M standard as a source format. SDI has the capacity to carry up to 16 channels of audio depending on the sample size and frequency.
The SDI format is derived from classic analogue TV services, retaining the spaces where the video is blanked. Each horizontal line has a space at the start for horizontal ancillary data (HANC), and lines reserved at the top and bottom of the frame carry vertical ancillary data (VANC).
Digital audio is stored in the HANC space and is extracted for conversion to MPEG or ST 2110 compatible formats.
Constructing A Transport Stream
The individual elementary streams are combined to make a program stream. This represents a single TV channel in a broadcast scenario. Several program streams are combined with additional engineering and EPG metadata to construct a transport stream.
Here is the nested structure:
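At the outermost layer of that nesting, everything is carried in fixed-size transport stream packets. Below is a minimal sketch that builds a single 188-byte MPEG-2 transport packet; only the basic four-byte header is constructed, and the PID, payload and padding are illustrative simplifications (real streams pad via adaptation-field stuffing and also carry PSI tables and clock references).

```python
# A minimal sketch of one 188-byte MPEG-2 transport stream packet
# (ISO 13818-1). Only the basic 4-byte header is built; adaptation
# fields, PSI tables and PCR clock references are omitted, and the
# 0xFF fill is a simplification of real adaptation-field stuffing.

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47

def ts_packet(pid: int, continuity_counter: int, payload: bytes,
              payload_unit_start: bool = False) -> bytes:
    assert len(payload) <= TS_PACKET_SIZE - 4
    header = bytes([
        SYNC_BYTE,
        (payload_unit_start << 6) | (pid >> 8),   # PUSI flag + top 5 PID bits
        pid & 0xFF,                               # low 8 PID bits
        0x10 | (continuity_counter & 0x0F),       # payload only, not scrambled
    ])
    padding = bytes([0xFF]) * (TS_PACKET_SIZE - 4 - len(payload))
    return header + payload + padding

pkt = ts_packet(pid=0x101, continuity_counter=0,
                payload=b"example payload", payload_unit_start=True)
print(len(pkt), hex(pkt[0]))   # 188 0x47
```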
Over The Air (OTA) DVB Transport Streams
Terrestrial or Satellite broadcasts combine multiple separate program streams (channels) into a transport stream. Terrestrial broadcasters call this a multiplex while satellite broadcasters describe it as a transponder. It might also be described as a bouquet. This is based on the Digital Video Broadcasting (DVB) standards.
The available frequency bands limit the DTT broadcasts to a half-dozen multiplexes. This is sufficient to deliver more than a hundred channels. The commercial DSat service carries more transponders and delivers approximately 300 channels. Any space that is too small to squeeze in another TV channel is used to carry radio broadcasts and data services.
Cable broadcasts were historically delivered using similar DVB standardized transports. They are rapidly migrating to a streaming service model delivered over broadband IP.
Here is an example transport stream spanning 24 hours:
Some channels are only available for part of the day but the EPG hides this and displays them as separate channels even though they use the same slot in the multiplex.
Simple Multiplexing
Program streams that are combined into a transport stream have no knowledge of each other or how much bandwidth they are consuming. A simple multiplexing scheme bets on their combined demand never exceeding 100% of the available bandwidth. If several channels have a burst of activity simultaneously, the combined demand exceeds the available capacity, data is lost and the picture breaks up. This is described as 'blowing up' or 'busting' the mux.
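That gamble can be expressed as a simple arithmetic check: the sum of the peak bitrates of all the program streams must stay below the capacity of the multiplex. A minimal sketch, with purely illustrative capacity and bitrate figures:

```python
# A minimal sketch of the simple-multiplexing gamble: if every channel
# bursts to its peak at once, the total must still fit the multiplex.
# The capacity and bitrates below are illustrative numbers only.

MUX_CAPACITY_MBPS = 24.0

channels_peak_mbps = [4.5, 4.5, 3.8, 3.8, 3.0, 3.0, 2.5]

total_peak = sum(channels_peak_mbps)
print(f"worst-case demand {total_peak:.1f} Mb/s "
      f"vs capacity {MUX_CAPACITY_MBPS} Mb/s")
if total_peak > MUX_CAPACITY_MBPS:
    print("Risk of busting the mux if all channels burst together.")
```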
Statistical Multiplexing
The available bandwidth can be used more effectively when several channels are transmitted together. Hardware compression in the head-end applies statistical multiplexing to avoid overloading the available capacity.
Statistical multiplexing allows individual channels to burst and use more than their average bandwidth at the expense of other channels. The Stat-mux adjusts the compression ratios to maintain a constant bitrate even when several channels are bursting simultaneously.
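One way to picture the stat-mux is as a bit-allocation loop: in each period, the fixed multiplex capacity is shared out in proportion to how complex each channel's current pictures are. A minimal sketch of that idea, with made-up complexity values:

```python
# A minimal sketch of statistical multiplexing: a fixed multiplex
# capacity is shared out each period in proportion to the momentary
# coding complexity of each channel. Complexity values are made up.

MUX_CAPACITY_MBPS = 24.0

def allocate(complexities: dict[str, float]) -> dict[str, float]:
    total = sum(complexities.values())
    return {name: MUX_CAPACITY_MBPS * c / total
            for name, c in complexities.items()}

# A sports channel bursting on a fast pan gets more bits; a static
# news studio shot gives up some of its share.
print(allocate({"sport": 9.0, "news": 2.0, "drama": 5.0, "kids": 4.0}))
```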
These are important caveats to bear in mind:
- Statistical multiplexing for broadcasting can only be used with uncompressed (raw) source material. Consequently, this requires a slightly higher network capacity for delivery of content to the head-end.
- Some channels are stored and re-broadcast one hour later; the chance of the original and the delayed copy bursting simultaneously is small.
- When the same content is simulcast on several channels (for example BBC News), a statistical multiplexing approach is very helpful.
Conclusion
Timing and synchronization are critically important when creating program streams for delivery over IP or combining into broadcast transmissions.
Ensuring that broadcast bouquets of channels are deliverable within the available bandwidth requires careful management of the compression with a statistical multiplexor.
These Appendix articles contain additional information you may find useful: