Standards: Part 13 - Exploring MPEG4-Part 10 - H.264/AVC
The H.264/AVC codec has been very successful. Here we dig deeper into how profiles and levels work to facilitate deployment of delivery systems and receiving client-player designs.
This article is part of our growing series on Standards.
There is an overview of all 26 articles in Part 1 - An Introduction To Standards.
In 2004, it was uncertain whether H.264/AVC or VC-1 would become dominant. VC-1 was based on a popular Microsoft Windows Media format offered to SMPTE for ratification. Eventually, H.264 did become the codec of choice for a wide variety of applications. The successors (SVC, HEVC & LCEVC) must offer significant advantages to gain similar traction.
About The Standard
The MPEG-4 Part 10 standard is very large and complex with approximately 900 pages of densely concentrated detail.
The support for profiles and levels is fundamental to successfully deploying your content using the H.264 format. There have also been some important extensions (SVC, MVC and MFC) to the original codec design. These are included as Annexes to the main body of the standard and require a high degree of focus to interpret correctly.
Section 3 briefly describes the terminology and abbreviations used throughout the standard. Understanding these makes the rest of the standard much easier to comprehend.
The notational conventions in Section 5 are relevant if you want to understand the mathematical and logical concepts described later on. These will be most useful to codec developers.
The structure of the Network Abstraction Layer packets (NAL units) is described in Section 7. The coverage of the NAL unit payload includes descriptions of how the profile and level parameters are formatted. Read this in combination with the Annexes to glean the specific values and locations for the profile_idc, profile_iop and level_idc bytes in the NAL unit.
Profiles are described in Annex A for the core AVC compression standard. More profiles are introduced in Annex F which describes Scalable Video Coding (SVC). Annexes G, H and I address various multi-view coding techniques for stereoscopic and 3D viewing. The additional profiles needed to constrain them and signal the client-player are also described there.
Levels are addressed comprehensively in Sub-section A.3 of Annex A.
Annex B describes the Byte Stream Syntax as opposed to the bitstream syntax in Section 7. It also explains how a decoder can resynchronize itself to the incoming stream. The decoder frames the bitstream into 8-bit bytes to unpack the payloads in the NAL units.
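The framing step can be illustrated with a minimal Python sketch. It assumes the stream uses the three-byte start code prefix 0x000001 defined in Annex B (optionally preceded by extra zero bytes) and that the whole stream is already in memory. The function names are hypothetical and are not taken from the standard.

```python
START_CODE = b"\x00\x00\x01"  # Annex B start code prefix


def split_annex_b(stream: bytes) -> list[bytes]:
    """Split an Annex B byte stream into NAL units (header byte + payload)."""
    units = []
    pos = stream.find(START_CODE)
    while pos != -1:
        begin = pos + len(START_CODE)
        nxt = stream.find(START_CODE, begin)
        end = len(stream) if nxt == -1 else nxt
        # Zero bytes before the next start code are padding, not payload:
        # a NAL unit never ends with a 0x00 byte.
        unit = stream[begin:end].rstrip(b"\x00")
        if unit:
            units.append(unit)
        pos = nxt
    return units


def nal_unit_type(unit: bytes) -> int:
    """The low five bits of the NAL unit header byte hold the unit type."""
    return unit[0] & 0x1F


sample = bytes.fromhex("0000000167 42e01f 000001 68ce3c80".replace(" ", ""))
for unit in split_annex_b(sample):
    print(nal_unit_type(unit), unit.hex())
```

A real decoder works on a continuous stream and must also remove emulation prevention bytes from each payload before parsing it. Both are omitted here to keep the sketch short.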
Understand the decoding process with the Hypothetical Reference Decoder described in Annex C.
Supplemental Enhancement Information (SEI) is described in Annex D. This additional metadata describes the content in the stream. Decoders have some discretion in how they respond to this.
Annex E describes Video Usability Information (VUI) which parameterizes aspect-ratio, picture size, over-scanning, color gamut ranges and their associated transfer functions. The client-player uses this to present the video canvas correctly.
The rest of this article will focus on Profiles and Levels. This is an area of some complexity, and low-level explanations of how it works are scarce.
Profiles & Levels
Choose the profile and level that best suits your needs. Encoders transmit the details to the client which interprets the bitstream accordingly.
Profiles manage the encoding process and select appropriate sub-sets of the individual coding tools. This is a huge benefit and reduces the complexity of encoder configurations. The decoder has counterparts for each of these tools.
Levels are important in the receiving client-player and are concerned with the picture size, frame-rate, bitrate and buffering requirements of the decoded images.
Do not confuse the container profiles defined by the MPEG-4 systems layer with Part 10 video compression profiles. They are not the same thing.
Signaling The Profile & Level
The profile and level signaling mechanism has become very complex because the standard has been revised multiple times while retaining the necessary backwards compatibility with many millions of previously deployed devices.
The profile and level values are carried in the Sequence Parameter Set (SPS) NAL unit, immediately after the one-byte NAL unit header. Unpack it carefully to reveal three bytes representing these properties:
• profile_idc
• profile_iop
• level_idc
The profile_iop value uses individual bits as flagging indicators. Conventional Boolean notation applies with the value 1 representing TRUE and the value 0 representing FALSE.
Byte 1 contains the profile_idc which identifies the foundation profile. The same profile_idc value may be used to identify several different profiles because they are uniquely distinguished by appending the profile_iop value. For example, the same profile_idc is used for Baseline and Constrained Baseline profiles but IOP constraint bit-flag 1 determines which is selected.
Byte 2 is the Interoperability Profile (IOP) described as the profile_iop. It carries six individual constraint bit-flags (constraint_set0_flag to constraint_set5_flag, numbered 0 to 5 from the most significant bit) followed by two reserved bits. The flags alter the behavior of the profile specified in the profile_idc and also affect how the level_idc value is interpreted. To unambiguously select a profile, Bytes 1 and 2 must be combined. The meaning of these individual constraint flags depends on the context. Refer to Section 7.4.2.1.1 for details and cross-references to the applicable annex descriptions.
Byte 3 contains the level_idc, which describes the level at which the chosen profile is operating so the client can reconstruct the images correctly.
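The three bytes described above can be read with a short Python sketch. It assumes the input is a single SPS NAL unit (nal_unit_type 7, or 15 for a subset SPS) that has already been extracted from the stream, with its one-byte header still attached. The class and function names are hypothetical.

```python
from typing import NamedTuple


class ProfileLevel(NamedTuple):
    profile_idc: int
    profile_iop: int
    level_idc: int

    def constraint_flag(self, n: int) -> bool:
        """True if constraint_set<n>_flag is set (n = 0..5, MSB first)."""
        if not 0 <= n <= 5:
            raise ValueError("constraint flags are numbered 0 to 5")
        return bool((self.profile_iop >> (7 - n)) & 1)


def read_profile_level(nal_unit: bytes) -> ProfileLevel:
    """Read profile_idc, profile_iop and level_idc from an SPS NAL unit."""
    if len(nal_unit) < 4:
        raise ValueError("NAL unit too short to hold profile and level bytes")
    if nal_unit[0] & 0x1F not in (7, 15):
        raise ValueError("not a sequence parameter set NAL unit")
    # Byte 0 is the NAL unit header; bytes 1..3 are the fields of interest.
    return ProfileLevel(nal_unit[1], nal_unit[2], nal_unit[3])


pl = read_profile_level(bytes.fromhex("6742e01f"))
print(pl.profile_idc, pl.level_idc)               # 66 31
print([pl.constraint_flag(n) for n in range(6)])  # flags 0, 1 and 2 are set
```

The sample bytes are an illustrative SPS prefix: profile_idc 66 with constraint flags 0, 1 and 2 set, and level_idc 31.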
Profile Categories
Many of the profiles are derived from the same common Baseline and High ancestors. This has implications when the behavior of level_idc values is examined. This diagram illustrates the inheritance:
H.264 profiles can also be grouped according to which part of the ISO standard they are described in:
Category | Description |
---|---|
Core | The foundation set of profiles in H.264 define non-scalable 2D flat presentations. The player application may transform the video canvas that the images are being drawn onto. |
Pro | Professional users, camera ingest and editing require additional profiles. |
SVC | The Scalable Video Coding standard introduces more profiles. |
MVC | Multi-view coding requires support for stereoscopic images in the player. These reduce the resolution of the two images so they can be accommodated within a single flat video raster. |
MFC | Multi-resolution Frame-Compatible coding adds specialized profiles for full resolution stereoscopic imaging. |
3D | The 3D-AVC standard adds two more profiles for enhanced 3D support. |
Current List Of Profiles
These are the currently defined profiles for H.264. Gleaning the profile_idc and profile_iop values by carefully reading the standard is somewhat arduous as there is no corresponding summary table included.
The profile_idc value is shown in the IDC column. The optional constraint settings in the profile_iop are listed in the IOP column. All combinations of IDC and IOP are unique.
Category | Profile name | IDC | IOP | Description |
---|---|---|---|---|
Core | Constrained Baseline | 66 | 1 | Useful for video conferencing and mobile applications. |
Core | Baseline | 66 | - | Improves the robustness of the Constrained Baseline profile. The differences are subtle. |
Core | Extended | 88 | - | Designed for streaming with additional capabilities to support stream switching. |
Core | Main | 77 | - | Standard Definition TV over DVB transports. |
Core | High | 100 | - | High Definition TV broadcast and storage. Adopted by Blu-ray discs and HDTV transmissions. |
Core | Progressive High | 100 | 4 | Based on the High profile without interlace support. |
Core | Constrained High | 100 | 4 & 5 | Based on the Progressive High profile. Removes support for Bi-Predictive slices. |
Core | High 10 | 110 | - | Based on the High profile with increased 10-bit color detail. |
Core | High 4:2:2 | 122 | - | Based on High 10 with added support for 4:2:2 chroma sampling. |
Core | High 4:4:4 Predictive | 244 | - | Based on High 4:2:2 with full 4:4:4 chroma sampling extending up to 14 bits. Adds lossless region coding and three separate color planes. |
Pro | High 10 Intra | 110 | 3 | Based on High 10 constrained to all intra-frame coding. |
Pro | High 4:2:2 Intra | 122 | 3 | Based on High 4:2:2 constrained to all intra-frame coding. |
Pro | High 4:4:4 Intra | 244 | 3 | Based on High 4:4:4 Predictive constrained to all intra-frame coding. |
Pro | CAVLC 4:4:4 Intra | 44 | - | Based on High 4:4:4 Intra with variable length coding. |
SVC | Scalable Baseline | 83 | - | Adds scalability to the Baseline profile. Useful for video conferencing, mobile and surveillance applications. |
SVC | Scalable Constrained Baseline | 83 | 5 | Adds scalability to the Constrained Baseline profile. Suitable for Real-Time applications. |
SVC | Scalable High | 86 | - | Adds scalability to the High profile. Suitable for broadcast and streaming applications. |
SVC | Scalable Constrained High | 86 | 5 | Based on the Constrained High profile with added support for scalability. Used for real-time communications. |
SVC | Scalable High Intra | 86 | 3 | Used for production applications that need high quality content with Intra support. |
MVC | Stereo High | 128 | - | Based on the High profile with MVC extensions to encode two views. |
MVC | Multi-view High | 118 | - | Based on the High profile. Used when more than two views are required. Lacks support for interlace. |
MFC | MFC High | 134 | - | Enhanced resolution stereoscopic imaging based on the High profile. This packs two images into a single frame. |
MFC | MFC Depth High | 135 | - | Adds depth maps for enhanced 3D rendering. |
3D | Multi-view Depth High | 138 | - | Adds depth map and video texture mapping for better 3D rendition. |
3D | Enhanced Multi-view Depth High | 139 | - | Multiple views with depth mapping support. |
The standard defines profile_idc as an unsigned 8-bit integer value (0-255). Any profile_idc values not currently defined in the standard are reserved entirely for future use. They will be defined jointly by ITU-T and ISO/IEC.
The annexes at the end of ISO 14496 Part 10 are the authoritative source. Table 5 in IETF RFC 6184 is also helpful.
Levels
The levels describe the picture resolutions and frame-rates that the client-player must be able to decode and present. Within any given bitrate, there is a trade-off between frame-rate and picture size: a higher frame-rate requires smaller pictures. Decoding speed is also affected and so is the number of frames that can be buffered. The level limits defined in Table A.1 specify what the client must be able to support.
The level_idc is an unsigned 8-bit integer value (0-255). However, the standard describes the intermediate levels in Table A.1 as non-integer values. The intermediate levels describe alternative picture sizes and frame-rates within the available bandwidth and buffering capacity of each level.
Here is a summary list showing just the main levels and resolutions. The standard mentions that some implementations may only use these integer numbered levels and omit support for the intermediate ones:
Level grouping | Description |
---|---|
1 | Small pictures for older mobile devices. |
2 | Quarter SD frame size or low frame-rate SD. |
3 | SD and some 1280 HD formats. |
4 | 2K. |
5 | 4K. |
6 | 8K. |
There are some arcane rules for how the level_idc is combined with the profile_idc and the profile_iop constraint flags to determine the actual levels. These are described in Sub-section A.3 in Annex A.
The level limits are applied differently for the Low vs. High profiles. Level limits are described in Table A.1. To determine the indicated level from the level_idc value, you need to treat each group of profiles differently.
• Baseline, Main and Extended (low) profiles. The Baseline, Constrained Baseline, Main, and Extended profiles all share similar level limits based on constraint flagging in profile_iop and the profile_idc value (see Section A.3.1). Level 1b is non-numeric and uses constraint flag 3 to distinguish it from level 1.1. Both of them have the same level_idc value equal to 11.
• High profiles. The child profiles derived from the High profile similarly share some common behaviors which are described separately (see Section A.3.2). Level 1b is treated as a special case and has a level_idc value equal to 9.
After dealing with the special case for level 1b, the standard uses a fixed-point decimal representation where the integer value in level_idc is divided by 10 to yield the intermediate level number. For example level 6.1 is represented by the level_idc having an integer value 61.
Level | level_idc value | Type |
---|---|---|
1 | 10 | Main |
1b | 11 - with constraint bit 3 set to 1 for child profiles based on the Baseline, Main and Extended profiles. | Intermediate |
1b | 9 for all child profiles based on the High profile. | Intermediate |
1.1 | 11 | Intermediate |
1.2 | 12 | Intermediate |
1.3 | 13 | Intermediate |
2 | 20 | Main |
2.1 | 21 | Intermediate |
2.2 | 22 | Intermediate |
3 | 30 | Main |
3.1 | 31 | Intermediate |
3.2 | 32 | Intermediate |
4 | 40 | Main |
4.1 | 41 | Intermediate |
4.2 | 42 | Intermediate |
5 | 50 | Main |
5.1 | 51 | Intermediate |
5.2 | 52 | Intermediate |
6 | 60 | Main |
6.1 | 61 | Intermediate |
6.2 | 62 | Intermediate |
A decoder must support the maximum level limit values defined for a level and all lower levels beneath it.
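The level_idc rules described in this section can be summarized in a short Python sketch. It assumes the low profile group is identified by profile_idc values 66, 77 and 88 (taken from the profile table earlier) and that constraint flag 3 is bit 4 of the profile_iop byte. The function name is hypothetical, and no check is made that the resulting level is actually defined in Table A.1.

```python
# profile_idc values for the Baseline, Main and Extended families.
LOW_PROFILE_IDCS = {66, 77, 88}


def level_name(profile_idc: int, profile_iop: int, level_idc: int) -> str:
    """Convert level_idc to a level name, handling the level 1b special cases."""
    constraint_set3 = bool((profile_iop >> 4) & 1)
    if profile_idc in LOW_PROFILE_IDCS:
        # Low profiles: level 1b shares level_idc 11 with level 1.1.
        if level_idc == 11 and constraint_set3:
            return "1b"
    elif level_idc == 9:
        # High-derived profiles: level 1b has its own level_idc value.
        return "1b"
    # Otherwise level_idc is ten times the level number.
    major, minor = divmod(level_idc, 10)
    return str(major) if minor == 0 else f"{major}.{minor}"


print(level_name(66, 0xE0, 31))   # 3.1
print(level_name(66, 0x10, 11))   # 1b
print(level_name(100, 0x00, 9))   # 1b
print(level_name(100, 0x00, 61))  # 6.1
```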
Conclusion
Standards compliance does not guarantee interoperability. Make sure the profile and level you are encoding with are consistent with your target client-player.
For example, if company A makes a video codec that encodes high definition pictures and company B makes a video player that expects strictly standard definition content, the two are incompatible even though they may both claim to be (and are) 100% standards compliant.
Bear in mind also that H.264 is primarily a lossy codec. The High 4:4:4 Predictive profile adds lossless coding tools, but the profiles used for most delivery applications cannot reproduce the source exactly.