Standards: Part 13 - Exploring MPEG-4 Part 10 - H.264/AVC

The H.264/AVC codec has been very successful. Here we dig deeper into how profiles and levels work to facilitate deployment of delivery systems and receiving client-player designs.


This article is part of our growing series on Standards.
There is an overview of all 26 articles in Part 1 - An Introduction To Standards.


In 2004, it was uncertain whether H.264/AVC or VC-1 would become dominant. VC-1 was based on a popular Microsoft Windows Media format offered to SMPTE for ratification. Eventually, H.264 became the codec of choice for a wide variety of applications. The successors (SVC, HEVC & LCEVC) must offer significant advantages to gain similar traction.

About The Standard

The MPEG-4 Part 10 standard is very large and complex with approximately 900 pages of densely concentrated detail.

The support for profiles and levels is fundamental to successfully deploying your content using the H.264 format. There have also been some important extensions (SVC, MVC and MFC) to the original codec design. These are included as Annexes to the main body of the standard and require a high degree of focus to interpret correctly.

Section 3 briefly describes the terminology and abbreviations used throughout the standard. Understanding these makes the rest of the standard much easier to comprehend.

The notational conventions in Section 5 are relevant if you want to understand the mathematical and logical concepts described later on. These will be most useful to codec developers.

The structure of the Network Abstraction Layer packets (NAL units) is described in Section 7. The coverage of the NAL unit payload includes descriptions of how the profile and level parameters are formatted. Read this in combination with the Annexes to glean the specific values and locations for the profile_idc, profile_iop and level_idc bytes in the NAL unit.

Profiles are described in Annex A for the core AVC compression standard. More profiles are introduced in Annex F, which describes Scalable Video Coding (SVC). Annexes G, H and I address various multi-view coding techniques for stereoscopic and 3D viewing. The additional profiles needed to constrain them and signal the client-player are also described there.

Levels are addressed comprehensively in Sub-section A.3 of Annex A.

Annex B describes the Byte Stream Syntax as opposed to the bitstream syntax in Section 7. It also explains how a decoder can resynchronize itself to the incoming stream. The decoder frames the bitstream into 8-bit bytes to unpack the payloads in the NAL units.
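As a rough illustration of the resynchronization idea, the sketch below scans an Annex B byte stream for the three-byte start code prefix 0x000001 and yields the NAL units found between the prefixes. The function names split_annexb and nal_unit_type are made up for this example. The sketch assumes the whole stream is already in memory and it does not remove emulation prevention bytes from the payload.

```python
START_CODE = b"\x00\x00\x01"  # Annex B start code prefix


def split_annexb(stream: bytes):
    """Yield each NAL unit (header byte included) from an Annex B byte stream."""
    starts = []
    pos = stream.find(START_CODE)
    while pos != -1:
        starts.append(pos + len(START_CODE))  # first byte after the prefix
        pos = stream.find(START_CODE, pos + len(START_CODE))

    for n, begin in enumerate(starts):
        if n + 1 < len(starts):
            end = starts[n + 1] - len(START_CODE)
        else:
            end = len(stream)
        # Zero bytes in front of the next start code are padding, not payload.
        nal = stream[begin:end].rstrip(b"\x00")
        if nal:
            yield nal


def nal_unit_type(nal: bytes) -> int:
    """Return the 5-bit type field held in the low bits of the NAL header byte."""
    return nal[0] & 0x1F
```

A decoder that loses its place can apply the same search from any byte position, because the next start code prefix marks the next NAL unit boundary. A nal_unit_type of 7 identifies a sequence parameter set.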

Understand the decoding process with the Hypothetical Reference Decoder described in Annex C.

Supplemental Enhancement Information (SEI) is described in Annex D. This additional metadata describes the content in the stream. Decoders have some discretion in how they respond to this.

Annex E describes Video Usability Information (VUI) which parameterizes aspect-ratio, picture size, over-scanning, color gamut ranges and their associated transfer functions. The client-player uses this to present the video canvas correctly.

The rest of this article focuses on Profiles and Levels. This is an area of some complexity, and low-level explanations of how it works are hard to find.

Profiles & Levels

Choose the profile and level that best suits your needs. Encoders transmit the details to the client which interprets the bitstream accordingly.

Profiles manage the encoding process and select appropriate sub-sets of the individual coding tools. This is a huge benefit and reduces the complexity of encoder configurations. The decoder has counterparts for each of these tools.

Levels matter most to the receiving client-player. They bound the picture size, frame-rate and bitrate of the decoded images that the player must be able to handle.


Do not confuse the container profiles defined by the MPEG-4 systems layer with Part 10 video compression profiles. They are not the same thing.


Signaling The Profile & Level

The profile and level signaling mechanism has become very complex because the standard has been revised multiple times while retaining the necessary backwards compatibility with many millions of previously deployed devices.

The profile and level values are located at the start of the payload of a sequence parameter set NAL unit (packet). Unpack it carefully to reveal three bytes representing these properties:

profile_idc
profile_iop
level_idc


The profile_iop value uses individual bits as flagging indicators. Conventional Boolean notation applies with the value 1 representing TRUE and the value 0 representing FALSE.


Byte 1 contains the profile_idc which identifies the foundation profile. The same profile_idc value may be used to identify several different profiles because they are uniquely distinguished by appending the profile_iop value. For example, the same profile_idc is used for Baseline and Constrained Baseline profiles but IOP constraint bit-flag 1 determines which is selected.

Byte 2 is the Interoperability Profile (IOP), described as the profile_iop. It carries six individual constraint bit-flags, numbered 0 to 5, which alter the behavior of the profile specified in the profile_idc. The remaining two bits are reserved. It also affects the behavior of the level_idc value. To unambiguously select a profile, Bytes 1 and 2 must be combined. The meaning of these individual constraint flags depends on the context. Refer to Section 7.4.2.1.1 for details and cross-references to the applicable annex descriptions.

Byte 3 is the level_idc, which describes the level at which the chosen profile is operating so the client can reconstruct the images correctly.
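The three bytes described above can be unpacked with a few lines of code. This is a minimal sketch rather than a full parameter set parser. The names ProfileLevel and parse_profile_level are invented for the example, and it assumes the input is one complete NAL unit with its header byte first, of type 7 (sequence parameter set) or type 15 (subset sequence parameter set, used by the scalable and multi-view extensions).

```python
from typing import NamedTuple, Tuple


class ProfileLevel(NamedTuple):
    profile_idc: int
    constraint_flags: Tuple[bool, ...]  # flags 0 to 5, in that order
    level_idc: int


def parse_profile_level(nal: bytes) -> ProfileLevel:
    """Read profile_idc, profile_iop and level_idc from a parameter set NAL unit."""
    if len(nal) < 4:
        raise ValueError("NAL unit too short to hold the three signaling bytes")
    if nal[0] & 0x1F not in (7, 15):
        raise ValueError("not a sequence parameter set NAL unit")

    profile_idc, profile_iop, level_idc = nal[1], nal[2], nal[3]

    # Constraint flag 0 is the most significant bit of the IOP byte.
    # The two least significant bits are reserved and ignored here.
    flags = tuple(bool(profile_iop & (0x80 >> n)) for n in range(6))
    return ProfileLevel(profile_idc, flags, level_idc)
```

For example, the header bytes 0x67 0x42 0xC0 0x1E unpack to a profile_idc of 66 with constraint flags 0 and 1 set and a level_idc of 30.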

Profile Categories

Many of the profiles are derived from the same common Baseline and High ancestors. This has implications when the behavior of level_idc values is examined.

[Diagram: inheritance of the H.264 profiles from their Baseline and High ancestors]

H.264 profiles can also be grouped according to which part of the ISO standard they are described in:

| Category | Description |
| --- | --- |
| Core | The foundation set of profiles in H.264 define non-scalable 2D flat presentations. The player application may transform the video canvas that the images are being drawn onto. |
| Pro | Professional users, camera ingest and editing require additional profiles. |
| SVC | The Scalable Video Coding standard introduces more profiles. |
| MVC | Multi-view coding requires support for stereoscopic images in the player. These reduce the resolution of the two images so they can be accommodated within a single flat video raster. |
| MFC | Multi-resolution Frame-Compatible coding adds specialized profiles for full resolution stereoscopic imaging. |
| 3D | The 3D-AVC standard adds two more profiles for enhanced 3D support. |

Current List Of Profiles

These are the currently defined profiles for H.264. Gleaning the profile_idc and profile_iop values by carefully reading the standard is somewhat arduous as there is no corresponding summary table included.

The profile_idc value is shown in the IDC column. The optional constraint settings in the profile_iop are listed in the IOP column. All combinations of IDC and IOP are unique.

| Category | Profile name | IDC | IOP | Description |
| --- | --- | --- | --- | --- |
| Core | Constrained Baseline | 66 | 1 | Useful for video conferencing and mobile applications. |
| Core | Baseline | 66 | - | Improves the robustness of the Constrained Baseline profile. The differences are subtle. |
| Core | Extended | 88 | - | Designed for streaming with additional capabilities to support stream switching. |
| Core | Main | 77 | - | Standard Definition TV over DVB transports. |
| Core | High | 100 | - | High Definition TV broadcast and storage. Adopted by Blu-ray discs and HDTV transmissions. |
| Core | Progressive High | 100 | 4 | Based on the High profile without interlace support. |
| Core | Constrained High | 100 | 4 & 5 | Based on the Progressive High profile. Removes support for Bi-Predictive slices. |
| Core | High 10 | 110 | - | Based on the High profile with increased 10-bit color detail. |
| Core | High 4:2:2 | 122 | - | Based on High 10 with added support for 4:2:2 chroma sampling. |
| Core | High 4:4:4 Predictive | 244 | - | Based on High 4:2:2 with full 4:4:4 chroma sampling extending up to 14 bits. Adds lossless region coding and three separate color planes. |
| Pro | High 10 Intra | 110 | 3 | Based on High 10 constrained to all intra-frame coding. |
| Pro | High 4:2:2 Intra | 122 | 3 | Based on High 4:2:2 constrained to all intra-frame coding. |
| Pro | High 4:4:4 Intra | 244 | 3 | Based on High 4:4:4 constrained to all intra-frame coding. |
| Pro | CAVLC 4:4:4 Intra | 44 | - | Based on High 4:4:4 Intra with variable length coding. |
| SVC | Scalable Baseline | 83 | - | Adds scalability to the Baseline profile. Useful for video conferencing, mobile and surveillance applications. |
| SVC | Scalable Constrained Baseline | 83 | 5 | Adds scalability to the Constrained Baseline profile. Suitable for real-time applications. |
| SVC | Scalable High | 86 | - | Adds scalability to the High profile. Suitable for broadcast and streaming applications. |
| SVC | Scalable Constrained High | 86 | 5 | Based on the Constrained High profile with added support for scalability. Used for real-time communications. |
| SVC | Scalable High Intra | 86 | 3 | Used for production applications that need high quality content with Intra support. |
| MVC | Stereo High | 128 | - | Based on the High profile with MVC extensions to encode two views. |
| MVC | Multi-view High | 118 | - | Based on the High profile. Used when more than two views are required. Lacks support for interlace. |
| MFC | MFC High | 134 | - | Enhanced resolution stereoscopic imaging based on the High profile. This packs two images into a single frame. |
| MFC | MFC Depth High | 135 | - | Adds depth maps for enhanced 3D rendering. |
| 3D | Multi-view Depth High | 138 | - | Adds depth map and video texture mapping for better 3D rendition. |
| 3D | Enhanced Multi-view Depth High | 139 | - | Multiple views with depth mapping support. |

The standard defines profile_idc as an unsigned 8-bit integer value (0-255). Any profile_idc values not currently defined in the standard are reserved entirely for future use. They will be defined jointly by ITU-T and ISO/IEC.
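The profile list above lends itself to a simple lookup. The sketch below transcribes the IDC and IOP columns into a Python dictionary and resolves a profile name from a profile_idc and the set of constraint flag numbers that are set to 1. The names PROFILES and profile_name are hypothetical. The matching rule, where the most specific row is tried first and extra flags are ignored, is an assumption made for the example and not wording taken from the standard.

```python
# profile_idc -> list of (required constraint flags, profile name).
# Rows with more required flags come first so they win over the base profile.
PROFILES = {
    66: [({1}, "Constrained Baseline"), (set(), "Baseline")],
    88: [(set(), "Extended")],
    77: [(set(), "Main")],
    100: [({4, 5}, "Constrained High"), ({4}, "Progressive High"), (set(), "High")],
    110: [({3}, "High 10 Intra"), (set(), "High 10")],
    122: [({3}, "High 4:2:2 Intra"), (set(), "High 4:2:2")],
    244: [({3}, "High 4:4:4 Intra"), (set(), "High 4:4:4 Predictive")],
    44: [(set(), "CAVLC 4:4:4 Intra")],
    83: [({5}, "Scalable Constrained Baseline"), (set(), "Scalable Baseline")],
    86: [({5}, "Scalable Constrained High"), ({3}, "Scalable High Intra"),
         (set(), "Scalable High")],
    128: [(set(), "Stereo High")],
    118: [(set(), "Multi-view High")],
    134: [(set(), "MFC High")],
    135: [(set(), "MFC Depth High")],
    138: [(set(), "Multi-view Depth High")],
    139: [(set(), "Enhanced Multi-view Depth High")],
}


def profile_name(profile_idc, set_flags):
    """Return the profile name for a profile_idc and the flag numbers set to 1."""
    flags = set(set_flags)
    for required, name in PROFILES.get(profile_idc, []):
        if required <= flags:  # every required flag is present
            return name
    return "reserved or unknown"
```

Calling profile_name(100, {4}) selects Progressive High, while profile_name(100, {4, 5}) selects Constrained High. Any profile_idc that is missing from the dictionary falls through to the reserved case.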


The annexes at the end of ISO 14496 Part 10 are the authoritative source. Table 5 in IETF RFC 6184 is also helpful.


Levels

The levels describe the picture resolutions and frame-rates that the client-player must handle when presenting the decoded output. Within any given bitrate, there is a trade-off between frame-rate and picture size: a higher frame-rate requires smaller pictures. Decoding speed and the number of frames that can be buffered are also affected. The level limits defined in Table A.1 describe what the client must be able to support.
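The trade-off between picture size and frame-rate can be shown with some simple arithmetic. Table A.1 expresses its limits in 16x16 pixel macroblocks: a maximum number per frame and a maximum number per second. The sketch below uses the two figures for Level 4 as they are commonly quoted from that table. Treat them as assumptions and check them against the standard before relying on them. The function name max_frame_rate is invented for the example.

```python
import math

# Assumed Level 4 limits, quoted from memory of Table A.1. Verify before use.
MAX_MACROBLOCKS_PER_SECOND = 245_760
MAX_MACROBLOCKS_PER_FRAME = 8_192


def max_frame_rate(width, height):
    """Return the highest frame-rate the level allows, or None if the frame is too big."""
    # Pictures are coded in whole macroblocks, so partial blocks round up.
    macroblocks = math.ceil(width / 16) * math.ceil(height / 16)
    if macroblocks > MAX_MACROBLOCKS_PER_FRAME:
        return None
    return MAX_MACROBLOCKS_PER_SECOND / macroblocks


for size in [(1280, 720), (1920, 1080), (3840, 2160)]:
    print(size, max_frame_rate(*size))
```

Under these assumed limits a 1920x1080 picture tops out at roughly 30 frames per second, a 1280x720 picture at roughly 68, and a 3840x2160 picture does not fit at all. The standard applies further limits, such as bitrate and buffer size, which this sketch ignores.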

The level_idc is an unsigned 8-bit integer value (0-255). However, the standard describes the intermediate levels in Table A.1 as non-integer values. The intermediate levels describe alternative picture sizes and frame-rates within the available bandwidth and buffering capacity of each level.

Here is a summary list showing just the main levels and resolutions. The standard mentions that some implementations may only use these integer numbered levels and omit support for the intermediate ones:

| Level grouping | Description |
| --- | --- |
| 1 | Small pictures for older mobile devices. |
| 2 | Quarter SD frame size or low frame-rate SD. |
| 3 | SD and some 1280 HD formats. |
| 4 | 2K. |
| 5 | 4K. |
| 6 | 8K. |

There are some arcane rules for how the level_idc is combined with the profile_idc and the profile_iop constraint flags to determine the actual levels. These are described in Sub-section A.3 in Annex A.

The level limits are applied differently for the Low vs. High profiles. Level limits are described in Table A.1. To determine the indicated level from the level_idc value, you need to treat each group of profiles differently.

Baseline, Main and Extended (low) profiles. The Baseline, Constrained Baseline, Main, and Extended profiles all share similar level limits based on constraint flagging in profile_iop and the profile_idc value (see Section A.3.1). Level 1b is non-numeric and uses constraint flag 3 to distinguish it from level 1.1. Both of them have the same level_idc value equal to 11.

High profiles. The child profiles derived from the High profile similarly share some common behaviors which are described separately (see Section A.3.2). Level 1b is treated as a special case and has a level_idc value equal to 9.

After dealing with the special case for level 1b, the standard uses a fixed-point decimal representation where the integer value in level_idc is divided by 10 to yield the intermediate level number. For example level 6.1 is represented by the level_idc having an integer value 61.

| Level | level_idc value | Type |
| --- | --- | --- |
| 1 | 10 | Main |
| 1b | 11, with constraint bit 3 set to 1, for child profiles based on the Baseline, Main and Extended profiles. | Intermediate |
| 1b | 9, for all child profiles based on the High profile. | Intermediate |
| 1.1 | 11 | Intermediate |
| 1.2 | 12 | Intermediate |
| 1.3 | 13 | Intermediate |
| 2 | 20 | Main |
| 2.1 | 21 | Intermediate |
| 2.2 | 22 | Intermediate |
| 3 | 30 | Main |
| 3.1 | 31 | Intermediate |
| 3.2 | 32 | Intermediate |
| 4 | 40 | Main |
| 4.1 | 41 | Intermediate |
| 4.2 | 42 | Intermediate |
| 5 | 50 | Main |
| 5.1 | 51 | Intermediate |
| 5.2 | 52 | Intermediate |
| 6 | 60 | Main |
| 6.1 | 61 | Intermediate |
| 6.2 | 62 | Intermediate |


A decoder must support the maximum level limit values defined for a level and all lower levels beneath it.
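The level_idc rules described in this section, including the two special cases for level 1b, can be sketched as a small function. The name level_name and the set LOW_PROFILE_IDCS are invented for the example. The sketch does not check that the resulting level actually appears in Table A.1, and it treats a level_idc of 9 as level 1b without checking the profile.

```python
# profile_idc values for the Baseline family, Main and Extended profiles.
LOW_PROFILE_IDCS = {66, 77, 88}


def level_name(profile_idc, level_idc, constraint_flag_3):
    """Turn a level_idc into the level label used in Table A.1."""
    # Level 1b for the profiles derived from High.
    if level_idc == 9:
        return "1b"
    # Level 1b for the low profiles shares level_idc 11 with level 1.1.
    if level_idc == 11 and constraint_flag_3 and profile_idc in LOW_PROFILE_IDCS:
        return "1b"
    # Everything else is ten times the level number.
    major, minor = divmod(level_idc, 10)
    return str(major) if minor == 0 else f"{major}.{minor}"
```

With these rules, level_name(66, 11, True) gives 1b, level_name(66, 11, False) gives 1.1 and level_name(100, 61, False) gives 6.1.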


Conclusion

Standards compliance does not guarantee interoperability. Make sure the profile and level you are encoding with are consistent with your target client-player.

For example, if company A makes a video codec that processes pictures at high definition and company B makes a video player that expects content that is strictly standard definition, the two are incompatible even though both may claim to be (and are) 100% standards compliant.
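A simple pre-flight check makes this kind of mismatch visible before content is published. The sketch below is a deliberate simplification. It assumes the player advertises an explicit list of profile names it accepts plus a single maximum level, and it relies on the rule that a decoder supporting one level also supports every lower level. The names LEVEL_ORDER and can_play are hypothetical.

```python
# Level labels in ascending order, following the level table in this article.
LEVEL_ORDER = [
    "1", "1b", "1.1", "1.2", "1.3",
    "2", "2.1", "2.2",
    "3", "3.1", "3.2",
    "4", "4.1", "4.2",
    "5", "5.1", "5.2",
    "6", "6.1", "6.2",
]


def can_play(stream_profile, stream_level, player_profiles, player_max_level):
    """Return True if the stream falls within the player's declared capabilities."""
    if stream_profile not in player_profiles:
        return False
    # A decoder that supports a level must also support all lower levels.
    return LEVEL_ORDER.index(stream_level) <= LEVEL_ORDER.index(player_max_level)
```

A player declaring Main and High up to level 3.1 would accept a Main stream at level 3 but reject a High stream at level 4, even though the encoder and the player are both compliant.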

Bear in mind also that H.264 is normally deployed as a lossy codec. Some profiles, notably High 4:4:4 Predictive, include lossless coding tools, but do not assume lossless delivery unless both the encoder and the target client-player support them.
