Standards: Video - Advanced Video Coding (AVC)
AVC remains one of the most widely deployed video codecs in the world, but navigating its profiles, levels and signaling mechanisms is far from straightforward.
About The AVC Standard
The MPEG-4 Part 10 standard is very large and complex with approximately 900 pages of densely concentrated detail.
The support for profiles and levels is fundamental to successfully deploying your content using the H.264 format. There have also been some important extensions (SVC, MVC and MFC) to the original codec design. These are included as Annexes to the main body of the standard and require a high degree of focus to read and interpret correctly.
Section 3 briefly describes the terminology and abbreviations used throughout the standard. Understanding them makes the rest of the standard much easier to comprehend.
The notational conventions in Section 5 are relevant if you want to understand the mathematical and logical concepts described later on. These will be most useful to codec developers.
The structure of the Network Abstraction Layer packets (NAL units) is described in Section 7. The coverage of the NAL unit payload includes descriptions of how the profile and level parameters are formatted. Read this in combination with the Annexes to glean the specific values and locations for the profile_idc, profile_iop and level_idc bytes in the NAL unit.
Profiles are described in Annex A for the core AVC compression standard. More profiles are introduced in Annex F which describes Scalable Video Coding (SVC). Annexes G, H and I address various multi view coding techniques for stereoscopic and 3D viewing. The additional profiles needed to constrain them and signal the client-player are also described there.
Levels are addressed comprehensively in Sub-section A.3 of Annex A.
Annex B describes the Byte Stream Syntax as opposed to the bitstream syntax in Section 7. It also explains how a decoder can resynchronize itself to the incoming stream. The decoder frames the bitstream into 8-bit bytes to unpack the payloads in the NAL units.
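As a rough illustration of that framing step, the sketch below splits an Annex B byte stream into NAL units by searching for the three-byte start code prefix 0x000001. The function name split_annexb and the sample bytes are hypothetical, and the sketch ignores emulation prevention bytes inside the payload, so treat it as a starting point rather than a complete parser.

```python
def split_annexb(stream: bytes) -> list:
    """Split an Annex B byte stream into NAL units with the start codes removed."""
    prefix = b"\x00\x00\x01"
    units = []
    pos = stream.find(prefix)
    while pos != -1:
        start = pos + len(prefix)
        nxt = stream.find(prefix, start)
        end = len(stream) if nxt == -1 else nxt
        # A four-byte start code leaves a zero byte ahead of the next prefix.
        # The last byte of a NAL unit is never 0x00, so trailing zeros are padding.
        unit = stream[start:end].rstrip(b"\x00")
        if unit:
            units.append(unit)
        pos = nxt
    return units


# Hypothetical stream: three NAL units using a mix of 4-byte and 3-byte start codes.
sample = (b"\x00\x00\x00\x01\x67\x42\xc0\x1e"
          b"\x00\x00\x01\x68\xce"
          b"\x00\x00\x00\x01\x65\x88")
nal_units = split_annexb(sample)
# The low five bits of the first byte of each unit hold the nal_unit_type.
nal_types = [unit[0] & 0x1F for unit in nal_units]
```

A decoder that loses its place can apply the same search to find the next start code and resume from there.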
Understand the decoding process with the Hypothetical Reference Decoder described in Annex C.
Supplemental Enhancement Information (SEI) is described in Annex D. This additional metadata describes the content in the stream. Decoders have some discretion in how they respond to this.
Annex E describes Video Usability Information (VUI) which parameterizes aspect-ratio, picture size, over-scanning, color gamut ranges and their associated transfer functions. The client-player uses this to present the video canvas correctly.
The rest of this chapter will focus on Profiles and Levels. This is an area of some complexity and low-level explanations of how it works are scarce and hard to find.
Profiles & Levels
Profiles and levels facilitate deployment of content delivery systems and receiving client-player designs by reducing the range of choices when interpreting how the standard works.
Choose the profile and level that best suits your needs. Encoders transmit the details to the client which interprets the bitstream accordingly.
Profile settings manage the encoding process and select appropriate sub-sets of the individual coding tools. This is a huge benefit and reduces the complexity of encoder configurations. The decoder has counterparts for each of these tools.
Levels are important in the receiving client-player and are concerned with the picture size, frame-rate, bitrate and buffering needed to decode the images. Color depth and chroma sampling are governed by the profile rather than the level.
Do not confuse the container profiles defined by the MPEG-4 systems layer with Part 10 video compression profiles. They are not the same thing.
Signaling The Profile & Level
The profile and level signaling mechanism in AVC has become very complex because the standard has been revised multiple times while retaining the necessary backwards compatibility with many millions of previously deployed devices.
The profile and level values are located near the start of the payload of a sequence parameter set (SPS) NAL unit (packet). Unpack it carefully to reveal three bytes representing these properties:
- profile_idc
- profile_iop
- level_idc
The profile_iop value uses individual bits as flagging indicators. Conventional Boolean notation applies with the value 1 representing TRUE and the value 0 representing FALSE.
Byte 1 contains the profile_idc which identifies the foundation profile.
The same profile_idc value may be used to identify several different profiles because they are uniquely distinguished by appending the profile_iop value. For example, the same profile_idc is used for Baseline and Constrained Baseline profiles but IOP constraint bit-flag 1 determines which is selected.
Byte 2 is the Interoperability Profile (IOP) described as the profile_iop. It carries six individual constraint bit-flags (constraint_set0_flag to constraint_set5_flag, most significant bit first) followed by two reserved bits. The flags alter the behavior of the profile specified in the profile_idc and also affect the interpretation of the level_idc value. To unambiguously select a profile, Bytes 1 and 2 must be combined. The meaning of these individual constraint flags depends on the context. Refer to Section 7.4.2.1.1 for details and cross-references to the applicable annex descriptions.
Byte 3, the level_idc, describes the level at which the chosen profile is operating so the client can reconstruct the images correctly.
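The three bytes can be unpacked with very little code. This is a minimal sketch which assumes the input is a single sequence parameter set NAL unit (nal_unit_type 7) with its start code already removed, so the one-byte NAL header is followed immediately by profile_idc, profile_iop and level_idc. The names ProfileLevel and read_profile_level are hypothetical. The flag numbering follows the constraint_set0_flag to constraint_set5_flag naming in the standard, with flag 0 in the most significant bit.

```python
from typing import NamedTuple


class ProfileLevel(NamedTuple):
    profile_idc: int
    constraint_flags: tuple  # constraint_set0_flag .. constraint_set5_flag
    level_idc: int


def read_profile_level(nal: bytes) -> ProfileLevel:
    """Read the three signaling bytes that follow the NAL header of an SPS."""
    if len(nal) < 4:
        raise ValueError("NAL unit is too short to carry profile and level bytes")
    nal_unit_type = nal[0] & 0x1F
    if nal_unit_type != 7:
        raise ValueError(f"expected an SPS (type 7), got type {nal_unit_type}")
    profile_idc, profile_iop, level_idc = nal[1], nal[2], nal[3]
    # Flag n lives in bit (7 - n); the two least significant bits are reserved.
    flags = tuple((profile_iop >> (7 - n)) & 1 for n in range(6))
    return ProfileLevel(profile_idc, flags, level_idc)


# Hypothetical SPS header bytes: 0x67 (NAL header), 0x42, 0xC0, 0x1E.
info = read_profile_level(bytes.fromhex("6742c01e"))
```

Here the result carries profile_idc 66 with constraint flags 0 and 1 set and level_idc 30, which the tables in this article identify as Constrained Baseline at Level 3.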
Profile Categories
Many of the profiles are derived from the same common Baseline and High ancestors. This has implications when the behavior of level_idc values is examined. This diagram illustrates the inheritance:
H.264 profiles can also be grouped according to which part of the ISO standard they are described in:
| Category | Description |
|---|---|
| Core | The foundation set of profiles in H.264 define non-scalable 2D flat presentations. The player application may transform the video canvas that the images are being drawn onto. |
| Pro | Professional users, camera ingest and editing require additional profiles. |
| SVC | The Scalable Video Coding standard introduces more profiles. |
| MVC | Multi-view coding requires support for stereoscopic images in the player. These reduce the resolution of the two images so they can be accommodated within a single flat video raster. |
| MFC | Multi-resolution Frame-Compatible coding adds specialized profiles for full resolution stereoscopic imaging. |
| 3D | The 3D-AVC standard adds two more profiles for enhanced 3D support. |
Current List Of Profiles
These are the currently defined profiles for H.264. Gleaning the profile_idc and profile_iop values by carefully reading the standard is somewhat arduous as there is no corresponding summary table included.
The profile_idc value is shown in the IDC column. The optional constraint settings in the profile_iop are listed in the IOP column. All combinations of IDC and IOP are unique.
| Category | Profile name | IDC | IOP | Description |
|---|---|---|---|---|
| Core | Constrained Baseline | 66 | 1 | Useful for video conferencing and mobile applications. |
| Core | Baseline | 66 | - | Improves the robustness of the Constrained Baseline profile. The differences are subtle. |
| Core | Extended | 88 | - | Designed for streaming with additional capabilities to support stream switching. |
| Core | Main | 77 | - | Standard Definition TV over DVB transports. |
| Core | High | 100 | - | High Definition TV broadcast and storage. Adopted by Blu-ray discs and HDTV transmissions. |
| Core | Progressive High | 100 | 4 | Based on the High profile without interlace support. |
| Core | Constrained High | 100 | 4 & 5 | Based on the Progressive High profile. Removes support for Bi-Predictive slices. |
| Core | High 10 | 110 | - | Based on the High profile with increased 10-bit color detail. |
| Core | High 4:2:2 | 122 | - | Based on High 10 with added support for 4:2:2 chroma sampling. |
| Core | High 4:4:4 Predictive | 244 | - | Based on High 4:2:2 with full 4:4:4 chroma sampling extending up to 14 bits. Adds lossless region coding and three separate color planes. |
| Pro | High 10 Intra | 110 | 3 | Based on High 10 constrained to all intra-frame coding. |
| Pro | High 4:2:2 Intra | 122 | 3 | Based on High 4:2:2 constrained to all intra-frame coding. |
| Pro | High 4:4:4 Intra | 244 | 3 | Based on High 4:4:4 constrained to all intra-frame coding. |
| Pro | CAVLC 4:4:4 Intra | 44 | - | Based on High 4:4:4 Intra with variable length coding. |
The standard defines profile_idc as an unsigned 8-bit integer value (0-255). Any profile_idc values not currently defined in the standard are reserved entirely for future use. They will be defined jointly by ITU-T and ISO/IEC.
The annexes at the end of ISO 14496 Part 10 are the authoritative source. Table 5 in IETF RFC 6184 is also helpful.
| Category | Profile name | IDC | IOP | Description |
|---|---|---|---|---|
| SVC | Scalable Baseline | 83 | - | Adds scalability to the Baseline profile. Useful for video conferencing, mobile and surveillance applications. |
| SVC | Scalable Constrained Baseline | 83 | 5 | Adds scalability to the Constrained Baseline profile. Suitable for real-time applications. |
| SVC | Scalable High | 86 | - | Adds scalability to the High profile. Suitable for broadcast and streaming applications. |
| SVC | Scalable Constrained High | 86 | 5 | Based on the Constrained High profile with added support for scalability. Used for real-time communications. |
| SVC | Scalable High Intra | 86 | 3 | Used for production applications that need high quality content with Intra support. |
| MVC | Stereo High | 128 | - | Based on the High profile with MVC extensions to encode two views. |
| MVC | Multi-view High | 118 | - | Based on the High profile. Used when more than two views are required. Lacks support for interlace. |
| MFC | MFC High | 134 | - | Enhanced resolution stereoscopic imaging based on the High profile. This packs two images into a single frame. |
| MFC | MFC Depth High | 135 | - | Adds depth maps for enhanced 3D rendering. |
| 3D | Multi-view Depth High | 138 | - | Adds depth map and video texture mapping for better 3D rendition. |
| 3D | Enhanced Multi-view Depth High | 139 | - | Multiple views with depth mapping support. |
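The two tables can be turned into a simple lookup. The sketch below is one possible way to resolve a profile name from the profile_idc and the set of constraint flag numbers that are set to 1. The names PROFILE_TABLE and profile_name are hypothetical, and the matching rule (pick the entry whose listed flags are all set, preferring the entry that lists the most flags) is an assumption made for illustration; Annex A and the extension annexes remain the authoritative definition.

```python
# (profile_idc, constraint flags required, profile name) taken from the tables above.
PROFILE_TABLE = [
    (66, {1}, "Constrained Baseline"), (66, set(), "Baseline"),
    (88, set(), "Extended"), (77, set(), "Main"),
    (100, set(), "High"), (100, {4}, "Progressive High"),
    (100, {4, 5}, "Constrained High"),
    (110, set(), "High 10"), (110, {3}, "High 10 Intra"),
    (122, set(), "High 4:2:2"), (122, {3}, "High 4:2:2 Intra"),
    (244, set(), "High 4:4:4 Predictive"), (244, {3}, "High 4:4:4 Intra"),
    (44, set(), "CAVLC 4:4:4 Intra"),
    (83, set(), "Scalable Baseline"), (83, {5}, "Scalable Constrained Baseline"),
    (86, set(), "Scalable High"), (86, {5}, "Scalable Constrained High"),
    (86, {3}, "Scalable High Intra"),
    (128, set(), "Stereo High"), (118, set(), "Multi-view High"),
    (134, set(), "MFC High"), (135, set(), "MFC Depth High"),
    (138, set(), "Multi-view Depth High"),
    (139, set(), "Enhanced Multi-view Depth High"),
]


def profile_name(profile_idc: int, flags_set=()) -> str:
    """Return the table entry that best matches the IDC and the flags that are set."""
    flags = set(flags_set)
    matches = [(required, name) for idc, required, name in PROFILE_TABLE
               if idc == profile_idc and required <= flags]
    if not matches:
        return "Reserved or unknown"
    # Prefer the most specific entry, i.e. the one that requires the most flags.
    return max(matches, key=lambda match: len(match[0]))[1]
```

For example, profile_name(100, {4, 5}) resolves to Constrained High while profile_name(100, {4}) resolves to Progressive High. Streams often set extra flags beyond those listed in the table, which is why the sketch matches on a subset instead of requiring an exact set.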
Levels
The levels describe picture resolutions and frame-rates for the client-player to use when presenting the decoded output. Within any given bitrate, there is a trade-off between frame-rate and picture size. If you have a higher frame-rate, the pictures must be smaller. Decoding speed is also affected and so is the number of frames that can be buffered. The level limits defined in Table A.1 specify what the client must be able to support.
The level_idc is an unsigned 8-bit integer value (0-255). Note that the standard describes the intermediate levels in Table A.1 as non-integer values. The intermediate levels describe alternative picture sizes and frame-rates within the available bandwidth and buffering capacity of each level.
This summary table describes just the main levels and resolutions. The standard mentions that some implementations may only use integer numbered levels and omit support for the intermediate ones:
| Level grouping | Description |
|---|---|
| 1 | Small pictures for older mobile devices. |
| 2 | Quarter SD frame size or low frame-rate SD. |
| 3 | SD and some 1280 HD formats. |
| 4 | 2K. |
| 5 | 4K. |
| 6 | 8K. |
There are some arcane rules for how the level_idc is combined with the profile_idc and the profile_iop constraint flags to determine the actual levels. These are described in Sub-section A.3 in Annex A.
The level limits are applied differently for the Low vs. High profiles. Level limits are described in Table A.1. To determine the indicated level from the level_idc value, you need to treat each group of profiles differently:
- Baseline, Main and Extended (low) profiles. The Baseline, Constrained Baseline, Main, and Extended profiles all share similar level limits based on constraint flagging in profile_iop and the profile_idc value (see Section A.3.1). Level 1b is non-numeric and uses constraint flag 3 to distinguish it from level 1.1. Both of them have the same level_idc value equal to 11.
- High profiles. The child profiles derived from the High profile similarly share some common behaviors which are described separately (see Section A.3.2). Level 1b is treated as a special case and has a level_idc value equal to 9.
After dealing with the special case for level 1b, the standard uses a fixed-point decimal representation where the integer value in level_idc is divided by 10 to yield the intermediate level number. For example, level 6.1 is represented by the level_idc having an integer value 61.
A decoder must support the maximum level limit values defined for a level and all lower levels beneath it.
| Level | level_idc value | Type |
|---|---|---|
| 1 | 10 | Main |
| 1b | 9 (See note below) | Intermediate |
| 1b | 11 (See note below) | Intermediate |
| 1.1 | 11 | Intermediate |
| 1.2 | 12 | Intermediate |
| 1.3 | 13 | Intermediate |
| 2 | 20 | Main |
| 2.1 | 21 | Intermediate |
| 2.2 | 22 | Intermediate |
| 3 | 30 | Main |
| 3.1 | 31 | Intermediate |
| 3.2 | 32 | Intermediate |
| 4 | 40 | Main |
| 4.1 | 41 | Intermediate |
| 4.2 | 42 | Intermediate |
| 5 | 50 | Main |
| 5.1 | 51 | Intermediate |
| 5.2 | 52 | Intermediate |
| 6 | 60 | Main |
| 6.1 | 61 | Intermediate |
| 6.2 | 62 | Intermediate |
For Level 1b, the level_idc value should be set as follows:
- Use the value 9 for all child profiles based on the High profile.
- Use the value 11 with the constraint bit 3 set to 1 for child profiles based on the Baseline, Main and Extended profiles.
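The level rules described above reduce to a short function. This is a sketch under the assumptions stated in this article: level 1b is signaled either by level_idc 9, or by level_idc 11 with constraint flag 3 set for the Baseline, Main and Extended family (profile_idc 66, 77 and 88). Every other value is divided by 10. The names LOW_PROFILE_IDCS and level_name are hypothetical.

```python
LOW_PROFILE_IDCS = {66, 77, 88}  # Baseline (and Constrained Baseline), Main, Extended


def level_name(level_idc: int, profile_idc: int, constraint_set3: bool = False) -> str:
    """Convert a level_idc value into the level number used in Table A.1."""
    # Special case: level 1b has two different encodings.
    if level_idc == 9:
        return "1b"
    if level_idc == 11 and constraint_set3 and profile_idc in LOW_PROFILE_IDCS:
        return "1b"
    # General case: fixed-point decimal, e.g. 61 becomes 6.1 and 30 becomes 3.
    major, minor = divmod(level_idc, 10)
    return str(major) if minor == 0 else f"{major}.{minor}"
```

Note that constraint flag 3 only means level 1b for the low profiles. For a profile such as High 10 the same flag selects the Intra variant, so level_name(11, 110, True) still reports level 1.1.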
File Name Extensions
There are a variety of file extensions which describe containers that carry AVC video content. The 3GP files are intended for use with mobile devices. These would be somewhat rare in a broadcast environment but they may show up in user contributed content.
There are other file extensions for MPEG-4 audio files which are listed elsewhere.
| Ext | Description |
|---|---|
| .3gp | Based on MPEG-4 Part 12. Originally designed for early mobile (feature) phones. This is the preferred file extension. |
| .3gpp | Mixed media format for mobile phone use. |
| .3g2 | A second-generation file format for low bitrate content. |
| .3gpp2 | Mixed media format for mobile phone use. |
| .3gp2 | Mixed media format for mobile phone use. |
| .m4v | An MPEG-4 video file which may also contain AAC audio. |
| .mp4 | A general-purpose digital media container to carry videos, images, timed text and subtitles. Based on MPEG-4 Part 12 and derived from the Apple QuickTime .mov file format. |
| .mp4v | An MPEG-4 video file used by Apple for iTunes. |
| .mov | QuickTime media platform container file. Typically contains a movie but could be an interactive multimedia presentation. |
Media Type Identifiers
When media is delivered across the Internet, the Media Type identifies the essence format. This is embedded in header lines that prefix an HTTP response. A web browser will extract these and build the necessary <video> tags to instantiate a player for the media.
Media types are also relevant for downloaded content.
The first part of the media type identifies the generic kind of essence. The second part (after the slash character) describes a specific variant.
Media type identifiers are sometimes described as a MIME type. This is an abbreviation for Multipurpose Internet Mail Extensions. MIME type values were first introduced to delimit the content of email body data so that attachments could be carried with the message.
These media type identifiers are relevant to AVC video content. They describe the container format. Audio content is implied as well since video is rarely presented without an accompanying soundtrack:
| File type | Content |
|---|---|
| video/mp4 | MPEG-4 file container with video content. |
| video/quicktime | Apple QuickTime movie containing video and other kinds of media. |
| video/x-matroska | A Matroska container carrying video content. |
An additional parameter can be added to describe the specific video codec and its settings used for the essence carried in the container:
video/mp4; codecs="avc1.4d002a"
This media type describes an MPEG-4 file containing AVC (H.264) video, coded with the Main Profile, at Level 4.2.
The rules governing the parameter values and their formatting are complex and different for each codec.
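For the avc1 case shown above, the six hexadecimal digits after the dot are simply the three signaling bytes (profile_idc, profile_iop and level_idc) written out in order. The following sketch decodes them. The function name parse_avc1_codec is hypothetical and the code only handles the avc1 form used in this article, not the parameter formats of other codecs.

```python
def parse_avc1_codec(codec: str) -> tuple:
    """Decode an 'avc1.PPCCLL' codecs value into (profile_idc, profile_iop, level_idc)."""
    sample_entry, _, hex_part = codec.strip().partition(".")
    if sample_entry != "avc1" or len(hex_part) != 6:
        raise ValueError(f"not an avc1 codecs value: {codec!r}")
    # bytes.fromhex raises ValueError if the digits are not valid hexadecimal.
    profile_idc, profile_iop, level_idc = bytes.fromhex(hex_part)
    return profile_idc, profile_iop, level_idc


profile_idc, profile_iop, level_idc = parse_avc1_codec("avc1.4d002a")
```

Hexadecimal 4d is 77 (Main), 00 means that no constraint flags are set and 2a is 42, which is Level 4.2.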
Applying AVC
Standards compliance does not guarantee interoperability. Make sure the profile and level you are encoding with are consistent with your target client-player.
For example, if company A makes a video codec that processes pictures at high definition and company B makes a video player that expects strictly standard definition content, the two are incompatible even though both may claim to be (and are) 100% standards compliant.
Bear in mind also that H.264 is normally used as a lossy codec. The High 4:4:4 Predictive profile can code regions losslessly, but the other profiles listed here are lossy, so do not assume that an AVC stream preserves the source exactly.
Patent license fees are levied for commercial use of the AVC codec, although free content delivered via the Internet is exempt from royalties. The open-source x264 project provides a widely used H.264 encoder. It is free software rather than a patent-free codec, so the patent licensing terms still apply to the streams it produces. You will most likely see x264 as a codec option in compression tools that have it installed.