Standards: Part 14 - About High Efficiency Video Coding (HEVC)

Here we look at the HEVC codec which is based on earlier work by MPEG on AVC and prior coding technologies. New techniques are employed to reduce the coded output size even further.


This article is part of our growing series on Standards.
There is an overview of all 26 articles in Part 1 - An Introduction To Standards.


Modern video demands much higher resolutions than H.264 was designed for, so a new codec was eventually needed. Early trials revealed that H.264 could encode 8K video better than expected, because the higher resolution allowed many more redundant macro-blocks to be eliminated.

AVC was never optimized for 8K content and it was recognized that new ideas could halve the resulting output file size and improve performance. HEVC was designed to achieve compression ratios of 1000:1 and supports resolutions up to 16K.

Relevant ISO Standards

The HEVC standards are covered by MPEG-H which is published as ISO 23008. Only 5 parts of that standard are directly relevant to HEVC. The rest describe MPEG Media Transport (MMT) and 3D Audio. ISO 23002 part 7 is also relevant.

| Standard | Edition | Description |
| --- | --- | --- |
| ISO 23008-2 | 2023 | High efficiency video coding (HEVC). |
| ISO 23008-5 | 2017 | Reference software for HEVC. |
| ISO 23008-8 | 2018 | Conformance specification for HEVC. |
| ISO 23008-14 | 2018 | Guidance on conversion and coding of High Dynamic Range (HDR) imaging. |
| ISO 23008-15 | 2018 | Signaling and display adaptation for High Dynamic Range (HDR) imaging. |
| ISO 23002-7 | 2024 | Versatile supplemental enhancement information messages for coded video bitstreams. |

 

About ISO 23008 Part 2

The MPEG-H Part 2 standard is just as large and complex as the preceding AVC standard. The core of HEVC has extensions for Scalable, Multiview and Stereoscopic 3D viewing support.

The document structure is similar to AVC and the same section numbering scheme has been retained. Readers who have studied the AVC standard will find much of this content familiar, but there are important and subtle differences to take note of throughout.

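The Network Abstraction Layer packets (NAL units) described in Section 7 have a slightly different format. The HEVC NAL unit header is two bytes long, compared with one byte in AVC. The parameter sets carried in NAL units include a profile_tier_level() structure, which defines the Profile, Tier and Level parameters more logically than AVC did. The profiles, tiers and levels themselves are described in the annexes as before.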
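
The two header layouts can be compared with a short Python sketch. It assumes the input starts at the first byte of a NAL unit, after any start code or length prefix has been removed. The function names are illustrative and do not come from any library.

```python
def parse_hevc_nal_header(data: bytes) -> dict:
    """Split the 16-bit HEVC NAL unit header into its four fields."""
    if len(data) < 2:
        raise ValueError("HEVC NAL unit header needs two bytes")
    header = (data[0] << 8) | data[1]
    return {
        "forbidden_zero_bit": header >> 15,          # 1 bit, always 0
        "nal_unit_type": (header >> 9) & 0x3F,       # 6 bits
        "nuh_layer_id": (header >> 3) & 0x3F,        # 6 bits, used by SHVC and MV-HEVC
        "nuh_temporal_id_plus1": header & 0x07,      # 3 bits
    }


def parse_avc_nal_header(data: bytes) -> dict:
    """Split the 8-bit AVC NAL unit header for comparison."""
    if len(data) < 1:
        raise ValueError("AVC NAL unit header needs one byte")
    b = data[0]
    return {
        "forbidden_zero_bit": b >> 7,
        "nal_ref_idc": (b >> 5) & 0x03,
        "nal_unit_type": b & 0x1F,
    }


# HEVC parameter set NAL unit types (the VPS and SPS carry profile_tier_level())
HEVC_PARAMETER_SETS = {32: "VPS", 33: "SPS", 34: "PPS"}

vps = parse_hevc_nal_header(bytes([0x40, 0x01]))
print(vps["nal_unit_type"], HEVC_PARAMETER_SETS[vps["nal_unit_type"]])  # 32 VPS
print(parse_avc_nal_header(bytes([0x67]))["nal_unit_type"])             # 7 (AVC SPS)
```

The extra HEVC byte makes room for the layer identifier used by the scalable and multiview extensions, plus a temporal sub-layer identifier.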

The latest edition was published in 2023. A new amendment (in progress) adds more profile specifications and some new SEI messages.

HEVC vs AVC Performance Factors

HEVC encoders differ from AVC encoders in performance and processing workload:

  • Frame rates - HEVC extends the maximum to 300 fps. This is significantly higher than the nominal 60 fps supported by AVC. Note that level 6.2 in AVC is rated at 120 fps as an exception.
  • Image quality - Performance evaluations verify that the image quality of HEVC is superior to AVC at any given bitrate. Noise levels, color spaces and dynamic range are all improved.
  • Image sizes - The operating range is optimal when dealing with 8K video. The latest revisions have increased this to 16K at extremely high bitrates.
  • Size of coded output - HEVC was targeted to halve the bitrate of AVC coded output. This is comfortably achieved with SD content and exceeded with higher definitions.
  • Encoding speed - HEVC needs to do significantly more work than AVC to reduce the output size.
  • Decoding performance - HEVC delivers content that is easier to decode. The increased coding workload is offset by improvements in the receiving client-player performance.
  • CPU utilization - Increasing the coding workload requires more CPU capacity. HEVC can be parallelized more efficiently and makes better use of multiple CPU cores to spread the workload.
  • Interlacing - HEVC has no dedicated coding tools for interlaced video. Interlaced content can be managed by coding individual fields as separate pictures and passing SEI messages to the player.

HEVC Advancements

These are some of the improvements that HEVC offers when compared with AVC coding:

  • Video coding layer - Better de-blocking and edge preserving filters.
  • Color spaces - More alternatives supported.
  • Sample sizes - Allow greater ranges of colors.
  • Coding Tree Units - Can be sub-divided more flexibly than macro-blocks.
  • Lossless - HEVC supports truly lossless coding which improves on the region constrained partial lossless coding in AVC.
  • Still picture support - This facilitates HEIC image compression. This is useful in situations where the video is displaying a stationary image.
  • Screen content - Coding support for text and graphics images. This may provide better compression for traditional 2D animation as well.
  • SEI - Supplemental Enhancement Information supports the delivery of additional metadata signals.

Improved Video Coding Layer

The Video Coding Layer works conventionally by splitting the first picture into blocks. Intra-prediction predicts each block from neighboring samples that have already been coded within the same picture, which eliminates redundancy between identical or similar areas. In subsequent pictures, Inter-prediction looks for redundancy by comparing new pixel rectangles with blocks in previously processed pictures.


Note the similar names: Intra and Inter differ in scope. Intra works within a single frame. Inter works across several frames.
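
Inter-prediction needs a way to find a matching block in a reference picture. The standard does not mandate a search method, so the Python sketch below uses an exhaustive (full) search that minimizes the sum of absolute differences (SAD), which is the simplest illustration. Real encoders use much faster search patterns and sub-pixel refinement. Pictures are assumed to be plain lists of luma rows of equal size, and all function names are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in cur
    and the block displaced by (dx, dy) in ref."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def full_search(cur, ref, bx, by, n, search_range):
    """Return the motion vector (dx, dy) with the lowest SAD, and that SAD."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), sad(cur, ref, bx, by, 0, 0, n)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidate blocks that would fall outside the reference picture
            if not (0 <= bx + dx and bx + dx + n <= width
                    and 0 <= by + dy and by + dy + n <= height):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


# Reference picture with a distinct value pattern, and a current picture
# whose content has moved by (2, 1) samples relative to it
ref = [[(7 * x + 13 * y) % 251 for x in range(16)] for y in range(16)]
cur = [[ref[min(y + 1, 15)][min(x + 2, 15)] for x in range(16)] for y in range(16)]
print(full_search(cur, ref, 4, 4, 4, 4))  # ((2, 1), 0)
```

The encoder then codes the motion vector and the residual (here all zeros) instead of the block's samples.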


The loop filters are applied to the reconstructed picture after the block prediction and residual coding phases are completed. There are two loop filters in HEVC:

  • DBF - The DeBlocking Filter reduces artefacts at the block boundaries. It is simpler than the equivalent filter in AVC and, because it operates on an 8 x 8 sample grid, it can be easily parallelized across multiple processors.
  • SAO - The Sample Adaptive Offset reduces sample distortions. This helps process hard edges and reduces ringing artefacts. It also helps reduce contouring artefacts.
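
SAO has two modes, band offset and edge offset. The Python sketch below shows only the band offset idea. The sample range is divided into 32 equal bands, and four encoder-chosen offsets are added to samples falling in four consecutive bands, starting at a signaled band position. The function name is hypothetical, and in a real bitstream the band position and offsets are signaled per CTU.

```python
def sao_band_offset(samples, band_position, offsets, bit_depth=8):
    """Apply an SAO-style band offset to a list of reconstructed samples.

    band_position: first of the four consecutive bands (0-31) that get an offset.
    offsets: four signed offsets, one per band.
    """
    if len(offsets) != 4:
        raise ValueError("band offset uses exactly four offsets")
    shift = bit_depth - 5              # 32 bands, so each band spans 2**shift values
    max_value = (1 << bit_depth) - 1
    result = []
    for s in samples:
        k = ((s >> shift) - band_position) & 31     # band index relative to the start, wraps at 32
        if k < 4:
            s = min(max(s + offsets[k], 0), max_value)  # add the offset, then clip
        result.append(s)
    return result


print(sao_band_offset([79, 80, 95, 111, 112], band_position=10, offsets=[1, 2, 3, 4]))
# [79, 81, 97, 115, 112]
```

Samples outside the four selected bands pass through unchanged, which is why band offset suits correcting a shift in a narrow range of levels such as a smooth gradient.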

Coding Tree Units (CTU)

AVC divides pictures into macro-blocks. These are based on 16 x 16 pixels but can be sub-divided into smaller (4 x 4) rectangles.

HEVC uses Coding Tree Units of up to 64 x 64 pixels to improve redundancy detection. Internally, each CTU is organized as a Quadtree to support variable block sizes. A Quadtree is a nested tree structure in which every node is either a leaf or is split into exactly four equal-sized child nodes. The Quadtree can be recursively sub-divided until the coding blocks reach 8 x 8 pixels, and a separate residual quadtree can split transform blocks down to 4 x 4 pixels. Each branch only needs a further level if that helps resolve fine detail, so large areas of similar pixels do not require deeply nested trees.
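
The recursive splitting can be sketched in Python. Real encoders choose splits by comparing rate-distortion costs. This sketch substitutes a much simpler rule, splitting while the block's sample range exceeds a threshold, purely to show the recursion, the z-order traversal and the 8 x 8 minimum. The function name and threshold are hypothetical.

```python
def split_quadtree(img, x, y, size, min_size=8, threshold=10):
    """Return the leaf blocks (x, y, size) of a coding quadtree for one CTU."""
    values = [img[row][col] for row in range(y, y + size) for col in range(x, x + size)]
    # Stop at the minimum size, or when the block is already uniform enough
    if size <= min_size or max(values) - min(values) <= threshold:
        return [(x, y, size)]
    half = size // 2
    leaves = []
    # Visit the four children in z-order: top-left, top-right, bottom-left, bottom-right
    for dy in (0, half):
        for dx in (0, half):
            leaves.extend(split_quadtree(img, x + dx, y + dy, half, min_size, threshold))
    return leaves


# A flat 64 x 64 CTU with a small bright detail in its top-left corner
ctu = [[255 if (r < 4 and c < 4) else 0 for c in range(64)] for r in range(64)]
leaves = split_quadtree(ctu, 0, 0, 64)
print(len(leaves))  # 10
print(leaves[:4])   # [(0, 0, 8), (8, 0, 8), (0, 8, 8), (8, 8, 8)]
```

Only the branch containing the detail is nested down to 8 x 8. The rest of the CTU is covered by three 32 x 32 and three 16 x 16 blocks.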

[Figure: progressively deeper Quadtree nesting. The third level is omitted to reduce the number of examples.]

The source image may not be an exact integer multiple of the 64 x 64 Coding Tree Unit size. Because the origin of the image is in the top left corner, CTUs along the right and bottom edges may extend beyond the picture. The encoder handles these partial CTUs by splitting them into smaller coding blocks that fit within the picture. If the picture dimensions are not a multiple of the minimum coding block size, the encoder pads the picture and signals a conformance window, so that the decoder can crop the padding when the content is unpacked and rendered into the display rectangle.
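
The bookkeeping for edge CTUs and cropping can be sketched as follows. It assumes a 64 x 64 CTU and an 8 x 8 minimum coding block (both are configurable in a real encoder) and reports the crop amounts in luma samples. In the bitstream the conformance window offsets are expressed in units that depend on the chroma format. The function names are hypothetical.

```python
def ceil_div(a, b):
    """Integer division rounding up."""
    return -(-a // b)


def ctu_layout(width, height, ctu_size=64, min_cb_size=8):
    """Count CTUs and work out the padding a decoder would crop (luma samples)."""
    coded_width = ceil_div(width, min_cb_size) * min_cb_size
    coded_height = ceil_div(height, min_cb_size) * min_cb_size
    return {
        "ctu_columns": ceil_div(width, ctu_size),
        "ctu_rows": ceil_div(height, ctu_size),
        # Partial CTUs on the right and bottom edges are split into smaller blocks,
        # so padding is only needed up to the minimum coding block size
        "crop_right": coded_width - width,
        "crop_bottom": coded_height - height,
    }


print(ctu_layout(1920, 1080))
# {'ctu_columns': 30, 'ctu_rows': 17, 'crop_right': 0, 'crop_bottom': 0}
print(ctu_layout(1366, 768))
# {'ctu_columns': 22, 'ctu_rows': 12, 'crop_right': 2, 'crop_bottom': 0}
```

For 1080-line HD, the bottom row of CTUs is only partly filled, but no cropping is needed because 1080 is already a multiple of 8.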

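HEVC can group multiple CTUs into tiles and slices. These divide the picture into independently decodable regions, which helps with error resilience and parallel processing.

Each slice has a type that determines which prediction modes its blocks may use. Inter-predicted blocks copy matching regions from previously decoded pictures and code only the residual differences from them.

  • Tiles can be decoded independently of the rest of the picture.
  • I-slices use only intra prediction, like intra frames in AVC.
  • P-slices can also use inter prediction from one reference picture per block, like predictive frames.
  • B-slices can use inter prediction from up to two reference pictures per block, like bi-predictive frames.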

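Intra prediction is also improved. HEVC offers 35 intra prediction modes (planar, DC and 33 angular directions), which is significantly more than the 8 directional modes plus DC available for 4 x 4 blocks in AVC. Motion vector prediction is improved separately by Advanced Motion Vector Prediction (AMVP) and a merge mode. Integer approximations of the Discrete Cosine Transform (DCT) at block sizes up to 32 x 32 also help reduce the output size.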
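
DC mode is the simplest intra prediction mode: every sample in the block is predicted as the average of the reconstructed samples directly above and to the left. The Python sketch below shows that averaging with integer rounding for a square block. It deliberately omits the extra boundary smoothing HEVC applies to small luma blocks and the substitution of unavailable neighbor samples. The function name is hypothetical.

```python
def intra_dc_predict(top, left):
    """Predict an n x n block from n reference samples above and n to the left."""
    n = len(top)
    if len(left) != n or n < 4 or n & (n - 1):
        raise ValueError("expects equal-length power-of-two reference rows (n >= 4)")
    shift = n.bit_length()                     # log2(n) + 1, i.e. divide by 2n
    dc = (sum(top) + sum(left) + n) >> shift   # rounded integer average
    return [[dc] * n for _ in range(n)]


block = intra_dc_predict(top=[100, 100, 100, 100], left=[120, 120, 120, 120])
print(block[0])  # [110, 110, 110, 110]
```

The angular modes work on the same reference samples but project them across the block along one of the 33 directions instead of averaging them.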

More Diverse Color Spaces

Many alternative color spaces are supported in HEVC. These are optimized for various content sources and other standards:

  • NTSC video
  • PAL video
  • Generic film stock
  • Rec 601
  • Rec 709
  • Rec 2020
  • Rec 2100
  • SMPTE 170M
  • SMPTE 240M
  • sRGB
  • sYCC
  • xvYCC
  • XYZ
  • RGB
  • YCbCr
  • YCoCg
  • Externally defined color spaces
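
YCoCg is one of the simpler entries in that list. The Python sketch below uses the reversible integer lifting form, often called YCoCg-R, to show how a color transform can be exactly invertible. It illustrates the YCoCg family in general. The exact equations an HEVC decoder applies depend on the signaled matrix coefficients, so this should not be read as the normative conversion. The function names are hypothetical.

```python
def rgb_to_ycocg_r(r, g, b):
    """Forward YCoCg-R transform using integer lifting steps."""
    co = r - b
    t = b + (co >> 1)   # Python's >> is a floor shift, also for negative values
    cg = g - t
    y = t + (cg >> 1)
    return y, co, cg


def ycocg_r_to_rgb(y, co, cg):
    """Inverse transform: undo the lifting steps in reverse order."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b


print(rgb_to_ycocg_r(200, 100, 50))     # (112, 150, -25)
print(ycocg_r_to_rgb(112, 150, -25))    # (200, 100, 50)
print(rgb_to_ycocg_r(128, 128, 128))    # (128, 0, 0): grey has no chroma
```

Because each lifting step is undone exactly, the round trip is lossless. The trade-off is that Co and Cg need one more bit of range than the RGB inputs.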

Pixel Sample Sizes

HEVC supports more pixel sample sizes. The 8-bit sampling is similar to AVC, but HEVC adds 10, 12, 14 and 16-bit alternatives depending on the profile selected. Greater bit depths provide finer gradations between levels, which reduces banding and facilitates HDR support. Monochrome images are supported natively rather than desaturating color pictures to simulate them. This reduces the bitrate and improves coding efficiency because the redundant chroma samples are never coded.
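
The effect of bit depth and chroma format on the amount of raw data to be coded can be worked out directly. The Python sketch computes the uncompressed size of one frame. It says nothing about coded bitrates, and the table and function names are hypothetical.

```python
# Total samples per pixel, as (numerator, denominator), for each chroma format
SAMPLES_PER_PIXEL = {
    "4:0:0": (1, 1),   # luma only (monochrome)
    "4:2:0": (3, 2),   # two chroma planes at quarter resolution
    "4:2:2": (2, 1),   # two chroma planes at half resolution
    "4:4:4": (3, 1),   # three full-resolution planes
}


def code_values(bit_depth):
    """Number of distinct levels a sample of this bit depth can take."""
    return 1 << bit_depth


def raw_frame_bits(width, height, bit_depth, chroma="4:2:0"):
    """Uncompressed size of one frame in bits (assumes even dimensions)."""
    num, den = SAMPLES_PER_PIXEL[chroma]
    return width * height * num * bit_depth // den


print(code_values(8), code_values(10))           # 256 1024
print(raw_frame_bits(1920, 1080, 10, "4:2:0"))   # 31104000
print(raw_frame_bits(1920, 1080, 8, "4:0:0"))    # 16588800
```

Moving from 8 to 10 bits quadruples the number of levels for a 25% increase in raw data, while a monochrome picture needs a third less raw data than 4:2:0 at the same bit depth.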

Supplemental Enhancement Information (SEI)

Metadata is passed from the encoder to the receiving client-player in these SEI messages. The player uses them to apply post-processing to the decoded images.

These are a few of the SEI messages that apply to HEVC:

  • Remapping color spaces from one to another.
  • Hints for defining transfer functions to convert from SDR (Standard Dynamic Range) to HDR (High Dynamic Range) renditions.
  • Support for Hybrid Log Gamma (HLG) for royalty-free HDR applications.
  • Describe the color primaries, white-point and maximum-minimum luminance to define the mastering display color volume.
  • Time-code values relating to the content for archival purposes.
  • Details of the ambient lighting environment where the video was authored.
  • Support for 3D displays.
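
The mastering display color volume SEI message stores chromaticity coordinates in units of 0.00002 and luminance in units of 0.0001 cd/m². The Python sketch below converts real-world values into those integers and formats them as the string accepted by the x265 encoder's `--master-display` option. The function name is hypothetical. The BT.2020 primaries, D65 white point and 1000 cd/m² peak luminance are example values.

```python
def master_display_string(green, blue, red, white_point, max_nits, min_nits):
    """Build an x265-style --master-display value from (x, y) chromaticities."""
    def xy(point):
        x, y = point
        # Chromaticity coordinates are coded in units of 0.00002
        return f"({round(x / 0.00002)},{round(y / 0.00002)})"

    # Luminance is coded in units of 0.0001 candela per square metre
    luminance = f"L({round(max_nits / 0.0001)},{round(min_nits / 0.0001)})"
    return f"G{xy(green)}B{xy(blue)}R{xy(red)}WP{xy(white_point)}{luminance}"


print(master_display_string(
    green=(0.170, 0.797), blue=(0.131, 0.046), red=(0.708, 0.292),
    white_point=(0.3127, 0.3290), max_nits=1000, min_nits=0.0001,
))
# G(8500,39850)B(6550,2300)R(35400,14600)WP(15635,16450)L(10000000,1)
```

The player reads the same integers back from the SEI message and can use them when tone-mapping the content to a display with a smaller color volume.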

ISO standard 23002 part 7 (2024) describes Supplemental Enhancement Information (SEI) messages and Video Usability Information (VUI) parameters. It is cross-referenced from video coding standards to avoid repetition therein. This standard is particularly relevant to the Versatile Video Coding (VVC) video format.

Profiles

Profile selection determines how the encoder operates and selects a sub-set of the available coding tools. This affects the coding efficiency and size of the output bitstream. Profile and level signaling is less complex than H.264.

HEVC profiles are grouped in a similar way to the AVC profiles, but they fall into different categories. The core profiles are extended in different ways according to the support you need:

| Category | Description |
| --- | --- |
| Core | The foundation set of HEVC profiles. |
| Rext | Format Range Extension profiles. These add different bit depths, monochrome versions and Intra coding formats. |
| High | High throughput coding formats. Intended for situations where a very high bitrate is needed for professional content processing. |
| SCC | Screen Content Coding Extensions to support imaging of text and graphics content. |
| SHVC | Scalable Video Coding enhancements to the core HEVC coding specification described in Annex H of the standard. |
| MV-HEVC | Stereoscopic and Multiview support described in Annex G of the standard. |
| 3D-HEVC | 3D imaging support described in Annex I of the standard. |

 

The HEVC profile names are more descriptive than AVC profiles. Color sampling and bit depths are both clearly indicated.

Sample bit depths range from 8 to 16 bits depending on the profile. Some profile names include an integer that indicates the number of bits per sample. Assume that the sample size is 8 bits unless the name specifies otherwise.

Chroma sampling formats are also implied by the profile name. Assume the default 4:2:0 format unless the profile describes a monochrome picture, in which case assume 4:0:0. Profiles operating in 4:2:2 or 4:4:4 mode include that format in their name. Profiles using 4:4:4 chroma sampling can also carry 4:0:0, 4:2:0 and 4:2:2 content.
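
These naming rules are mechanical enough to express in code. The Python sketch applies them to a profile name string. It only encodes the naming conventions described above and is no substitute for reading general_profile_idc and the constraint flags from the bitstream. The function name is hypothetical.

```python
import re


def profile_format(name):
    """Infer (chroma format, bit depth) from an HEVC profile name."""
    tokens = name.split()
    chroma = "4:0:0" if "Monochrome" in tokens else "4:2:0"   # default chroma format
    bit_depth = 8                                             # default bit depth
    for token in tokens:
        if re.fullmatch(r"4:[024]:[024]", token):
            chroma = token          # explicit 4:2:2 or 4:4:4 in the name
        elif token.isdigit():
            bit_depth = int(token)  # e.g. "10", "12", "16"; "3D" is not all digits
    return chroma, bit_depth


for profile in ["Main", "Main 10", "Main 4:2:2 12 Intra", "Monochrome 16", "3D Main"]:
    print(profile, "->", profile_format(profile))
```

Words such as Intra, Still Picture, Scalable or Screen-Extended describe the coding tools and do not change the chroma format or bit depth.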

Category Edition Profile Name
Core 1 Main
Core 1 Main 10
Core 1 Main Still Picture
High 2 High Throughput 4:4:4 16 Intra
Rext 2 Main 10 Intra
Rext 2 Main 12
Rext 2 Main 12 Intra
Rext 2 Main 4:2:2 10
Rext 2 Main 4:2:2 10 Intra
Rext 2 Main 4:2:2 12
Rext 2 Main 4:2:2 12 Intra
Rext 2 Main 4:4:4
Rext 2 Main 4:4:4 10
Rext 2 Main 4:4:4 10 Intra
Rext 2 Main 4:4:4 12
Rext 2 Main 4:4:4 12 Intra
Rext 2 Main 4:4:4 16 Intra
Rext 2 Main 4:4:4 16 Still Picture
Rext 2 Main 4:4:4 Intra
Rext 2 Main 4:4:4 Still Picture
Rext 2 Main Intra
Rext 2 Monochrome
Rext 2 Monochrome 12
Rext 2 Monochrome 12 Intra
Rext 2 Monochrome 16
Rext 2 Monochrome 16 Intra
MV-HEVC 2 Multiview Main
SHVC 2 Scalable Main
SHVC 2 Scalable Main 10
3D-HEVC 3 3D Main
High 4 High Throughput 4:4:4
High 4 High Throughput 4:4:4 10
High 4 High Throughput 4:4:4 14
SHVC 4 Scalable Main 4:4:4
SHVC 4 Scalable Monochrome
SHVC 4 Scalable Monochrome 12
SHVC 4 Scalable Monochrome 16
High/SCC 4 Screen-Extended High Throughput 4:4:4
High/SCC 4 Screen-Extended High Throughput 4:4:4 10
High/SCC 4 Screen-Extended High Throughput 4:4:4 14
SCC 4 Screen-Extended Main
SCC 4 Screen-Extended Main 10
SCC 4 Screen-Extended Main 4:4:4
SCC 4 Screen-Extended Main 4:4:4 10
Core 5 Main 10 Still Picture
Rext 5 Monochrome 10
MV-HEVC 9-Amd Multiview Main 10
MV-HEVC 9-Amd Multiview Monochrome
MV-HEVC 9-Amd Multiview Monochrome 10
MV-HEVC 9-Amd Multiview Monochrome 12

 


Note that profiles can simultaneously belong to the High Throughput and Screen Content Coding Extension categories.


Tiers

HEVC introduces two Tiers of operation which are related to the Profiles and the corresponding Levels:

  • Main tier - Designed for most applications, this tier is available at all levels but constrains the maximum bitrate to a much lower value than the High tier. It is suitable for most consumer applications.
  • High tier - Only available at level 4 and above. This is designed for HD and larger picture sizes at levels 4 to 7, where the application demands higher bitrates.
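
The interaction between tiers and levels can be expressed as a lookup. The Python sketch below uses the maximum bitrates commonly quoted for the Main and Main 10 profiles at levels 1 to 6.2, in kbit/s. Treat the figures as an assumption to be checked against Annex A of the standard, since other profiles scale them by different factors. The table and function names are hypothetical.

```python
# Maximum bitrates in kbit/s for the Main and Main 10 profiles: (Main tier, High tier).
# None means the High tier is not defined at that level.
MAX_BITRATE_KBPS = {
    "1": (128, None),
    "2": (1_500, None),
    "2.1": (3_000, None),
    "3": (6_000, None),
    "3.1": (10_000, None),
    "4": (12_000, 30_000),
    "4.1": (20_000, 50_000),
    "5": (25_000, 100_000),
    "5.1": (40_000, 160_000),
    "5.2": (60_000, 240_000),
    "6": (60_000, 240_000),
    "6.1": (120_000, 480_000),
    "6.2": (240_000, 800_000),
}


def max_bitrate_kbps(level, tier="Main"):
    """Look up the bitrate ceiling for a level and tier."""
    main_tier, high_tier = MAX_BITRATE_KBPS[level]
    if tier == "Main":
        return main_tier
    if tier == "High":
        if high_tier is None:
            raise ValueError(f"the High tier is not available at level {level}")
        return high_tier
    raise ValueError(f"unknown tier: {tier}")


def fits(level, tier, bitrate_kbps):
    """True if a stream's peak bitrate is within the level and tier ceiling."""
    return bitrate_kbps <= max_bitrate_kbps(level, tier)


print(fits("5.1", "Main", 50_000))   # False: above the 40,000 kbit/s Main tier limit
print(fits("5.1", "High", 50_000))   # True: within the 160,000 kbit/s High tier limit
```

The same level can therefore be paired with either tier, and a stream that is too demanding for the Main tier may still fit at the same level in the High tier.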

Levels

Levels define the limits on picture size, sample rate, bitrate and buffering that the decoder in the receiving client-player must support. They are similar to AVC levels. An additional level (7) extends the scope to support 16K images. Level signaling is much simpler in HEVC than it was in AVC.

These are the main level groups and picture sizes:

| Level group | Typical picture sizes |
| --- | --- |
| 1 | Small pictures for older mobile devices. |
| 2 | Quarter SD frame size or low frame-rate SD. |
| 3 | SD and some 1280 x 720 HD formats. |
| 4 | 2K. |
| 5 | 4K. |
| 6 | 8K. |
| 7 | 16K. |

 

The encoder trades off frame rates against picture sizes to stay within the maximum sample rate and bitrate defined by the level. This affects the decoding speed and picture buffering limits. Maximum values for picture size and frame rate at the different levels are listed here as examples. Consult the standard for more detail:

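| Level | Maximum picture size (pixels) | Maximum frame rate (fps) |
| --- | --- | --- |
| 1 | 176 × 144 | 15.0 |
| 2 | 352 × 288 | 30.0 |
| 2.1 | 640 × 360 | 30.0 |
| 3 | 960 × 540 | 30.0 |
| 3.1 | 1280 × 720 | 33.7 |
| 4 | 2048 × 1080 | 30.0 |
| 4.1 | 2048 × 1080 | 60.0 |
| 5 | 4096 × 2160 | 30.0 |
| 5.1 | 4096 × 2160 | 60.0 |
| 5.2 | 4096 × 2160 | 120.0 |
| 6 | 8192 × 4320 | 30.0 |
| 6.1 | 8192 × 4320 | 60.0 |
| 6.2 | 8192 × 4320 | 120.0 |
| 6.3 | 12288 × 6480 | 60.0 |
| 7 | 16384 × 8640 | 34.0 |
| 7.1 | 16384 × 8640 | 60.4 |
| 7.2 | 16384 × 8640 | 120.8 |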
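
The frame rates for levels 1 to 6.2 in the table above can be reproduced from two per-level limits: the maximum luma picture size (MaxLumaPs) and the maximum luma sample rate (MaxLumaSr). The Python sketch assumes, as the example figures appear to, that the picture is rounded up to whole 64 x 64 CTUs before dividing, and it applies the 300 fps ceiling mentioned earlier. Levels 6.3 to 7.2 are omitted, and the limit values should be checked against Annex A before being relied on. The function names are hypothetical.

```python
# (MaxLumaPs, MaxLumaSr) per level: maximum luma picture size in samples and
# maximum luma sample rate in samples per second. Levels 6.3 to 7.2 are omitted.
LEVEL_LIMITS = {
    "1": (36_864, 552_960),
    "2": (122_880, 3_686_400),
    "2.1": (245_760, 7_372_800),
    "3": (552_960, 16_588_800),
    "3.1": (983_040, 33_177_600),
    "4": (2_228_224, 66_846_720),
    "4.1": (2_228_224, 133_693_440),
    "5": (8_912_896, 267_386_880),
    "5.1": (8_912_896, 534_773_760),
    "5.2": (8_912_896, 1_069_547_520),
    "6": (35_651_584, 1_069_547_520),
    "6.1": (35_651_584, 2_139_095_040),
    "6.2": (35_651_584, 4_278_190_080),
}
MAX_FPS = 300.0


def ceil_to(value, step):
    """Round value up to the next multiple of step."""
    return -(-value // step) * step


def max_frame_rate(level, width, height, ctu_size=64):
    """Highest frame rate a level allows for a given picture size."""
    max_ps, max_sr = LEVEL_LIMITS[level]
    if width * height > max_ps:
        raise ValueError(f"{width}x{height} exceeds the picture size limit of level {level}")
    # Assumption: count the picture as whole CTUs, matching the table's example figures
    padded = ceil_to(width, ctu_size) * ceil_to(height, ctu_size)
    return min(max_sr / padded, MAX_FPS)


for level, w, h in [("1", 176, 144), ("3.1", 1280, 720), ("5.2", 4096, 2160)]:
    print(level, w, h, max_frame_rate(level, w, h))
```

This prints 15.0, 33.75 and 120.0. The table truncates 33.75 to 33.7, and small pictures at high levels hit the 300 fps ceiling rather than the sample rate limit.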

 

Levels 6.3 to 7.2 were defined in a later edition of the standard. They introduce the 12K and 16K sizes for use with future display technologies beyond the current 8K formats.

Container Files

Most of the containers that were compatible with AVC can be used for storing HEVC content with a couple of exceptions. They are summarized here:

| Container type | File extension | AVC | HEVC |
| --- | --- | --- | --- |
| Material Exchange Format | mxf | Yes | Yes |
| MPEG Program Stream | mpg, mpeg | Yes | Yes |
| MPEG Transport Stream | ts | Yes | Yes |
| Third Generation Partnership Project (3GPP) | 3gp | Yes | Yes |
| Matroška file format | mkv | Yes | Yes |
| MPEG-4 Part 14 | mp4 | Yes | Yes |
| QuickTime File Format (QTFF) | mov | Yes | Yes |
| Advanced Systems Format (ASF) | asf | Yes | Yes |
| Audio Video Interleave | avi | Yes | Yes |
| MPEG-2 Transport Stream used on Blu-ray discs | m2ts | Yes | Yes (Ultra HD Blu-ray) |
| Enhanced Video Object files for HD DVD discs | evo | Yes | No |
| Flash MP4 video file based on ISOBMFF | f4v | Yes | No |

 

Market Penetration

AVC is a widely used format and is very popular now. HEVC is a format for the future. As technology platforms advance, HEVC might achieve the same market penetration eventually. It must offer sufficient advantages to offset the additional cost of coding systems and patent licensing fees to succeed.

The latest versions of all the major web browsers can play HEVC in the HTML5 <video> tag, although some rely on a hardware decoder being present in the platform. This facilitates the deployment of web-based video players.
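
Web players usually check for HEVC support by passing a codec string, such as hvc1.1.6.L93.B0, to browser APIs like MediaSource.isTypeSupported(). The string format is defined in ISO/IEC 14496-15 and encodes the profile, tier and level. The Python sketch below only builds the string. It assumes general_profile_space is 0, takes the profile compatibility flags as a list of set indices, and defaults the constraint byte to 0xB0 (progressive, non-packed, frame-only source) as an example. The function name is hypothetical.

```python
def hevc_codec_string(profile_idc, compatible_profiles, tier, level,
                      constraint_bytes=(0xB0,), sample_entry="hvc1"):
    """Build an HEVC codec string (general_profile_space assumed to be 0).

    compatible_profiles: indices j with general_profile_compatibility_flag[j] == 1.
    constraint_bytes: the general constraint bytes; trailing zero bytes are dropped.
    """
    # The flags are written in reverse bit order: flag[j] becomes bit j of the value
    compat = 0
    for j in compatible_profiles:
        compat |= 1 << j
    tier_letter = "H" if tier == "High" else "L"
    level_idc = round(float(level) * 30)   # general_level_idc is 30 x the level number
    trimmed = list(constraint_bytes)
    while trimmed and trimmed[-1] == 0:
        trimmed.pop()
    fields = [sample_entry, str(profile_idc), format(compat, "X"), f"{tier_letter}{level_idc}"]
    fields += [format(b, "X") for b in trimmed]
    return ".".join(fields)


print(hevc_codec_string(1, [1, 2], "Main", "3.1"))   # hvc1.1.6.L93.B0 (Main profile)
print(hevc_codec_string(2, [2], "Main", "5.1"))      # hvc1.2.4.L153.B0 (Main 10 profile)
```

In a browser, the resulting string would typically be wrapped in a MIME type such as video/mp4; codecs="hvc1.1.6.L93.B0" before being tested.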

Apple has adopted MV-HEVC, the multiview extension of HEVC, for spatial video on the Apple Vision Pro headset. That significantly enhances the reputation of HEVC.

Patent Issues

Patent licensing did significant damage to the prospects of MPEG-4 interactive content 20 years ago. History is repeating itself with HEVC which is struggling to gain traction, mainly due to the costs of patent licenses.

AVC has a single patent licensing pool but there are four with HEVC and some patent holders are going it alone as well. Patent licensing fees for HEVC are significantly higher than they were for AVC.

In the fullness of time, all patents will naturally expire. MPEG-2 is essentially patent free other than in Malaysia. New codec designs can use that technology to create royalty free alternatives. MPEG-4 Part 2 is also now patent free. AVC is expected to be patent free by 2030. Since HEVC is partly based on earlier codecs, it is likely that some of those expiring patents are relevant. It remains to be seen whether that affects the patent licensing fees.

Large tech companies, from which the bulk of any potential patent revenue might have come, established the Alliance for Open Media, which has developed a royalty free alternative to HEVC. The AV1 codec is open-source and freely available for anyone to use. Significantly, Apple has built hardware AV1 decoding into its proprietary M3 Apple Silicon chip designs.

Conclusion

Caveats regarding profile and level compatibility apply to HEVC as they did to AVC. The encoder and client-player must both support the same configuration.

HEVC has gained considerable credibility by being adopted by Apple for use in the Vision Pro VR headset. This can only be good for the future of HEVC.

We may still need yet another cycle of codec innovation. HEVC handles 8K well and is scoped to support 16K at the upper end of its designed performance range. Research work and prototypes for 32K video systems are already in hand. We thought that 8K might be too extreme for consumers to have at home, but 8K displays are now affordable. Will 16K and 32K be feasible? If they are, we will need even better compression technologies. There are no displays with those resolutions yet, but there are several prototype camera designs in the pipeline.
