Standards: Audio - High Efficiency Audio Codecs (HE-AAC)

HE-AAC builds on the foundations of AAC to deliver near CD-quality audio at bitrates as low as 32 kbps, making it the codec of choice for mobile TV, digital radio and low-bandwidth streaming. This guide unpacks the key technologies behind its efficiency gains.

High Efficiency AAC Audio Coding

High Efficiency AAC (HE-AAC) refines the original AAC coding techniques but remains compatible with MPEG-2 Part 7. It can deliver near CD-quality sound at 32 kbps.

HE-AAC is known by a variety of other names:

Alias         Canonical name
AAC+          HE-AAC v1
aacPlus       HE-AAC v1
aacPlus v2    HE-AAC v2
eAAC+         HE-AAC v2
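Software that reads codec names from stream metadata may need to fold these aliases into one name. The following is a minimal sketch of that lookup; the `CANONICAL_NAMES` table and the `canonical_name` function are hypothetical names invented for this illustration, and only the alias pairs themselves come from the list above.

```python
# Alias pairs taken from the list above. Keys are stored in lower case so
# that lookups can ignore capitalisation.
CANONICAL_NAMES = {
    "aac+": "HE-AAC v1",
    "aacplus": "HE-AAC v1",
    "aacplus v2": "HE-AAC v2",
    "eaac+": "HE-AAC v2",
}


def canonical_name(alias: str) -> str:
    """Return the canonical HE-AAC name for an alias, or the input unchanged."""
    key = " ".join(alias.lower().split())  # lower-case and collapse whitespace
    return CANONICAL_NAMES.get(key, alias)


print(canonical_name("aacPlus v2"))  # HE-AAC v2
print(canonical_name("AAC+"))        # HE-AAC v1
```

Unknown names pass through unchanged, so the function can be applied to any codec label without special cases.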

These are some important features that HE-AAC introduces:

  • Spectral Band Replication (SBR).
  • Parametric Stereo (PS).
  • Perceptual Noise Substitution (PNS).
  • Long Term Predictor (LTP).
  • Low Delay (AAC-LD).
  • MPEG-4 Scalable to Lossless (SLS).
  • AAC Scalable Sample Rate (SSR).
  • Structured Audio (SA).
  • Text To Speech (TTSI).
  • Error Resilience (ER).

Structured Audio and Text To Speech are extremely compact because they describe a sound that is rendered entirely in the receiving client. This can deliver bitrates as low as 100 bps.
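To put that figure in perspective, this short calculation compares the payload for one minute of audio at 100 bps with the same minute at the 32 kbps HE-AAC rate quoted earlier. It assumes a constant bitrate and ignores container overhead; the helper name `bytes_for` is invented for the example.

```python
def bytes_for(bitrate_bps: float, seconds: float) -> float:
    """Payload size in bytes for a constant-bitrate stream."""
    return bitrate_bps * seconds / 8


one_minute = 60
structured = bytes_for(100, one_minute)    # 100 bps structured audio
he_aac = bytes_for(32_000, one_minute)     # 32 kbps HE-AAC

print(structured)            # 750.0 bytes
print(he_aac)                # 240000.0 bytes
print(he_aac / structured)   # 320.0 times larger
```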

There are many historical versions. Each additional profile will add more variants:

Version                       Description
AAC (Original)                Described in ISO 13818-7:1997.
AAC (Version 1)               Described in ISO 14496-3:1999.
AAC (Version 2)               Described in the ISO 14496-3:2000 revision.
AAC (Current)                 Described in ISO 14496-3:2009.
AAC+ (Version 2)              Version 2 of aacPlus is described in ETSI TS 102 005:2010.
HE-AAC v1 (AAC+) Profile      Described in the ISO 14496-3:2001 revision. Combines AAC LC with Spectral Band Replication (SBR).
HE-AAC v2 (aacPlus) Profile   Described in the ISO 14496-3:2005 revision. Adds Parametric Stereo (PS) to the version 1 features to achieve lower bitrates. Sometimes described as eAAC+ (Extended AAC plus).
xHE-AAC                       Fraunhofer introduced Loudness Control and adaptive streaming around 2016 (see MPEG-DASH). Well supported by many players including iOS and Android.
Extended HE-AAC               ISO 23003-3:2020 adds USAC coding to HE-AAC version 2, extending the tool set.

AAC Tools & Technologies

The AAC coding tools were reorganized in MPEG-4 to make them more flexible. New tools were added and older tools were refactored into separate items. They are now described as Audio Objects, each one having a specific identity and purpose. Some of them are containers for descriptive information. This is an additional layer of abstraction facilitating profile definitions.

These are the basic AAC Audio Objects and the technologies that are used inside them. The sub-part numbers refer to the numbered sub-parts of the MPEG-4 Part 3 standard, which gives functional descriptions of these tools and how they are mapped into the profiles:

Terminology             Description
AAC Main                Based on AAC LC.
AAC LC                  The Low Complexity Audio Object combines the MPEG-2 Part 7 Low Complexity profile (LC) with Perceptual Noise Substitution (PNS). See sub-part 4.
AAC SSR                 Scalable Sample Rate is based on the MPEG-2 Part 7 Scalable Sampling Rate profile (SSR) combined with Perceptual Noise Substitution (PNS). See sub-part 4.
AAC LTP                 Long Term Prediction introduces a forward predictor with lower computational complexity. Also uses AAC LC.
AAC LD                  Low Delay, used with CELP, HVXC, and TTSI in the Low Delay Profile. Suitable for real-time conversation applications.
AAC ELD                 Enhanced Low Delay improves the bitrate and latency at the expense of a small increase in computational workload.
SBR                     Spectral Band Replication used with AAC LC in the HE-AAC Profile version 1.
TwinVQ                  Transform-domain Weighted Interleave Vector Quantization is designed for coding audio at extremely low bitrates (8 kbps). See sub-part 4.
CELP                    Speech coding with Code Excited Linear Prediction operates at low bitrates. TwinVQ may be more efficient. Not suited for use with music. See sub-part 3.
HVXC                    Speech coding with Harmonic Vector eXcitation Coding works well with low sample rates around 8 kHz, delivering coded output at 1.6 kbps. Latency is very low, making it suitable for telephony applications. See sub-part 2.
SSC                     SinuSoidal Coding. The technical underpinnings of Parametric Stereo coding for high quality audio. See sub-part 8.
PS                      Parametric Stereo used with AAC LC and SBR in the HE-AAC v2 Profile. The implementation uses SinuSoidal Coding (SSC). Stereo audio is coded as a monaural channel with two differential channels for the left and right signals. See sub-part 8.
MP1, MP2, MP3           MPEG-1/MPEG-2 Audio Layers 1, 2 & 3 in MPEG-4. See sub-part 9.
USAC                    Unified Speech and Audio Coding switches the coding strategy between low bitrate CELP (for speech) and HE-AAC (for music) mid-stream as it determines which is more efficient for each segment. See ISO 23003-3.
BSAC                    Bit Sliced Arithmetic Coding is an alternative scalable noiseless coding mechanism providing almost perfect quality at 64 kbps. Used for Digital Media Broadcasting (DMB) services. See sub-part 4.
HILN                    Parametric audio coding with Harmonic and Individual Line plus Noise. Sound can be coded as various harmonics of a sine wave plus a noise component described as a spectral envelope. See sub-part 7.
PNS                     Perceptual Noise Substitution improves efficiency by representing noise-like signal components with a parametric representation instead of coding the exact waveform. The decoder synthesizes the noise component based on the description.
DST                     Lossless coding of oversampled audio with Direct Stream Transfer. Popularized by Super Audio CDs. See sub-part 10.
ALS                     Audio Lossless Coding uses short and long-term predictors to encode sounds that are rich in harmonics. See sub-part 11.
SLS                     Scalable Lossless Coding is based on a layered approach which implements a lossy coding component in AAC with an additional correction layer that enhances it to provide the lossless result. SLS and ALS are not related to one another. See sub-part 12.
SLS non-core            A lossless audio coder with a single coding stream without the lossy General Audio base layer.
MPEG Surround           Also known as MPEG Spatial Audio Coding (SAC). Not the same as SAOC.
SAOC                    Spatial Audio Object Coding. See ISO 23003-2.
SAOC-DE                 Spatial Audio Object Coding Dialogue Enhancement.
LD MPEG Surround        Low Delay MPEG Surround coding. The side channel information is described in ISO 23003-2.
Audio Sync              Audio synchronization maintains the coherence of multiple content streams in multiple devices. See sub-part 13.
TTSI                    Text to Speech Interface that synthesizes the audio. See sub-part 6.
SA                      Structured Audio describes the audio as components or algorithms. The top level is a scheduler for controlling the construction and playback. See sub-part 5.
Wavetable synthesis     Uses combinations of waveforms to create virtual instrument sounds.
Sample based synthesis  Sampled natural sound fragments are combined and mixed to create a track. Based on SoundFont technologies.
Algorithmic synthesis   Converts a description of a sound with instructions for how to play it into a compiled source code form (such as C Language). Then an application can be created to generate the sound.
Audio effects           Part of the structured audio toolset.
SMR Simple              Simplified version of Symbolic Music Representation. See ISO 14496-23.
SMR Main                Main version of Symbolic Music Representation. See ISO 14496-23.
SAOL                    Structured Audio Orchestra Language. Derived from the earlier MUSIC-N language.
SASL                    Structured Audio Score Language.
SASBF                   Structured Audio Sample Bank Format.
MIDI                    Musical Instrument Digital Interface describes sound (predominantly music based) as a series of events (notes), sounds (patches) and modulations (controls).
General MIDI            A standard set of sounds defined by Roland Corp to provide instrument sound (patch) compatibility across multiple MIDI devices.
DLS                     Downloadable Sounds are standardized digital musical instrument sound banks which can be used with data-driven sound tracks such as MIDI or SAOL.

Spectral Band Replication (SBR)

Spectral Band Replication discards redundant harmonic components in the encoder but reconstructs them by replicating the lower frequencies to derive suitable replacements in the player. This can be used with any codec.

A typical stream of audio might be coded to a target maximum 128 kbps bitrate. This would reproduce all frequencies up to 15 kHz with a small reduction in the frequency response at the top end.

SBR cuts off the incoming frequencies at around 7.5 kHz. This loses a lot of the detail but reduces the bitrate to 64 kbps.

The higher band from 7.5 kHz to 15 kHz is processed through a more aggressive compression tool. This generates a description of the high frequency sounds that can be used in the decoder to reconstruct them from the lower order harmonics. The description is carried in auxiliary segments within the stream and only adds 1.5 kbps to the bitstream (65.5 kbps in total).

The player transposes the lower frequencies into the upper band where it can filter and mix them in using the descriptions in the auxiliary segments derived from the higher frequencies. This is practical because the upper frequencies are likely to be harmonics of the lower band with a different amplitude envelope.
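The idea can be illustrated with a toy example that works directly on a list of spectral magnitudes. This is only a sketch under simplifying assumptions: the function names, the four-band envelope and the sample spectrum are all invented for the illustration, and a standardized SBR decoder is considerably more sophisticated. The encoder keeps the lower half of the spectrum plus a coarse energy envelope for the upper half; the decoder copies the low band upward and rescales each band to match the envelope. The bitrate arithmetic from the text is included at the end.

```python
import math


def band_energy(bins):
    """Mean squared magnitude of a list of spectral bins."""
    return sum(b * b for b in bins) / len(bins)


def sbr_encode(spectrum, bands=4):
    """Keep the lower half of the spectrum and describe the upper half
    as a coarse energy envelope (one value per band)."""
    half = len(spectrum) // 2
    low, high = spectrum[:half], spectrum[half:]
    width = len(high) // bands
    envelope = [band_energy(high[i * width:(i + 1) * width])
                for i in range(bands)]
    return low, envelope


def sbr_decode(low, envelope):
    """Copy the low band upward and rescale each band to the envelope."""
    width = len(low) // len(envelope)
    high = []
    for i, target in enumerate(envelope):
        patch = low[i * width:(i + 1) * width]
        source = band_energy(patch)
        gain = math.sqrt(target / source) if source > 0 else 0.0
        high.extend(b * gain for b in patch)
    return low + high


spectrum = [1.0 / (k + 1) for k in range(16)]   # made-up magnitudes
low, envelope = sbr_encode(spectrum)
rebuilt = sbr_decode(low, envelope)

# Bitrate arithmetic from the text (all values in kbps).
core_kbps = 64.0        # low band only, up to about 7.5 kHz
side_info_kbps = 1.5    # description of the upper band
total_kbps = core_kbps + side_info_kbps
print(total_kbps)       # 65.5, against 128 for the full-band stream
```

The rebuilt upper band has the right energy in each band but not the original fine detail, which is the trade-off the text describes.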

Perceptual Noise Substitution (PNS)

The bitrate gains from using PNS are often not worth the computational workload when the audio is of a high quality.

For noisy audio sources, the noise can be filtered out and described as control parameters for a pseudorandom noise generator in the player, which recreates it.
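A minimal sketch of that principle follows. It assumes the only control parameter is the RMS level of the noise-like band, which is a simplification chosen for the example; `pns_encode` and `pns_decode` are hypothetical names. The decoder output is different noise from the original, but it has the same level.

```python
import math
import random


def pns_encode(band):
    """Replace a noise-like band by a single parameter: its RMS level."""
    return math.sqrt(sum(x * x for x in band) / len(band))


def pns_decode(rms, length, seed=0):
    """Synthesize pseudorandom noise and scale it to the signalled level."""
    rng = random.Random(seed)
    noise = [rng.uniform(-1.0, 1.0) for _ in range(length)]
    noise_rms = math.sqrt(sum(x * x for x in noise) / length)
    scale = rms / noise_rms if noise_rms > 0 else 0.0
    return [x * scale for x in noise]


source = random.Random(1)
hiss = [source.gauss(0.0, 0.3) for _ in range(256)]   # made-up noisy band

level = pns_encode(hiss)                  # one number instead of 256 samples
substitute = pns_decode(level, len(hiss))
```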

Parametric Stereo (PS) & SinuSoidal Coding (SSC)

Parametric stereo exploits the similarity between the left and right channels to code them more efficiently.

The two channels are mixed down into a single monophonic channel and coded at full resolution. This is a base from which two differential channels can be derived. Those differences can be coded at a bitrate of 3 kbps using SinuSoidal Coding.

The player decodes the mono channel and applies the differences to make the left and right outputs.

Instead of delivering two full bitrate channels, the encoder delivers one full bitrate channel and two very low bitrate differential channels.
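The following toy example shows the general principle of sending one mono signal plus a small amount of side information. It is a loose illustration, not the standardized tool: it assumes a single parameter per frame (the share of the total energy that belongs to the left channel) in place of the differential channels described above, and all function names are invented for the sketch.

```python
import math


def energy(frame):
    """Sum of squared sample values."""
    return sum(x * x for x in frame)


def ps_encode(left, right):
    """Downmix to mono and keep one parameter for the frame: the share of
    the total energy that belongs to the left channel."""
    mono = [(l + r) / 2 for l, r in zip(left, right)]
    total = energy(left) + energy(right)
    left_share = energy(left) / total if total > 0 else 0.5
    return mono, left_share


def ps_decode(mono, left_share):
    """Rebuild two channels by panning the mono signal."""
    gain_l = math.sqrt(2 * left_share)
    gain_r = math.sqrt(2 * (1 - left_share))
    return [m * gain_l for m in mono], [m * gain_r for m in mono]


tone = [math.sin(2 * math.pi * n / 32) for n in range(64)]
left = [0.8 * s for s in tone]     # louder on the left
right = [0.4 * s for s in tone]

mono, left_share = ps_encode(left, right)
out_left, out_right = ps_decode(mono, left_share)

print(round(left_share, 3))                             # 0.8
print(round(energy(out_left) / energy(out_right), 3))   # 4.0
```

The decoded channels keep the original left-to-right energy balance (0.64 to 0.16, a ratio of 4) even though only one waveform was transmitted.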

Scalable Sample Rate (SSR)

Scaling the audio coding by splitting at the sample level is an interesting alternative to using base and enhancement layers.

If we de-interleave CD audio into three scalable streams then stream one carries the first sample, stream two the second, and stream three carries the third and perhaps the fourth. The next sample is added to stream one and so on. This yields two 11 kHz sample streams and one 22 kHz stream which can be used by the target device in any combination.

A low-quality service can be reconstructed from one stream or all of them can be combined to reconstruct the original sample stream.
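The de-interleaving described above can be sketched as follows. This follows the simplified description in the text and is not the standardized SSR tool, which is more involved; the function names are invented, and the input length is assumed to be a multiple of four.

```python
def split_streams(samples):
    """De-interleave in groups of four: sample 0 goes to stream one,
    sample 1 to stream two, samples 2 and 3 to stream three."""
    one = samples[0::4]
    two = samples[1::4]
    three = [s for i, s in enumerate(samples) if i % 4 in (2, 3)]
    return one, two, three


def merge_streams(one, two, three):
    """Re-interleave the three streams into the original order."""
    merged = []
    for i in range(len(one)):
        merged.append(one[i])
        merged.append(two[i])
        merged.extend(three[2 * i:2 * i + 2])
    return merged


samples = list(range(16))              # stand-in for 44.1 kHz CD samples
one, two, three = split_streams(samples)

print(one)     # [0, 4, 8, 12]
print(two)     # [1, 5, 9, 13]
print(three)   # [2, 3, 6, 7, 10, 11, 14, 15]
print(merge_streams(one, two, three) == samples)   # True
```

With a 44.1 kHz source, streams one and two each carry a quarter of the samples (about 11 kHz) and stream three carries half of them (about 22 kHz).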

Error Resilience (ER)

Some audio objects have Error Resilient counterparts which are indicated with the ‘ER’ prefix. This is useful for transmitting coded audio over unreliable and error-prone network links.

Additional error resilience is possible with checksums and Forward Error Correction introduced as the payload is segmented into network packets.
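As a small illustration of the checksum idea, the sketch below splits a payload into packets, attaches a CRC-32 to each using Python's standard `zlib.crc32`, and reports which packets fail verification on arrival. The packet layout and the function names are assumptions made for the example; they are not part of the MPEG-4 ER tools, and Forward Error Correction is not shown.

```python
import zlib


def packetize(payload: bytes, size: int):
    """Split a payload into (sequence, chunk, crc32) tuples."""
    return [
        (seq, payload[i:i + size], zlib.crc32(payload[i:i + size]))
        for seq, i in enumerate(range(0, len(payload), size))
    ]


def damaged(packets):
    """Return the sequence numbers whose checksum no longer matches."""
    return [seq for seq, chunk, crc in packets if zlib.crc32(chunk) != crc]


packets = packetize(bytes(range(100)), 32)   # four packets: 32, 32, 32, 4 bytes

# Simulate a transmission error by overwriting the first byte of packet 2.
seq, chunk, crc = packets[2]
packets[2] = (seq, b"\x00" + chunk[1:], crc)

print(damaged(packets))   # [2]
```

A checksum only detects damage. Recovering the lost data needs retransmission or the redundancy that Forward Error Correction adds.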

Patent Licenses

Patents for MPEG-4 Audio coding are managed by Via Licensing. Contact them for a license if you design and sell an Encoder or Decoder (Player) of your own.

Content owners do not need a license to distribute their MPEG Audio content. They have implicitly paid for it when purchasing the encoder or decoder.

Patents for AAC baseline technologies expire in 2028 and some newer extensions will have active patents until 2031.

Profiles & Audio Objects

Some profiles use a single Audio Object while others stack the tools hierarchically to make more efficient and sophisticated coders. Added complexity requires more computational effort and increases the latency:

  • MPEG-2 AAC-LC profile only uses the Low Complexity AAC-LC audio object.
  • MPEG-4 AAC-LC adds Perceptual Noise Substitution (PNS).
  • MPEG-4 HE-AAC v1 adds Spectral Band Replication (SBR).
  • MPEG-4 HE-AAC v2 adds Parametric Stereo (PS).

Because the specification is hierarchical, HE-AAC v2 players can decode any of the lower stacked levels.
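The stacking can be modelled as nested sets of tools. This is a sketch of the four levels listed above only; the `PROFILE_TOOLS` table and the `can_decode` function are hypothetical names, and real decoders signal their capabilities in other ways.

```python
# Tool stacks as listed above: each level adds one tool to the one before it.
PROFILE_TOOLS = {
    "MPEG-2 AAC-LC": {"AAC-LC"},
    "MPEG-4 AAC-LC": {"AAC-LC", "PNS"},
    "MPEG-4 HE-AAC v1": {"AAC-LC", "PNS", "SBR"},
    "MPEG-4 HE-AAC v2": {"AAC-LC", "PNS", "SBR", "PS"},
}


def can_decode(player_profile: str, stream_profile: str) -> bool:
    """A player can decode a stream if it implements every tool the stream uses."""
    return PROFILE_TOOLS[stream_profile] <= PROFILE_TOOLS[player_profile]


print(can_decode("MPEG-4 HE-AAC v2", "MPEG-2 AAC-LC"))     # True
print(can_decode("MPEG-4 AAC-LC", "MPEG-4 HE-AAC v1"))     # False
```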

These are the standardized profiles. Organizations such as Fraunhofer create their own proprietary profiles.

The Fraunhofer Scalable Lossless Coding (HD-AAC) is not the same as the SLS support defined by the MPEG-4 standard.

Refer to section 1.5 of the MPEG-4 Part 3 standard for a detailed description of the Audio Objects and how they are mapped to the profiles.

Profile                               Introduced by
Low-Complexity                        MPEG-2
Main                                  MPEG-2
Scalable Sampling Rate                MPEG-2
AAC                                   MPEG-4
High Efficiency AAC (v1)              MPEG-4
HE-AAC v2                             MPEG-4
Main Audio                            MPEG-4
Scalable Audio                        MPEG-4
Speech Audio                          MPEG-4
Synthetic Audio                       MPEG-4
High Quality Audio                    MPEG-4
Low Delay Audio                       MPEG-4
Low Delay v2 Audio                    MPEG-4
Natural Audio                         MPEG-4
Mobile Audio Inter-networking         MPEG-4
HD-AAC                                MPEG-4
ALS Simple                            MPEG-4
Extended High Efficiency AAC          MPEG-D
(Limited) Scalable Lossless Coding    Fraunhofer HD-AAC

Media Type Identifiers

Because HE-AAC is coded differently from classic AAC, a new media type is needed so that browsers can distinguish between the two formats:

Media type  Description
audio/aac   Use this for Standard AAC format content. This is the most widely supported.
audio/aacp  Describes AAC+ content but is not as widely supported by web browsers.

Relevant Standards

The vintage column indicates the most recent base standard, corrigenda or amendment. Although the latest versions are indicated, earlier versions may contain relevant information that is removed from later standards. Some devices may be compatible only with an earlier version and you should use that if necessary when developing your services for them.

There is a gradual refactoring of the MPEG standards underway so they can benefit from reusing supporting technologies without needing to repeat them. The MPEG-D and Coding Independent Code Points standards are examples of that as are the ISO 23XXX group of MPEG relevant standards, which provide additional infrastructural support outside of the individual coding specifications.

Standard          Version      Description
ISO 11172-3       1996         MPEG-1 Part 3 - Audio.
ISO 13818-1       2023         MPEG-2 Part 1 - Systems.
ISO 13818-3       1998         MPEG-2 Part 3 - Audio.
ISO 13818-7       2007         MPEG-2 Part 7 - Advanced Audio Coding (AAC).
ISO 14496-1       2014         MPEG-4 Part 1 - Systems. Currently being revised.
ISO 14496-3       2009         MPEG-4 Part 3 - Audio coding. Released in 2001 & amended in 2003 & 2004.
ISO 14496-4       2019         MPEG-4 Part 4 - Conformance bit-streams specification.
ISO 14496-5       2019         MPEG-4 Part 5 - Reference Software.
ISO 14496-11      2015         MPEG-4 Part 11 - Scene description & application engine.
ISO 14496-23      2008         Symbolic Music Representation.
ISO 23091-3       2022         MPEG-CICP - Coding Independent Code Points for delivering out of band metadata.
ISO 23001-8       n/a          Withdrawn & replaced by ISO 23091.
ISO 23003         n/a          MPEG-D is a group of standards for audio coding.
ISO 23003-1       2017         MPEG-D Part 1 - MPEG Surround (a.k.a. Spatial Audio Coding).
ISO 23003-2       2018         MPEG-D Part 2 - Spatial Audio Object Coding (SAOC).
ISO 23003-3       2021         MPEG-D Part 3 - Unified speech & audio coding (USAC).
ISO 23003-4       2023         MPEG-D Part 4 - Dynamic Range Control. Currently being revised.
ISO 23003-5       2020         MPEG-D Part 5 - Uncompressed audio in MPEG-4 File Format.
ISO 23003-6       ISO 23003-6  MPEG-D Part 6 - USAC Reference Software.
ISO 23003-7       2022         MPEG-D Part 7 - USAC Conformance specification.
DVB-H             2004         Handheld mobile TV services.
DVB-SH            2008         Handheld mobile TV services delivered via a satellite link.
ETSI TS 101 154   2019         HE-AAC & HE-AAC v2 audio coding for DVB applications.
ETSI TS 102 005   2010         Video & Audio Coding in DVB services delivered directly over IP protocols.
ETSI TR 102 377   2009         DVB-H Implementation guidelines.
ETSI TS 103 466   2019         DAB audio coding (MPEG Layer II).
ETSI TS 126 401   2017         Enhanced aacPlus general audio codec.
ETSI EN 302 304   2004         Describes DVB-H.
3GPP TS 26.401    2024         Describes the use of Enhanced AAC+ for mobile services.
General MIDI      1999         Developed by Roland to allow MIDI devices to sound similar when music sequences are played through them.
DLS               1998         The MIDI Downloadable Sounds Specification by the MIDI Manufacturers Association.
MIDI 1.0          1996         The Complete MIDI 1.0 Detailed Specification by the MIDI Manufacturers Association.
MIDI 2.0          2020         Extends MIDI 1.0 with additional capabilities.
ITU Rec H.223     1998         Annex C describes a Multiplexing Protocol For Low Bitrate Multimedia Communication Over Highly Error-Prone Channels.
ITU Rec H.222.0   1995         See ISO/IEC 13818-1 - Systems.

Applying HE-AAC

Audio and video compression is a complex subject. We pitch it here at a level sufficient to explain the fundamentals whilst avoiding a deep dive down the rabbit hole. Consult the MPEG-4 Part 3 standard if you need to explore MPEG Audio coding in greater detail.

The AAC audio standard is increasingly being used with High-Definition TV services (HDTV). This is supported by the DVB standards that are distributed by ETSI. HE-AAC is particularly relevant for mobile TV using DVB-H.

Digital radio services such as DAB+ and Digital Radio Mondiale are also adopting HE-AAC.
