Standards: Audio - Advanced Audio Coding (AAC)
AAC succeeded MP3 by delivering better quality at lower bitrates. This guide examines how it works, compares the leading encoder implementations, and explains where it sits within the broader MPEG audio standards landscape.
About Advanced Audio Coding
AAC improves on the MP3 Perceptual Coding design to achieve higher compression ratios and better playback quality.
While the MP3 audio codec published in the MPEG-1 and MPEG-2 standards has been very successful, subsequent research explored how to reduce the bitrate and deliver better quality. Fraunhofer IIS were deeply involved again, this time with a larger cohort of collaborators.
MPEG-2 part 7 introduces the Advanced Audio Coding (AAC) standard, which supersedes MP3. Coding algorithms have been improved and new tools have been introduced to achieve a better compression ratio. It can also be used at a higher bitrate for better quality – but it is still a lossy codec.
MPEG-4 part 3 further enhances the AAC coding and adds support for other kinds of audio coding. Some of these are designed for very low bitrate transports.
Improvements Over MP3
The AAC codec improves on MP3 in several important areas:
- Compression algorithms.
- Sample rates.
- Multi-channel support.
- Base and enhancement layer encoding.
- Combining different encoders in MPEG-4.
- Additional profiles.
Compression Algorithms
MP3 provided two different but related kinds of compression:
- DCT - The Discrete Cosine Transform applied to a single frequency band.
- MDCT - The Modified Discrete Cosine Transformation applied to overlapping groups of frequency bands.
AAC is implemented purely with MDCT which significantly improves compression efficiency but means it is not backwards compatible with MP3. The number of frequency sub-bands is increased to 1024. Calculating the energy level for masking threshold control is now much more accurate and fine-grained. The basic DCT algorithm is explored more deeply in Chapter 3-2.
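
The overlapping transform described above can be illustrated with a short sketch. The Python below is a deliberately slow, textbook-style MDCT with a sine window and overlap-add reconstruction. It is not the filter bank specified in the standard, and the function names and the tiny block size in the usage line are assumptions chosen for readability (AAC long blocks produce 1024 coefficients).

```python
import math


def sine_window(n_coeffs):
    """Sine window of length 2N; it satisfies w[n]^2 + w[n+N]^2 = 1."""
    length = 2 * n_coeffs
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(block):
    """Transform 2N time samples into N frequency coefficients."""
    n_coeffs = len(block) // 2
    return [
        sum(block[n] * math.cos(math.pi / n_coeffs * (n + 0.5 + n_coeffs / 2) * (k + 0.5))
            for n in range(2 * n_coeffs))
        for k in range(n_coeffs)
    ]


def imdct(coeffs):
    """Turn N coefficients back into 2N time samples that still contain aliasing.

    The 2/N scale factor is the one that pairs with the sine window above.
    """
    n_coeffs = len(coeffs)
    return [
        2.0 / n_coeffs * sum(
            coeffs[k] * math.cos(math.pi / n_coeffs * (n + 0.5 + n_coeffs / 2) * (k + 0.5))
            for k in range(n_coeffs))
        for n in range(2 * n_coeffs)
    ]


def analyse_and_rebuild(samples, n_coeffs):
    """Run windowed MDCT then IMDCT on half-overlapping blocks and overlap-add."""
    window = sine_window(n_coeffs)
    tail = (-len(samples)) % n_coeffs  # zeros needed to reach a whole block
    padded = [0.0] * n_coeffs + list(samples) + [0.0] * (tail + n_coeffs)
    output = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_coeffs + 1, n_coeffs):
        block = [padded[start + n] * window[n] for n in range(2 * n_coeffs)]
        rebuilt = imdct(mdct(block))
        for n in range(2 * n_coeffs):
            # Aliasing from neighbouring blocks cancels in this sum.
            output[start + n] += rebuilt[n] * window[n]
    return output[n_coeffs:n_coeffs + len(samples)]


# Hypothetical usage: 8 coefficients per block instead of AAC's 1024.
signal = [math.sin(0.3 * i) for i in range(20)]
rebuilt = analyse_and_rebuild(signal, 8)
```

Each block of 2N windowed samples yields only N coefficients, yet because consecutive blocks overlap by half, the decoder recovers the signal when it adds the decoded blocks together. A real encoder quantizes the coefficients between the two transforms, which is where the loss occurs.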
Sample Rates
Additional sample rates from 8kHz up to 96kHz extend the 16kHz to 48kHz range previously supported by MP3. Very low sample rates improve coding latency and are suitable for speech and telephony applications but not music.
Multi-Channel Support
Where MP3 supports only six channels of audio, AAC supports many more:
- 48 full-audio channels.
- 16 channels of low frequency effects below 120Hz.
- 16 dialog channels.
- 16 data streams.
This facilitates the coding of more sophisticated surround-sound systems.
Base & Enhancement Layered Encoding
New concepts introduce the idea of base layer encoding with optional enhancement layers using other codecs. For example, CELP (speech coding) can be improved by adding an AAC enhancement layer.
Combining Different Encoders
The General Audio (GA) coding environment described in MPEG-4 Part 3 adds two new encoders which can be used interchangeably with AAC:
- TwinVQ - Suitable for very low bitrates.
- BSAC Encoder - A scalable bitrate encoder with an error resilient bitstream.
The coding algorithms are particularly well described in the ISO 14496 Part 3 standard. Refer to sub-part 4 (around page 487) for a breakdown and description of each tool. The block diagram showing how these tools work in the encoder is especially helpful.
Transport Bitrates
There are several kinds of delivery for compressed audio when it is transported over a network:
| Transport | Description |
|---|---|
| CBR | Constant bitrate delivery allows traffic levels to be predicted more easily. |
| VBR | Variable bitrate delivery may be able to convey more content within the available capacity. Multiple streams will burst and shrink at different times. There is a possibility of trying to push too much data if they all burst at once. Use buffering or ABR to deal with it. |
| ABR | Adaptive bitrate delivery adjusts the compression quality dynamically to alter the capacity needed to deliver the content. This can alleviate the problems when multiple VBR streams are bursting at the same time. |
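
The risk of several VBR streams bursting together can be checked with simple arithmetic. The sketch below is only an illustration: the function name, the per-interval bitrates and the 500 kbit/s link capacity are all hypothetical values, not measurements.

```python
def find_overloads(stream_bitrates_kbps, link_capacity_kbps):
    """Return (interval_index, total_kbps) for each interval that exceeds capacity.

    stream_bitrates_kbps holds one list per stream, sampled at the same intervals.
    """
    overloads = []
    for index, rates in enumerate(zip(*stream_bitrates_kbps)):
        total = sum(rates)
        if total > link_capacity_kbps:
            overloads.append((index, total))
    return overloads


# Three hypothetical VBR streams sampled over four intervals.
streams = [
    [96, 128, 192, 110],
    [100, 120, 200, 96],
    [90, 140, 180, 100],
]
print(find_overloads(streams, 500))  # [(2, 572)]
```

The average load here is well inside the link capacity, but the third interval is not. That is the case where buffering or ABR is needed.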
Profile Support
AAC is a modular codec with a variety of tools that can be optionally switched on when needed. MPEG-2 part 7 defines several profiles which configure the earlier AAC encoder. These are inherited by MPEG-4:
- Low-Complexity Audio.
- Main Audio.
- Scalable Sampling Rate.
MPEG-4 introduces new profiles to address a wider range of applications. The coding tools configured by each profile are now redesigned and described as Audio Objects. Some profiles combine other codecs with AAC using the layered support and a few do not use AAC at all:
- Main (updated).
- AAC profile.
- Long Term Prediction.
- Scalable Audio.
- Speech Audio.
- Synthetic audio.
- High quality audio.
- Low delay audio.
- Low Delay AAC.
- Low Delay AAC V2.
- Mobile Audio Inter Networking.
- Natural audio.
- High Definition AAC.
- ALS Simple.
The entire menagerie of 42 audio objects (previously called tools) is described in sub-section 1.5.1.2 of the standard. Study these objects to better understand the profiles. Audio objects are mapped to the profiles in Table 1.3 within the standard.
Storing AAC Content In Files
MPEG standards describe these alternatives for storing AAC coded audio content in file containers:
- ADIF - The Audio Data Interchange Format stores AAC coded audio on its own in an .aac file. This is initially defined in ISO 13818 part 7 and also discussed in ISO 14496 part 3. This format places all the data that controls the decoder into a single header that precedes the content stream. This is optimal for file exchange since it is available right away. Randomly seeking to different points in the stream is not supported during playback. Because the content is a raw encoded audio elementary stream, metadata tagging is also not supported.
- ADTS - The Audio Data Transport Stream is defined in the same standards. It repeats a short header in front of every frame, so a decoder can start at any frame boundary. This is the form usually found in .aac files.
- MP4FF - The MPEG-4 File Format is described in ISO 14496 part 12. This format does support metadata tagging and is stored in .mp4 or .m4a files.
- 3GP - AAC audio can be carried in .3gp files since they are derived from the MPEG-4 Part-12 standard. Mobile applications require low bit rates and the AAC content should be coded accordingly.
Other containers such as QuickTime and Matroska can also be used.
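
The storage formats above can usually be told apart from their first few bytes. The following Python is a minimal sketch rather than a validating parser. The function names are hypothetical and it assumes that the data starts at a frame or file boundary. It relies on ADIF files beginning with the ASCII characters "ADIF", ISO base media files carrying "ftyp" at byte offset 4, and ADTS frames beginning with a 12-bit 0xFFF syncword.

```python
# Sampling frequencies indexed by the 4-bit field in an ADTS header.
ADTS_SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000,
                     24000, 22050, 16000, 12000, 11025, 8000, 7350]


def sniff_aac_storage(data):
    """Guess how AAC audio is stored from the leading bytes of a file."""
    if data[:4] == b"ADIF":
        return "ADIF"
    if len(data) >= 8 and data[4:8] == b"ftyp":
        return "ISO base media file"
    # ADTS: 12 sync bits set and the 2-bit layer field equal to zero.
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xF6) == 0xF0:
        return "ADTS"
    return "unknown"


def parse_adts_header(data):
    """Decode the fixed fields from the first 7 bytes of an ADTS frame."""
    if len(data) < 7 or sniff_aac_storage(data) != "ADTS":
        raise ValueError("not an ADTS frame header")
    rate_index = (data[2] >> 2) & 0x0F
    return {
        # The 2-bit profile field holds the audio object type minus one.
        "audio_object_type": ((data[2] >> 6) & 0x03) + 1,
        "sample_rate": (ADTS_SAMPLE_RATES[rate_index]
                        if rate_index < len(ADTS_SAMPLE_RATES) else None),
        "channel_configuration": ((data[2] & 0x01) << 2) | (data[3] >> 6),
        # Frame length counts the header as well as the payload.
        "frame_length": ((data[3] & 0x03) << 11) | (data[4] << 3) | (data[5] >> 5),
        "has_crc": (data[1] & 0x01) == 0,
    }


# A hand-built example header: AAC-LC, 44.1kHz, stereo, 371-byte frame.
example = bytes([0xFF, 0xF1, 0x50, 0x80, 0x2E, 0x7F, 0xFC])
info = parse_adts_header(example)
```

Because every ADTS frame carries its own header, a player can step from one frame to the next using frame_length. An ADIF file has only a single header at the start.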
AAC Implementations
Coding tools are often presented via a Graphical User Interface (GUI) wrapper for easier access. It may not be obvious at first, but the encoders are also accessible from the shell scripting environment. Command-line tools integrate more easily with workflow automation than graphical user interface applications.
| Project | Description |
|---|---|
| Apple AAC | Part of QuickTime and iTunes but can be called to action with the afconvert shell command on a macOS system. It also integrates with the ffmpeg command-line tool. This is thought to be the best performing encoder for general use. |
| Fraunhofer FDK AAC | Released as part of the Android project. It is open-source but may require license fees. This is a low latency version of the encoder. This can be integrated with the ffmpeg tool. It is recommended as a good quality encoder. |
| fdkaac | A command-line tool built on top of the Fraunhofer FDK AAC software library. |
| ffmpeg/Libav fork | The ffmpeg project has incorporated improvements to make this a more stable coder. The VBR support is reckoned to be poor and some of the more sophisticated audio object types are unsupported. |
| Fraunhofer FhG AAC | Embedded inside Winamp on Windows but can be called to action from the command-line with the fhgaacenc command. This is developed by an entirely different team and uses a different mathematical technique compared to the FDK encoder. |
| Nero AAC | Free for non-commercial use. Only available on Windows and Linux. Unsupported since 2010. The neroAacEnc command-line tool converts .wav files into .mp4 files containing AAC audio. |
| FAAC | Partly open-sourced with proprietary components. The CBR support is reckoned to be inadequate. Based on the MPEG reference code published as part of the ISO standard. |
| Microsoft MFT AAC | The supported channel-count varies depending on which version of the Windows OS this is hosted on. |
| Libav | Stereo only. Not as up to date as the ffmpeg fork of this project. Can be used as a foundation to build command-line tools. |
| VisualOn AAC | Poor CBR performance and no support for VBR in this codec implementation. This project is declared to be open-source but that is not confirmed from a patents and legal perspective. |
Note that ffmpeg is both a command-line tool (ffmpeg) and an open-source project (FFmpeg).
The Apple AAC encoder is considered to be the best implementation for medium bitrate scenarios. The CBR, VBR and ABR support is exceedingly good. It was originally part of the QuickTime media framework but has now been moved into the AV Foundation and AudioToolbox. This is all part of what Apple calls CoreAudio. For many workflow automation situations, a macOS based processing node running this encoder will be an optimal solution.
Fraunhofer codecs are very versatile and technically the best. This is because Fraunhofer was one of the key developers of the psychoacoustic approach to audio coding. Deploying the Fraunhofer FDK AAC encoder on a Linux platform would be a very good solution but be careful to investigate and pay the licensing fees if necessary.
The Nero encoder is also highly recommended but is not being actively developed any further. It has not been revised since 2010. Whilst it is good for niche situations it is not recommended for new project deployments.
The rest of the AAC implementations are somewhat lacking in their support for all the different modes of operation. There are a few implementations based on the libraries developed by Coding Technologies. They collaborated with Fraunhofer on the AAC research. These may incur license fees when deployed.
Deploying The Apple AAC Encoder In A Workflow
Deploy a processing node in your workflow based on a single Macintosh computer to call the Apple AAC encoder to action from the command-line.
Run a frequently scheduled task to check a watch folder for input files. This is easy to implement. The macOS environment supports all the tools you need to pass the output to the next stage of the workflow.
In more recent versions of macOS the afconvert command has a very simple syntax:
afconvert <options> <input-file> <output-file>
Open the Terminal app and type this to list the file and data formats supported on your macOS platform:
afconvert -hf
This prints the usage summary with explanations of the options:
afconvert -h
The ffmpeg command-line tool can invoke the Apple AAC encoder when it is built with AudioToolbox support and run on a macOS platform. The encoder is selected by name (aac_at) rather than used by default. This may be useful for adding ID3 metadata tags after encoding.
The Apple encoder library is present on a Windows platform if it has been installed as part of a legacy QuickTime or iTunes installation. The QAAC open-source command-line tool calls it to action. Use this instead of afconvert which is not supported on Windows.
Alternatively, deploy the Apple Compressor application on a macOS system configured as a server node in your workflow infrastructure.
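
The watch-folder task described above might be sketched as follows. This is a minimal illustration in Python, not a production service. The folder layout, the function names and the 128000 bit/s default are hypothetical. The afconvert options shown (-f m4af for the file format, -d aac for the data format, -b for the bitrate in bits per second) are assumptions that should be checked against the help output on your own macOS version.

```python
import subprocess
from pathlib import Path


def build_afconvert_command(source, output_dir, bitrate_bps=128000):
    """Build the argument list for one encode: options, input file, output file."""
    target = Path(output_dir) / (Path(source).stem + ".m4a")
    return ["afconvert", "-f", "m4af", "-d", "aac", "-b", str(bitrate_bps),
            str(source), str(target)]


def pending_jobs(watch_dir, output_dir):
    """List a command for every .wav file that has no encoded output yet."""
    jobs = []
    for source in sorted(Path(watch_dir).glob("*.wav")):
        target = Path(output_dir) / (source.stem + ".m4a")
        if not target.exists():
            jobs.append(build_afconvert_command(source, output_dir))
    return jobs


def run_jobs(jobs):
    """Run each command in turn; a non-zero exit status raises an exception."""
    for command in jobs:
        subprocess.run(command, check=True)
```

Calling run_jobs(pending_jobs(watch_dir, output_dir)) from a scheduled task would encode each new .wav file once, because files that already have an .m4a output are skipped. Keeping the command building separate from the execution makes the logic testable on a machine without afconvert.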
File Name Extensions
These file extensions are relevant when coding MPEG AAC Audio:
| Extension | Description |
|---|---|
| .aac | Contains an ADTS stream of raw AAC coded content. |
| .mp4 | A general-purpose digital media container to carry videos, images, timed text and subtitles. Based on MPEG-4 part 12 and derived from the Apple QuickTime .mov file format. |
| .m4a | Describes an MPEG-4 Audio only file. Originally created by Apple for use with iTunes. |
| .m4b | Designed for use with Audio Book content. |
| .m4p | This is an .m4a AAC file that has been copy-protected with a proprietary Digital Rights Management (DRM) technology created by Apple for iTunes. |
| .m4r | An Apple iPhone ringtone container. |
| .m4v | An MPEG-4 video file which may also contain AAC audio. |
| .mpg | One of several file types used for MPEG-1 or MPEG-2 audio and video content. This describes an MPEG-1 or 2 program stream or an MPEG-2 transport stream. Audio coded with AAC can be stored in .mpg files but this is uncommon and not recommended. |
| .mov | QuickTime media platform container file. Typically contains a movie but could be an interactive multimedia presentation. |
| .3gp | Based on MPEG-4 Part 12. Originally designed for early mobile (feature) phones. This is the preferred file extension. |
| .3g2 | A second-generation file format for low bitrate content. |
| .3ga | A variation of .3gp for audio only. |
| .3gpa | A variation of .3gp for audio only. |
| .3gpp | Mixed media format for mobile phone use. |
| .3gpp2 | Mixed media format for mobile phone use. |
| .3gp2 | Mixed media format for mobile phone use. |
Media Type Identifiers
Media types are registered for many different kinds of content. AAC coded audio should be delivered with the audio/aac media type so the receiving player can correctly determine the payload format.
| Media type | Status | Description |
|---|---|---|
| audio/aac | Preferred | The preferred default media type. Defined in ISO 13818-7 and ISO 14496-3. |
| audio/aacp | Next Generation | Describes AAC Plus (HE-AAC). |
| audio/3gpp | Legacy | Used with feature phones and defined in RFC 3839. |
| audio/3gpp2 | Legacy | Used with feature phones and defined in RFC 4393. |
| audio/mp4 | Current | Described in RFC 4337 and updated in RFC 6381 to add ISO file containers. |
| audio/mp4a-latm | Current | RTP payload format suitable for teleconferencing. Described in RFC 3016 and updated in RFC 6416. |
| audio/mpeg4-generic | Current | RFC 3640 describes the RTP Payload Format for Transport of MPEG-4 Elementary Streams. Updated by RFC 6295. |
| audio/x-aac | Proprietary | Deprecated for use in new projects. Use audio/aac instead. Not registered with IANA. |
Avoid the ‘x-’ prefixed experimental media type identifiers when a standardized version is available without the prefix.
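
A delivery workflow could apply the advice above with a small lookup. The sketch below is illustrative only. The function and table names are hypothetical. The extension mapping is an assumption drawn from the tables in this article for audio-only files, so a file containing video would need a different type.

```python
import os

# Assumed mapping for audio-only content, based on the tables above.
EXTENSION_MEDIA_TYPES = {
    ".aac": "audio/aac",
    ".m4a": "audio/mp4",
    ".mp4": "audio/mp4",
    ".3gp": "audio/3gpp",
    ".3g2": "audio/3gpp2",
}

# Unregistered identifiers and the standardized types that replace them.
LEGACY_ALIASES = {"audio/x-aac": "audio/aac"}


def media_type_for(filename):
    """Return the media type for a file name, or None if it is not listed."""
    extension = os.path.splitext(filename)[1].lower()
    return EXTENSION_MEDIA_TYPES.get(extension)


def normalize_media_type(value):
    """Lower-case a media type and swap a legacy alias for its standard form."""
    cleaned = value.strip().lower()
    return LEGACY_ALIASES.get(cleaned, cleaned)


print(media_type_for("interview.m4a"))      # audio/mp4
print(normalize_media_type("audio/x-aac"))  # audio/aac
```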
Relevant Standards
Consult these standards documents for background information:
| Document | Vintage | Description |
|---|---|---|
| ISO 11172-3 | 1996 | MPEG-1 Part 3 - Audio is the foundation on which the earliest MPEG audio coding is built. The latest version is dated 1993 with a corrigendum published in 1996. |
| ISO 13818-1 | 2023 | MPEG-2 Systems. Describes packaging and stream structures. An amendment to codec parameters is in progress. |
| ISO 13818-3 | 1998 | MPEG-2 Audio. This is definitive for Layers I, II and III (MP1, MP2 and MP3). |
| ISO 13818-7 | 2010 | Describes MPEG-2 Advanced Audio Coding (AAC). Published in 2006 with revisions added in 2010. |
| ISO 14496-1 | 2014 | MPEG-4 Systems and original container format. Published in 2010 with corrections added in 2014. A new version is under development. |
| ISO 14496-3 | 2019 | MPEG-4 Audio. Describes how to combine AAC with other codecs. |
| ISO 14496-6 | 2000 | MPEG-4 Delivery Multimedia Interface Format (DMIF). |
| ISO 14496-12 | 2022 | MPEG-4 file format. A new version is under development. |
| ISO 14496-14 | 2020 | MPEG-4 version 2 file format. |
| ISO 15938-1 | 2006 | MPEG-7 Systems. Originally published in 2002 and updated in 2006. |
| ISO 15938-2 | 2002 | MPEG-7 Descriptions Definition Language (DDL). |
| ISO 15938-4 | 2006 | MPEG-7 Metadata for audio. Originally published in 2002 and updated in 2006. |
| ISO 15938-8 | 2011 | Extraction and use of MPEG-7 metadata descriptions. Originally published in 2002 and updated in 2011. |
| ISO 15938-9 | 2012 | MPEG-7 Profiles & Levels. Originally published in 2005 and updated in 2012. |
| ISO 15938-10 | 2007 | MPEG-7 Schema definition. Originally published in 2005 and updated in 2007. |
| ISO 15938-11 | 2012 | MPEG-7 Profile schemas. Originally published in 2005 and updated in 2012. |
| ISO 15938-12 | 2012 | MPEG-7 Query format. |
| ISO 21000 | Various | MPEG-21 describes mechanisms for access control for multimedia content. This set of standards is under review and a new version is expected to be published. |
| ISO 23001-8 | Withdrawn | Coding-independent code points. Withdrawn and superseded by ISO 23091. |
| ISO 23003-1 | 2017 | MPEG Surround for multi-channel audio. Originally published in 2007 and updated in 2017. |
| ISO 23003-2 | 2008 | Spatial Audio Object Coding (SAOC). |
| ISO 23003-4 | 2020 | Dynamic Range Control. A new version is being prepared. |
| ISO 23091-3 | 2022 | Coding Independent Code Points for Audio. Playback controlling metadata. Originally published in 2018 and updated in 2022. |
| ITU-T Rec. H.222.0 | 2022 | See ISO 13818-1. |
| ETSI TS 126 244 | 2008 | Defines the .3gp container file format. Freely available to download from the ETSI.org web site. |
Applying AAC
AAC coding will be subject to patent licensing fees until 2031. For the time being you will need to contact a patent pool for a license if you build and distribute an encoder or player implemented in hardware or software. The coded bitstreams transmitted to end users are free of any licensing obligations.
This is not the whole AAC story. High Efficiency AAC was developed to improve the performance still further. That is sometimes called AAC+, and we look at this in more detail in the next chapter.
There is also more to study and understand in the MPEG-4 (ISO 14496) standard. MPEG is reorganizing the collection of standards. MPEG-D (ISO 23003) has some relevant material, and ISO 23091-3 is helpful for player design.