Standards: Part 5 - Standards For Audio Coding
This article describes the various AES, MPEG, Proprietary and Open Standards that pertain to audio.
This article is part of our growing series on Standards.
There is an overview of all 26 articles in Part 1 - An Introduction To Standards.
Audio production follows a similar workflow concept to video, but the tools and container files are slightly different, and audio needs far less computing power and storage capacity. Within broadcast workflows, audio content can be managed as additional tracks within the video container or separately in a specialized audio container. In a radio or podcast production workflow, there is no accompanying video.
When you ingest and file new recordings in a digital library system, choose a file format that stores the audio samples uncompressed to avoid artefacts. These formats typically support some embedded metadata tags describing the content, and any additional metadata goes into the content management database.
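As an illustration of an uncompressed ingest format, the sketch below writes mono 16-bit linear PCM to a WAV file using Python's standard-library `wave` module, then reads the header back. The function name `write_pcm_wav` and the test tone are hypothetical. Note that the `wave` module writes only the basic format and data chunks; embedded metadata tags would need a different library or tool.

```python
import array
import io
import math
import wave


def write_pcm_wav(f, samples, sample_rate=48000):
    """Write mono, 16-bit, uncompressed PCM to a WAV file path or file object.

    `samples` are floats in the range -1.0..1.0 (hypothetical helper).
    """
    # Scale to 16-bit integers and clamp so overloads clip instead of wrapping
    pcm = array.array("h", (max(-32768, min(32767, round(s * 32767))) for s in samples))
    with wave.open(f, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)             # 2 bytes per sample = 16-bit
        w.setframerate(sample_rate)
        w.writeframes(pcm.tobytes())  # native byte order; wave stores little-endian


# Usage: 0.1 s of a 1 kHz test tone at 48 kHz, written to an in-memory buffer
rate = 48000
tone = [0.5 * math.sin(2 * math.pi * 1000 * n / rate) for n in range(rate // 10)]
buf = io.BytesIO()
write_pcm_wav(buf, tone, rate)

with wave.open(io.BytesIO(buf.getvalue()), "rb") as r:
    # Channels, bytes per sample, sample-rate, frame count, compression type
    print(r.getnchannels(), r.getsampwidth(), r.getframerate(), r.getnframes(), r.getcomptype())
```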
In addition to the summaries below you will find a far more comprehensive listing of the AES Standards & Recommended Practices, AES Information Documents and AES Project Reports in Appendix H.
Useful Standards For Audio Recording & Production
There are several sources of international standards for recording audio which benefit from the knowledge and experience of many industry experts:
- AES - Audio Engineering Society
- EBU - European Broadcasting Union
- MPEG - Moving Picture Experts Group
- SMPTE - Society of Motion Picture and Television Engineers
The MPEG standards are published jointly by ISO and IEC and can be purchased through the ISO online store. AES standards are available directly from the society, and members can obtain them at a discounted price.
Proprietary formats are built into the production tools that use them. These tools may store their project assets in a compact form, but the assets need exporting to a more portable format for use downstream in the workflow.
License-free open-source standards and tools are a very attractive solution.
Relevant AES Standards
The Audio Engineering Society (AES) was established in 1948 and has been publishing standards since 1977.
AES strives to avoid patented technologies or, where they cannot be avoided, requires the patent holder to allow their use for a minimal fee or none at all. The society also collaborates with other standards bodies such as SMPTE, ISO, IEC, BSI and EBU.
The SMPTE ST 2110 suite applies AES standards to an IP-based studio workflow; for example, ST 2110-30 carries uncompressed audio using AES67. AES covers the data formats and transmission, while ST 2110 describes how to apply them in a practical installation.
These AES standards are particularly relevant to an IP based workflow but you may find some of the others are useful too:
Number | Description |
---|---|
AES 3 | Used for digital audio interconnection and also known as AES/EBU. |
AES 10 | Describes multi-channel digital audio interconnection and generically referred to as MADI. |
AES 11 | Describes digital audio synchronisation. |
AES 31 | A file format for exchanging audio data between systems and applications. |
AES 50 | Multi-channel audio over Ethernet. |
AES 52 | Describes how to insert unique identifiers into AES 3 digital audio content. |
AES 67 | Interoperability of Audio over IP networks. |
AES 70 | Open Control Architecture. |
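To make the AES 67 row more concrete, the arithmetic below estimates the payload of a linear PCM stream of the kind AES67 (and SMPTE ST 2110-30, which builds on it) carries in RTP packets. It assumes 48 kHz sampling, 24-bit (L24) samples and the common 1 ms packet time; the shorter 125 µs case is shown for comparison. The function name is hypothetical, and RTP/UDP/IP/Ethernet header overhead is deliberately ignored, so real link usage is somewhat higher.

```python
def aes67_stream(sample_rate_hz: int, packet_time_ms: float, channels: int,
                 bytes_per_sample: int = 3) -> dict:
    """Payload sizes for a linear PCM AES67-style RTP stream (hypothetical helper).

    bytes_per_sample=3 corresponds to 24-bit (L24) samples.
    Packet header overhead is not included.
    """
    # Samples per channel that fit into one packet time
    samples_per_packet = round(sample_rate_hz * packet_time_ms / 1000)
    # Every packet carries that many samples for every channel
    payload_bytes = samples_per_packet * channels * bytes_per_sample
    packets_per_second = 1000 / packet_time_ms
    return {
        "samples_per_packet": samples_per_packet,
        "payload_bytes": payload_bytes,
        "packets_per_second": packets_per_second,
        "payload_mbit_s": payload_bytes * 8 * packets_per_second / 1e6,
    }


print(aes67_stream(48000, 1.0, 8))    # 1 ms packets, 8 channels of L24
print(aes67_stream(48000, 0.125, 8))  # 125 µs packets: lower latency, more packets
```

The payload data rate is the same in both cases; shorter packets trade a higher packet rate (and more header overhead) for lower latency.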
Some earlier AES networking standards are based on Asynchronous Transfer Mode (ATM) networks, which can carry voice and data simultaneously. Ethernet carries only data, but Voice over IP (VoIP) allows it to support telephony applications as well.
AES 47 & 51 describe how to transmit audio over ATM networks.
Relevant MPEG Standards
This is a short list of the individual parts of the MPEG standards that relate directly to audio processing. Some of these include contributions from AES and EBU experts. Some standards define audio modelling strategies where the audio is described algorithmically rather than as direct samples of recorded sound. The MPEG standards focus on coding techniques and storage container formats.
The MPEG standards may require patent license fees to be paid.
Standard | ISO Part No. | Description |
---|---|---|
MPEG-1 | ISO 11172-3 | Audio - Layers 1, 2 & 3 (mp1, mp2, mp3). |
MPEG-2 | ISO 13818-3 | Audio - Adds lower sample-rates (for lower bit rates) and multi-channel support to MPEG-1. |
MPEG-2 | ISO 13818-7 | AAC - Advanced Audio Coding. |
MPEG-4 | ISO 14496-14 | MP4 File Format. |
MPEG-4 | ISO 14496-15 | AVC File Format. |
MPEG-4 | ISO 14496-23 | Symbolic Music Representation (SMR). |
MPEG-4 | ISO 14496-24 | Audio and systems interaction. |
MPEG-4 | ISO 14496-26 | Audio Conformance. |
MPEG-4 | ISO 14496-3 | Audio (Many subparts describing complex audio coding strategies). |
MPEG-4 | ISO 14496-8 | Carriage over IP. |
MPEG-7 | ISO 15938-4 | Specification for audio descriptors in a multimedia content description interface. |
MPEG-A | ISO 23000-12 | Interactive music application format. |
MPEG-A | ISO 23000-2 | MPEG music player application format. |
MPEG-A | ISO 23000-4 | Musical slide show application format. |
MPEG-D | ISO 23003-1 | MPEG Surround. |
MPEG-D | ISO 23003-2 | Spatial Audio Object Coding (SAOC). |
MPEG-D | ISO 23003-3 | Unified speech and audio coding. |
MPEG-D | ISO 23003-4 | Dynamic range control. |
MPEG-D | ISO 23003-5 | Uncompressed audio in MPEG-4 file format. |
MPEG-D | ISO 23003-6 | Unified speech and audio coding reference software. |
MPEG-D | ISO 23003-7 | Unified speech and audio coding conformance testing. |
MPEG-H | ISO 23008-3 | 3D Audio. |
MPEG-H | ISO 23008-6 | 3D Audio Reference Software. |
MPEG-H | ISO 23008-9 | 3D Audio Conformance Testing. |
MPEG-CICP | ISO 23091-3 | Coding Independent Code Point descriptions for audio content. |
Proprietary Standards
These are some proprietary container and codec formats, listed here by file-type extension. The license fees depend on how the formats are used and deployed and on the target platforms, and are usually included in the purchase price of the tools or hardware used to create them. Some formats are platform-specific, which makes them less portable. Some are designed to carry combined video and audio but can also be used in audio-only scenarios.
Extension | Format |
---|---|
ac3 | Dolby AC3 surround sound file. |
aif | See AIFF. |
aifc | Compressed AIFF file. |
aiff | Audio Interchange File Format, commonly used for audio extracted from a CD. Designed by Apple and based on IFF. |
alac | Apple Lossless Audio Codec. |
asf | Advanced Systems Format, the Microsoft container used by wmv and wma files. |
avi | Audio Video Interleave. |
caf | Apple Core Audio Format container, which can carry Apple Lossless Audio (ALAC) and other codecs (uncommon). |
dts | Digital Theater Systems (DTS) sound file. |
evo | Enhanced VOB. |
f4v | Flash Video file with H.264 video & AAC audio. |
flv | Flash Video file for the Adobe Flash Player. Deprecated and should not be used for new projects. |
iff | Electronic Arts Interchange File Format. |
mov | QuickTime File Format. |
qt | Early QuickTime File Format (rarely used now). |
rmvb | RealMedia Variable Bitrate file. |
vob | DVD Video Object. |
Open Standards
Open-source codecs and storage container files offer many advantages. They are supported by a community of enthusiastic developers, perform well and have been ported to virtually every platform. Because the source code is available, you can customize them or port them to new platforms relatively easily. Open-source projects actively seek to avoid patents and license fees, so they are also commercially attractive.
If you benefit from their technology, an occasional donation helps the project continue to thrive. Open does not necessarily mean free, but the choice to pay is yours.
Extension | Format |
---|---|
ape | Monkey Audio file. |
flac | Free Lossless Audio Codec (FLAC) coded audio. |
mka | Matroška audio. |
mpc | Musepack audio file. |
mxf | Material Exchange Format. |
ofr | OptimFROG lossless coded audio. |
oga | Ogg audio file. |
ogg | Ogg audio/video file. |
ogm | Ogg media file. |
opus | An Ogg container carrying Opus coded audio. |
wav | WAV audio file. These are often used in radio broadcasting. |
wave | See wav. |
webm | WebM based on the Matroška format. |
Tools & Software Apps
There is a diverse and sophisticated array of audio production tools available, and many of the most popular are platform-specific. Digital Audio Workstations (DAWs) and other tools aimed primarily at music production can be used very effectively for broadcast audio editing and post-production. Most of the main video post-production platforms offer increasingly sophisticated, tightly integrated audio editing and production tools. Most professional software supports the commonly used standards, but if you need to use or deliver specific file formats, check that the tools you select are compatible before embarking on a project. Many of these tools run on macOS and Windows but not on Linux.
Open-source audio tools are a good alternative in these scenarios:
- Editing on Linux workstations
- Portability across all the major platforms
- Conversion between formats
- Detailed analysis of the content
- Diagnosis of problems
The ffmpeg tool is often thought of as a video-conversion tool, but it also has powerful support for audio file conversion. Useful analysis tools are built in and accessible from the command-line interface.
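The helpers below sketch how typical ffmpeg and ffprobe invocations for audio conversion and analysis can be assembled as argument lists. The options used (`-i`, `-vn`, `-ar`, `-c:a`, the `volumedetect` audio filter, `-f null`, and ffprobe's `-select_streams a -show_streams`) are standard ffmpeg options, but the function names and file names are hypothetical. The script only prints the commands; to execute one, pass the list to `subprocess.run(cmd, check=True)` on a machine with ffmpeg installed.

```python
import shlex


def convert_cmd(src: str, dst: str, sample_rate: int = 48000, codec: str = "pcm_s24le") -> list[str]:
    """ffmpeg arguments to resample audio and re-encode it with the given codec."""
    return [
        "ffmpeg",
        "-i", src,                # input file
        "-vn",                    # drop any video stream
        "-ar", str(sample_rate),  # output sample-rate in Hz
        "-c:a", codec,            # audio codec, e.g. pcm_s24le (24-bit WAV) or flac
        dst,
    ]


def volume_report_cmd(src: str) -> list[str]:
    """ffmpeg arguments that decode the file and log mean/max volume statistics."""
    # volumedetect writes its report to the log; -f null discards the decoded audio
    return ["ffmpeg", "-i", src, "-af", "volumedetect", "-f", "null", "-"]


def probe_cmd(src: str) -> list[str]:
    """ffprobe arguments listing codec, sample-rate and channel details of audio streams."""
    return ["ffprobe", "-v", "error", "-select_streams", "a", "-show_streams", src]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only
    for cmd in (
        convert_cmd("interview.wav", "interview_48k.wav"),
        convert_cmd("interview.wav", "interview.flac", codec="flac"),
        volume_report_cmd("interview_48k.wav"),
        probe_cmd("interview.flac"),
    ):
        print(shlex.join(cmd))
```

Building argument lists rather than shell strings avoids quoting problems with file names that contain spaces.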
Choosing An Appropriate Sample-rate
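According to the sampling theorem associated with Harry Nyquist, the sample-rate must be more than twice the highest frequency you want to capture, so for audio it should be more than twice the highest audible frequency. The bit-depth is also important because it sets the size of the quantisation steps: each extra bit halves the step size and adds about 6 dB of dynamic range. On playback, the stepped "staircase" output of the converter contains image frequencies well above the audible range, and these are removed by a reconstruction filter.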
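The two quantities in the paragraph above can be computed directly. This sketch uses the textbook signal-to-quantisation-noise formula for an ideal N-bit converter driven by a full-scale sine wave (about 6.02 N + 1.76 dB); real converters fall short of this figure. The commonly quoted 20 kHz upper limit of hearing is an assumption used only for comparison.

```python
import math


def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency a given sample-rate can represent without aliasing."""
    return sample_rate_hz / 2


def quantisation_snr_db(bit_depth: int) -> float:
    """Theoretical SNR of an ideal N-bit quantiser for a full-scale sine:
    20*log10(2)*N + 10*log10(1.5), i.e. about 6.02*N + 1.76 dB."""
    return 20 * math.log10(2) * bit_depth + 10 * math.log10(1.5)


for rate in (44100, 48000):
    # Both rates comfortably exceed twice an assumed 20 kHz hearing limit
    print(f"{rate} Hz sampling captures up to {nyquist_hz(rate):.0f} Hz")
for bits in (16, 24):
    print(f"{bits}-bit: about {quantisation_snr_db(bits):.1f} dB dynamic range")
```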
If the sample-rate is too low, then ghost frequencies that were not in the original recording will appear when the samples are rendered as output audio. This is called aliasing and should be avoided by choosing a high enough sample-rate.
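The sketch below shows where such a ghost frequency lands: a tone above half the sample-rate folds back to its distance from the nearest multiple of the sample-rate. The example assumes a 30 kHz tone reaching a 44.1 kHz converter with no anti-aliasing filter, and checks that its samples are indistinguishable from those of a genuine 14.1 kHz tone. The function name is hypothetical.

```python
import math


def alias_frequency(signal_hz: float, sample_rate_hz: float) -> float:
    """Frequency at which a sampled tone appears after folding into 0 .. sample_rate/2."""
    nearest_multiple = round(signal_hz / sample_rate_hz)
    return abs(signal_hz - nearest_multiple * sample_rate_hz)


# A 30 kHz tone sampled at 44.1 kHz with no anti-aliasing filter
fs = 44100
ghost = alias_frequency(30000, fs)
print(ghost)

# Once sampled, the 30 kHz tone cannot be told apart from a real tone at the ghost frequency
a = [math.cos(2 * math.pi * 30000 * n / fs) for n in range(100)]
b = [math.cos(2 * math.pi * ghost * n / fs) for n in range(100)]
print(max(abs(x - y) for x, y in zip(a, b)))  # only floating-point rounding error remains
```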
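The deployment sample-rate should normally be 44.1 kHz for audio-only projects and 48 kHz if you want to embed the audio into a video project. Stick to one of these throughout your project to avoid unnecessary sample-rate conversions.
If you have sufficient storage and fast enough computers, work at two or four times your deployment sample-rate (for example, 96 kHz or 192 kHz for a 48 kHz deliverable). Filter, mix and process your audio with effects at the higher rate, then down-sample the finished recording. Because the conversion ratio is a whole number, down-sampling is straightforward, and with a good low-pass (anti-aliasing) filter it introduces negligible degradation.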
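A minimal sketch of whole-number down-sampling (decimation), assuming a simple 101-tap Hamming-windowed sinc low-pass filter: content above the new Nyquist frequency is removed first, then every second (or fourth) sample is kept. Without the filter, a 30 kHz component in a 96 kHz recording would fold down to 18 kHz at 48 kHz. The function names are hypothetical, and production tools use longer, more carefully designed filters or dedicated resamplers.

```python
import math


def lowpass_fir(num_taps: int, cutoff: float) -> list[float]:
    """Windowed-sinc low-pass FIR; `cutoff` is a fraction of the input sample-rate."""
    m = num_taps - 1
    taps = []
    for n in range(num_taps):
        x = n - m / 2
        # Ideal low-pass impulse response (a sinc); the centre tap uses its limit value
        h = 2 * cutoff if x == 0 else math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        # Hamming window limits ripple and improves stop-band rejection
        w = 0.54 - 0.46 * math.cos(2 * math.pi * n / m)
        taps.append(h * w)
    total = sum(taps)
    return [t / total for t in taps]  # normalise to unity gain at DC


def decimate(samples: list[float], factor: int, num_taps: int = 101) -> list[float]:
    """Low-pass filter, then keep every `factor`-th sample."""
    # Cut-off slightly below the new Nyquist frequency leaves a guard band
    taps = lowpass_fir(num_taps, 0.45 / factor)
    half = num_taps // 2
    out = []
    # Only the kept samples are computed; centring the filter avoids a time shift
    for i in range(0, len(samples), factor):
        acc = 0.0
        for k, t in enumerate(taps):
            j = i + half - k
            if 0 <= j < len(samples):
                acc += t * samples[j]
        out.append(acc)
    return out


# Usage: 20 ms of audio at 96 kHz reduced to 48 kHz
rate_in = 96000
n_samples = rate_in // 50
tone_1k = [math.sin(2 * math.pi * 1000 * n / rate_in) for n in range(n_samples)]
tone_30k = [math.sin(2 * math.pi * 30000 * n / rate_in) for n in range(n_samples)]
out_1k = decimate(tone_1k, 2)
out_30k = decimate(tone_30k, 2)
# Skip the edges, where the filter runs past the ends of the signal
print(max(abs(s) for s in out_1k[100:-100]))   # audible tone passes almost unchanged
print(max(abs(s) for s in out_30k[100:-100]))  # tone above 24 kHz is strongly attenuated
```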
Quality Control
Use uncompressed formats for editing and preparing content. Lossy compression introduces artefacts that may not be obvious at first, but multiple compress-decompress cycles compound them and distort the output.
The sound playback must be continuous and consistent. Your customers can tolerate dropouts in the visual content far more easily than losing the sound momentarily. A programme with perfect video and intermittent sound is far less enjoyable than lower quality video with robust soundtracks.
The compression algorithm in MP3 discards sounds that the human ear will theoretically not hear, such as soft violins during a loud cymbal crash. This relies on auditory masking: the loud sound makes the quieter one inaudible, so in theory we don’t notice that it is missing. At the highest bit rates the result can be quite effective, but reducing the bit rate degrades the quality of the audio. Do not use MP3 as a production or archival format. Retain the original raw-uncompressed source material in the archives.
Deployment
At the very final deployment stage, you will want to compress the audio for streaming or downloading.
For an audio-only deployment, MP3 is very widely supported, but AAC delivers higher quality at the same bitrate. HE-AAC is more efficient still at low bitrates if your target devices support it.
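To put deployment bitrates in perspective, the arithmetic below compares uncompressed PCM with some example coded bitrates. The 320, 128 and 64 kbps values are illustrative assumptions rather than recommendations; the higher the compression ratio, the more the encoder must discard, which is why quality falls as the bitrate is reduced.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bit_depth: int, channels: int) -> float:
    """Data rate of uncompressed linear PCM in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000


def compression_ratio(pcm_kbps: float, coded_kbps: float) -> float:
    """How many times smaller the coded stream is than the PCM original."""
    return pcm_kbps / coded_kbps


cd_quality = pcm_bitrate_kbps(44100, 16, 2)  # 44.1 kHz, 16-bit stereo
production = pcm_bitrate_kbps(48000, 24, 2)  # 48 kHz, 24-bit stereo
print(cd_quality, production)
for coded in (320, 128, 64):  # example deployment bitrates
    print(f"{coded} kbps: {compression_ratio(cd_quality, coded):.1f}:1 smaller than CD-quality PCM")
```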
Conclusion
Treat the codec and the container format for your audio content as separate choices. Some combinations are mandated or prohibited, so choose carefully.
For optimum portability, FLAC or Matroška containers are useful in a production workflow. On the other hand, if your workstations are all Apple based, the ALAC or AIFF formats are appropriate.
These Appendix articles contain additional information you may find useful: