Standards: Part 5 - Standards For Audio Coding

This article describes the various AES, MPEG, Proprietary and Open Standards that pertain to audio.

This article is part of our growing series on Standards. There is an overview of all 26 articles in Part 1 - An Introduction To Standards.

Audio production follows a similar workflow concept to video but the tools and container files are slightly different. The necessary computing and storage capacity is also reduced. Within broadcast workflows the management of audio content can be approached as additional tracks within the video container or separately in a specialized audio container. In a radio or podcast production workflow, there is no accompanying video.

File formats that store audio efficiently are useful when you ingest and file new recordings in a digital library system. The audio samples should be uncompressed to avoid artefacts. The files will support some metadata tagging of the content. Additional metadata goes into the content management database.
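As a sketch of this ingest step, Python's standard wave module can write and read uncompressed PCM WAV files and report their basic parameters. The filename, tone frequency and durations here are purely illustrative:

```python
import math
import struct
import wave

SAMPLE_RATE = 48000   # Hz, common for video-aligned workflows
BIT_DEPTH = 16        # bits per sample, uncompressed PCM
DURATION = 0.5        # seconds
FREQ = 440.0          # Hz, an illustrative test tone

# Write an uncompressed mono PCM WAV file.
with wave.open("tone.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(BIT_DEPTH // 8)
    w.setframerate(SAMPLE_RATE)
    n = int(SAMPLE_RATE * DURATION)
    frames = b"".join(
        struct.pack("<h", int(32767 * math.sin(2 * math.pi * FREQ * i / SAMPLE_RATE)))
        for i in range(n)
    )
    w.writeframes(frames)

# Read the parameters back, as an ingest step might do before filing.
with wave.open("tone.wav", "rb") as r:
    params = r.getparams()
    print(params.nchannels, params.sampwidth, params.framerate, params.nframes)
```

The parameters recovered here (channel count, sample width, frame rate) are exactly the kind of technical metadata that would be copied into the content management database alongside descriptive tags.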

In addition to the summaries below you will find a far more comprehensive listing of the AES Standards & Recommended Practices, AES Information Documents and AES Project Reports in Appendix H.

Useful Standards For Audio Recording & Production

There are several sources of international standards for recording audio which benefit from the knowledge and experience of many industry experts:

  • AES - Audio Engineering Society
  • EBU - European Broadcasting Union
  • MPEG - Moving Picture Experts Group
  • SMPTE - Society of Motion Picture and Television Engineers

The MPEG standards are managed by ISO and are obtained through the ISO online storefront. AES standards are available directly from the society, where members enjoy a discounted price.

Proprietary standards are embedded in the production tools. These will store their project assets in a more compact form but need exporting for more portable use downstream in the workflow.

License-free open-source standards and tools are a very attractive solution.

Relevant AES Standards

The Audio Engineering Society (AES) was established in 1948 and has been publishing standards since 1977.

AES strives to avoid the use of patented technologies or requires the patent holder to allow their use on a minimal or zero fee basis. The society also collaborates with other standards bodies such as the SMPTE, ISO, IEC, BSI and EBU.

The SMPTE ST 2110 specification deploys AES standards in the context of an IP-driven studio workflow. Data formats and transmission are covered by AES while ST 2110 describes how to apply them in a practical situation.

These AES standards are particularly relevant to an IP based workflow but you may find some of the others are useful too:

Number Description
AES 3 Two-channel digital audio interconnection, also known as AES/EBU.
AES 10 Multi-channel digital audio interconnection, generically referred to as MADI.
AES 11 Digital audio synchronisation.
AES 31 A file format for exchanging audio data between systems and applications.
AES 50 Multi-channel audio over Ethernet.
AES 52 Insertion of unique identifiers into AES 3 digital audio content.
AES 67 Interoperability of Audio over IP networks.
AES 70 Open Control Architecture.


Earlier AES standards are based on Asynchronous Transfer Mode (ATM) networks. An ATM network can carry voice and data simultaneously. Ethernet carries only data, although Voice over IP (VoIP) layers telephony applications on top of it.

AES 47 and AES 51 describe how to transmit audio over ATM networks.

Relevant MPEG Standards

This is a short list of the individual parts of the MPEG standards that are directly related to Audio processing. Some of these will have had contributions from AES and EBU experts. Some standards define audio modelling strategies where the audio is described algorithmically rather than as direct samples of recorded sound. The MPEG standards focus on coding techniques and storage container formats.

The MPEG standards may require patent license fees to be paid.

Standard ISO Part No. Description
MPEG-1 ISO 11172-3 Audio - Layers 1, 2 & 3 (mp1, mp2, mp3).
MPEG-2 ISO 13818-3 Audio - Adds lower bit rates and Multi-channel support to MPEG-1.
MPEG-2 ISO 13818-7 AAC - Advanced Audio Coding.
MPEG-4 ISO 14496-3 Audio (Many subparts describing complex audio coding strategies).
MPEG-4 ISO 14496-8 Carriage over IP.
MPEG-4 ISO 14496-14 MP4 File Format.
MPEG-4 ISO 14496-15 AVC File Format.
MPEG-4 ISO 14496-23 Symbolic Music Representation (SMR).
MPEG-4 ISO 14496-24 Audio and systems interaction.
MPEG-4 ISO 14496-26 Audio Conformance.
MPEG-7 ISO 15938-4 Specification for audio descriptors in a multimedia content description interface.
MPEG-A ISO 23000-2 MPEG music player application format.
MPEG-A ISO 23000-4 Musical slide show application format.
MPEG-A ISO 23000-12 Interactive music application format.
MPEG-D ISO 23003-1 MPEG Surround.
MPEG-D ISO 23003-2 Spatial Audio Object Coding (SAOC).
MPEG-D ISO 23003-3 Unified speech and audio coding.
MPEG-D ISO 23003-4 Dynamic range control.
MPEG-D ISO 23003-5 Uncompressed audio in MPEG-4 file format.
MPEG-D ISO 23003-6 Unified speech and audio coding reference software.
MPEG-D ISO 23003-7 Unified speech and audio coding conformance testing.
MPEG-H ISO 23008-3 3D Audio.
MPEG-H ISO 23008-6 3D Audio Reference Software.
MPEG-H ISO 23008-9 3D Audio Conformance Testing.
MPEG-CICP ISO 23091-3 Coding-independent code points for audio content.


Proprietary Standards

These are some proprietary container formats described here as file-type extensions. The license-fees depend on how they are used and deployed and what the target platforms are. The license-fees are usually included in the purchase of the tools or hardware used to create them. Some of these are platform specific which makes them less portable. They might be designed to carry combined video and audio but can also be used in audio only scenarios.

Extension Format
ac3 Dolby AC3 surround sound file.
aif See AIFF.
aifc Compressed AIFF file.
aiff Audio Interchange File Format. Designed by Apple and based on IFF.
alac Apple Lossless Audio Codec.
asf Advanced Systems Format (alternative to wmv).
avi Audio Video Interleave.
caf Apple Core Audio Format container, often holding Apple Lossless (ALAC) audio.
dts Digital Theatre Systems sound file.
evo Enhanced VOB.
f4v Flash Video file with H.264 video & AAC audio.
flv Flash Video file. Deprecated and should not be used for new projects.
iff Electronic Arts Interchange File Format.
mov QuickTime File Format.
qt Early QuickTime File Format (rarely used now).
rmvb RealMedia Variable Bitrate file.
vob DVD Video Object.


Open Standards

Open-source codecs and storage container files offer many advantages. They are supported by a community of enthusiastic developers and perform well. They are ported to virtually every platform. Because the supporting source-code is available, you can customize them or port them to new platforms very easily. Open-source projects actively seek to avoid patents and license-fees so they are also attractive commercially.

If you benefit from their technology, an occasional donation helps ensure the project continues to thrive. Open does not always mean zero cost, but whether to pay is your choice.

Extension Format
ape Monkey Audio file.
flac Free Lossless Audio Codec (FLAC) coded audio.
mka Matroška audio.
mpc Musepack audio file.
mxf Material Exchange Format.
ofr OptimFROG lossless coded audio.
oga Ogg audio file.
ogg Ogg audio/video file.
ogm Ogg media file.
opus An Ogg format container containing Opus coded audio.
wav WAV audio file. These are often used in radio broadcasting.
wave See wav.
webm WebM based on the Matroška format.


Tools & Software Apps

There is a diverse and sophisticated array of audio production tools available, and many of the most popular tools are platform specific. Digital Audio Workstations and other tools aimed primarily at music production can be used very effectively for broadcast audio editing and post-production. Most of the main video post-production platforms offer increasingly sophisticated, tightly integrated audio editing and production tools. Most professional software supports the commonly used standards, but if you need to use or deliver specific file formats it is wise to ensure the tools you select are compatible before embarking on a project. Many platforms are supported on macOS and Windows but not on Linux.

Deploying open-source audio tools instead is appropriate in these scenarios:

  • Editing on Linux workstations
  • Portability across all the major platforms
  • Conversion between formats
  • Detailed analysis of the content
  • Diagnosis of problems

The ffmpeg tool is often thought of as a video-conversion tool, but it also has powerful support for audio file conversion. Useful analysis tools are built in and accessible from the command-line interface.
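A format conversion such as WAV to lossless FLAC can be scripted around ffmpeg. The sketch below only builds the command and runs it when ffmpeg is actually installed and the input exists; the filenames are illustrative:

```python
import os
import shutil
import subprocess

def flac_convert_cmd(src: str, dst: str) -> list:
    """Build an ffmpeg command converting an audio input to lossless FLAC.
    -y overwrites the destination file if it already exists."""
    return ["ffmpeg", "-y", "-i", src, "-c:a", "flac", dst]

cmd = flac_convert_cmd("capture.wav", "capture.flac")
print(" ".join(cmd))

# Only invoke ffmpeg when it is installed and the source file exists.
if shutil.which("ffmpeg") and os.path.exists("capture.wav"):
    subprocess.run(cmd, check=True)
```

The same pattern works for any codec ffmpeg supports; swapping the `-c:a` argument selects a different audio encoder.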

Choosing An Appropriate Sample-rate

Harry Nyquist suggested the sample-rate should be at least twice the highest audible frequency to capture all the content. The bit-depth is also important: quantisation at too few bits creates a staircase effect in the waveform, which introduces harmonics well above the audible range; these are removed by a low-pass reconstruction filter on playback.
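The effect of bit-depth can be quantified: the theoretical signal-to-noise ratio of an ideal N-bit quantiser for a full-scale sine wave is approximately 6.02N + 1.76 dB, which is why 16-bit audio is quoted at roughly 98 dB. A quick check of the formula:

```python
import math

def quantisation_snr_db(bits: int) -> float:
    """Theoretical SNR of an ideal quantiser for a full-scale sine wave:
    20*log10(2^N) + 10*log10(1.5), i.e. about 6.02*N + 1.76 dB."""
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)

print(round(quantisation_snr_db(16), 2))  # 16-bit audio
print(round(quantisation_snr_db(24), 2))  # 24-bit audio
```

Each extra bit buys about 6 dB of headroom over the quantisation noise floor, which is one reason production work often uses 24-bit samples even when the deliverable is 16-bit.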

If the sample-rate is too low, then ghost frequencies that were not in the original recording will appear when the samples are rendered as output audio. This is called aliasing and should be avoided by choosing a high enough sample-rate.
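The ghost frequency is predictable: a tone at frequency f sampled at rate fs, with f between fs/2 and fs, produces exactly the same samples as an inverted tone at fs - f. So a 30 kHz tone sampled at 44.1 kHz is indistinguishable from a 14.1 kHz alias (the frequencies here are just illustrative):

```python
import math

FS = 44100             # sample-rate in Hz
F_HIGH = 30000         # tone above the Nyquist limit (FS / 2)
F_ALIAS = FS - F_HIGH  # the ghost frequency that appears instead

# The high tone's samples match the inverted alias tone, sample for sample.
for n in range(100):
    s_high = math.sin(2 * math.pi * F_HIGH * n / FS)
    s_alias = -math.sin(2 * math.pi * F_ALIAS * n / FS)
    assert math.isclose(s_high, s_alias, abs_tol=1e-9)

print(F_ALIAS)
```

This is why analogue-to-digital converters place an anti-alias low-pass filter before the sampler: once the samples are taken, the original and alias tones cannot be told apart.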

The deployment sample-rate should be 44.1 kHz for audio only projects and 48 kHz if you want to embed the audio into a video project. Stick to one of these throughout your project to avoid unnecessary sample-rate conversions.

If you have sufficient storage and fast enough computers, work at twice or four times your deployment sample-rate. Filter, mix and process your audio with effects, then down-sample the finished recording without degradation.
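Working at 96 kHz and delivering at 48 kHz means a 2:1 down-sample at the end. Conceptually this keeps every second sample, although a real resampler first applies a low-pass filter so nothing above the new Nyquist limit can alias. A minimal sketch with the filter step deliberately omitted:

```python
import math

WORK_RATE = 96000      # production sample-rate
DELIVERY_RATE = 48000  # deployment sample-rate
FACTOR = WORK_RATE // DELIVERY_RATE  # integer ratio between the two

# One second of a 1 kHz tone at the working rate.
work = [math.sin(2 * math.pi * 1000 * n / WORK_RATE) for n in range(WORK_RATE)]

# Decimate: keep every FACTOR-th sample. A real resampler low-pass
# filters first so content above DELIVERY_RATE / 2 cannot alias.
delivered = work[::FACTOR]

print(len(work), len(delivered))
```

Integer ratios like 96:48 make this conversion cheap and exact, which is another argument for working at a clean multiple of your deployment rate rather than mixing 44.1 kHz and 48 kHz material.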

Quality Control

Use uncompressed formats for editing and preparing content. Compressing the audio introduces artefacts which are not obvious at first, but multiple compress-uncompress cycles will compound them, distorting the output.
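The compounding effect can be simulated crudely. The toy model below treats each generation as re-quantising the signal on a slightly different grid, standing in for a real lossy codec, and tracks how the error against the original grows with every cycle. It only illustrates the trend, not the behaviour of any particular codec:

```python
import math

def lossy_cycle(samples, gain):
    """Crude stand-in for one compress-uncompress generation:
    apply a gain change, quantise to 8 bits, then undo the gain."""
    return [round(s * gain * 127) / 127 / gain for s in samples]

original = [0.5 * math.sin(2 * math.pi * n / 64) for n in range(512)]

signal = original
errors = []
for k in range(10):
    signal = lossy_cycle(signal, 1.0 + 0.03 * (k + 1))  # grid shifts each cycle
    rms = math.sqrt(sum((a - b) ** 2 for a, b in zip(signal, original)) / len(original))
    errors.append(rms)

print(round(errors[0], 5), round(errors[-1], 5))
```

Each generation re-quantises on a different grid, so the errors do not cancel; editing in an uncompressed format sidesteps the problem entirely until the final delivery encode.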

The sound playback must be continuous and consistent. Your customers can tolerate dropouts in the visual content far more easily than losing the sound momentarily. A programme with perfect video and intermittent sound is far less enjoyable than lower quality video with robust soundtracks.

The compression algorithm in MP3 discards sounds that the human ear theoretically cannot hear: soft violins during a loud cymbal crash, for example. Because of auditory masking, we theoretically do not notice the loss. At the highest bit rates the result can be quite effective, but reducing the bit rate will degrade the quality of the audio. Do not use MP3 as a production or archival format. Retain the original raw uncompressed source material in the archives.


At the final deployment stage, you will want to compress the audio for streaming or downloading.

For an audio-only deployment, MP3 is very widely supported, but AAC delivers higher quality at the same bitrate. HE-AAC performs even better at low bitrates if your target devices support it.


Consider the codec and container formats for your audio content as separate choices. Some combinations are mandated or prohibited, so choose carefully.

For optimum portability, FLAC or Matroška containers are useful in a production workflow. On the other hand, if your workstations are all Apple based, the ALAC or AIFF formats are appropriate.
