Understanding IP Production Networks: Part 5 - Audio Over IP Basics

How audio standards and proprietary solutions like Dante address the challenges of using asynchronous internet networks to distribute audio.

In television, audio is notoriously difficult to get right, and the human audio-visual system copes much better with disturbances in pictures than in sound. If a television picture flashes or glitches for a few seconds, we don’t seem too concerned. If the audio in the same transmission breaks up, stutters or distorts, we notice the problem much more quickly, and we become easily frustrated or stressed if it persists.

Equipment manufacturers have gone to great lengths to make sure audio is distortion free, doesn’t break up and is delivered within a known timeframe. Analog audio used point-to-point connections over dedicated cable. Digital audio built on this to give us time division multiplexed systems such as AES3 and MADI, albeit still over point-to-point connections.

As we move to the IP world, we must look at ways to deliver audio over networks without excessive delay, breakup or distortion, while maintaining synchronization with the video. IP networks were originally designed to transport non-real-time, bursty data such as web browser traffic. One of the benefits of point-to-point connections is that we can guarantee the audio will reach its destination within a predictable timeframe and with few errors. We must now apply this philosophy to computer networks.

The Audio Engineering Society (AES) has provided two point-to-point standards that have been used extensively in the broadcast industry: AES3 and AES10. AES3 provides pulse code modulated (PCM) audio samples in real time over a point-to-point connection. MADI (Multichannel Audio Digital Interface) was adopted by the AES and formally became AES10, building on AES3 to carry 56 channels of audio in its original form and up to 64 in later revisions.

Wider Distribution

Point-to-point connections work well within studios and have stood the test of time. The problems start when we move outside the studio to distribute audio across the wider TV station and to other facilities, such as OB units, because this form of connectivity becomes very inefficient.

Figure 1 - Comparison of AVB and AES67/Dante systems.

A 100Mbit/s MADI connection needs a 125Mbaud line rate, and it occupies that full rate even if you only send one channel of around 2Mbit/s. When using packet switched systems such as Ethernet or IP, we only need to use the data bandwidth required, in this instance 2Mbit/s, plus some overhead for packet framing.
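The arithmetic behind this comparison can be sketched in a few lines of Python. This is an illustration only: the 48 kHz sample rate, 24-bit samples, 1 ms packet time and IPv4/UDP/RTP over untagged Ethernet are assumptions chosen for the example, not figures from this article, and `stream_bitrate` is a hypothetical helper name.

```python
# Rough bandwidth estimate for one PCM audio stream carried in RTP/UDP/IPv4
# over Ethernet. All stream parameters below are assumptions for illustration.

RTP_HEADER = 12
UDP_HEADER = 8
IPV4_HEADER = 20
# Ethernet header (14) + FCS (4) + preamble/SFD (8) + inter-frame gap (12)
ETHERNET_OVERHEAD = 14 + 4 + 8 + 12


def stream_bitrate(channels, sample_rate=48_000, bytes_per_sample=3,
                   packet_time_s=0.001):
    """Return (payload_bps, wire_bps) for one stream."""
    samples_per_packet = round(sample_rate * packet_time_s)
    payload_bytes = samples_per_packet * channels * bytes_per_sample
    packet_bytes = (payload_bytes + RTP_HEADER + UDP_HEADER
                    + IPV4_HEADER + ETHERNET_OVERHEAD)
    packets_per_second = sample_rate / samples_per_packet
    payload_bps = payload_bytes * 8 * packets_per_second
    wire_bps = packet_bytes * 8 * packets_per_second
    return payload_bps, wire_bps


# MADI: 100 Mbit/s of data with 4B/5B line coding occupies 125 Mbaud,
# regardless of how many channels are actually in use.
madi_line_rate = 100e6 * 5 / 4

payload_bps, wire_bps = stream_bitrate(channels=2)
print(f"MADI line rate:      {madi_line_rate / 1e6:.1f} Mbaud")
print(f"Stereo payload:      {payload_bps / 1e6:.3f} Mbit/s")
print(f"Stereo on the wire:  {wire_bps / 1e6:.3f} Mbit/s")
```

Under these assumptions a stereo pair comes out close to the 2Mbit/s figure used above. The framing overhead per packet is fixed, so it becomes proportionally smaller as more channels or longer packet times are carried in each packet.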

Two technologies emerged as potential solutions early in the evolution of Audio-over-IP: IEEE 802.1BA and AES67. IEEE 802.1BA is otherwise known as Audio Video Bridging (AVB) and works at layer 2 of the OSI seven-layer model. In previous chapters we demonstrated that layer 2 switches Ethernet frames and is limited to the domain of the broadcaster’s network. Although it’s marginally faster than IP routing, it’s difficult to move layer-2 traffic outside to other facilities and OB trucks.

To achieve synchronization, data integrity and low latency, AVB uses protocols running in the switches to provide traffic shaping, bandwidth reservation and clock synchronization. All of this requires the layer-2 switch to have the IEEE 802.1BA protocols installed and configured. The IT department would need to understand this configuration, and because it’s not standard IT practice, the system becomes much more complicated.

Figure 2 - The importance of clock synchronization between encoder and decoder.

For many commercial broadcasters, the benefits of moving to commercial off-the-shelf (COTS) products were outweighed by the customization of switches that AVB requires. Instead of using hot spares that can be deployed anywhere in the network, AVB requires keeping specific routers and switches for the broadcast part of the network.

As a consequence, at the time of writing in 2026, AVB is not widely deployed, with many broadcasters preferring AES67, proprietary solutions like Dante or Ravenna, or a combination of the two.

AES67

The main problem with packet technology is jitter and indeterminate delay throughout the network. Solutions are required to synchronize the codec frame and bit clocks at the send and receive ends of the chain.

Without synchronization, the audio would degenerate into squeaks and pops and become completely unintelligible.
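To make the idea of jitter concrete, the sketch below computes a running interarrival jitter estimate in the style of the RTP specification (RFC 3550), which smooths the variation in packet transit time with a gain of 1/16. The function name and the timestamp lists are made up for illustration, and both lists are assumed to be in the same time units.

```python
def interarrival_jitter(send_times, recv_times):
    """Running jitter estimate in the style of RFC 3550.

    send_times: when each packet left the sender
    recv_times: when each packet arrived at the receiver
    """
    jitter = 0.0
    for i in range(1, len(send_times)):
        # Change in transit time between consecutive packets.
        transit_delta = ((recv_times[i] - recv_times[i - 1])
                         - (send_times[i] - send_times[i - 1]))
        # Exponential smoothing with a gain of 1/16.
        jitter += (abs(transit_delta) - jitter) / 16
    return jitter


# Packets sent every 1 ms (values in milliseconds, invented for this example).
sent = [0.0, 1.0, 2.0, 3.0, 4.0]
steady = [5.0, 6.0, 7.0, 8.0, 9.0]    # constant network delay
bursty = [5.0, 6.4, 7.1, 8.9, 9.2]    # varying network delay

print("steady network jitter:", interarrival_jitter(sent, steady))
print("bursty network jitter:", interarrival_jitter(sent, bursty))
```

A constant delay, however long, produces no jitter; it is the variation in delay that forces the receiver to buffer packets and to recover a stable clock by other means.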

AES67 is a packet technology working at IP layer 3 and provides a specification for the main areas of audio over IP: synchronization, transport and encoding, and connection management. It is much easier to distribute outside of a facility because it uses IP routing, which can easily deliver to an OB truck. AES67 provides word and frame clock alignment to guarantee the delivery of high-quality audio over IT networks.

Dante by Audinate, like some of its competitors, goes one step further by abstracting away the IT network to deliver a user management system that enables simple discovery of connected equipment and interfacing to computers. In effect, it provides a system that is easy to manage without needing an in-depth understanding of the underlying IT infrastructure.

AES67 and Dante both have the major advantage that they do not need any modifications to industry-standard IP routers and can work alongside existing IT networks. Another benefit of Dante is that it provides a plug-and-play facility. We saw in the previous chapter on host configuration how easy it is to create ghosting IP addresses, or to misconfigure a camera or sound console.

The Precision Time Protocol

SMPTE ST 2110 has adopted AES67 through the specification ST 2110-30, and both operate over standard enterprise IP networks. AES67 was originally developed to bridge otherwise incompatible Audio-over-IP systems such as Dante, Ravenna and Livewire, providing a common method for high-performance, low-latency audio transport. By incorporating AES67, ST 2110-30 ensures continued interoperability with these established systems while maintaining the flexibility and scalability required in modern broadcast networks.

A key element in both AES67 and ST 2110 is the use of Precision Time Protocol, or PTP (IEEE 1588). 

Using PTP, timing messages are sent from a master clock (the grandmaster) to each connected encoder or decoder. PTP maintains precise clock alignment across the network so that all audio, video, and ancillary data streams remain locked to the same time reference. This synchronization is essential for maintaining lip-sync accuracy, frame-accurate switching, and deterministic system performance. Because PTP is fundamental to both standards, it forms a technical bridge that further enhances interoperability between different devices and systems.
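The heart of that alignment is a small piece of arithmetic on four timestamps. The sketch below shows the delay request-response calculation described in IEEE 1588, assuming the network delay is the same in both directions. The function name and the timestamp values are illustrative and not taken from any particular PTP implementation.

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """Estimate a device's clock offset from the master.

    t1: master sends Sync             (master clock)
    t2: device receives Sync          (device clock)
    t3: device sends Delay_Req        (device clock)
    t4: master receives Delay_Req     (master clock)

    Assumes the path delay is symmetric in both directions.
    """
    mean_path_delay = ((t2 - t1) + (t4 - t3)) / 2
    offset_from_master = (t2 - t1) - mean_path_delay
    return offset_from_master, mean_path_delay


# Invented example in microseconds: the device clock runs 150 us ahead
# of the master and the one-way network delay is 20 us.
offset, delay = ptp_offset_and_delay(t1=1000, t2=1170, t3=1300, t4=1170)
print(f"offset from master: {offset} us, mean path delay: {delay} us")
```

The device then steers its local clock to remove the measured offset. Because the calculation assumes a symmetric path, any queuing delay that affects one direction more than the other appears directly as a timing error.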

Network engineers must configure their routers and switches to give PTP timing packets the fastest passage through the network, in effect providing them with the best quality of service (QoS). Broadcast engineers must discuss this with the IT department so that the network engineers can apply the correct QoS parameters to the PTP packets.
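QoS policies in the network usually act on the DSCP value carried in each IP header, so it helps to know how a sending device marks its packets. The sketch below is a minimal illustration, assuming the DSCP values commonly cited as AES67 defaults (EF, 46, for PTP and AF41, 34, for media). Confirm the actual values against your equipment and network policy, since other systems use different markings. The helper names are hypothetical, and some operating systems ignore or restrict the `IP_TOS` socket option.

```python
import socket

# Assumed markings for illustration; check your own network policy.
DSCP_PTP = 46     # Expedited Forwarding (EF)
DSCP_MEDIA = 34   # Assured Forwarding 41 (AF41)


def dscp_to_tos(dscp):
    """DSCP occupies the upper six bits of the IPv4 TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0 to 63")
    return dscp << 2


def open_marked_udp_socket(dscp):
    """Open a UDP socket whose outgoing packets carry the given DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock


print(f"PTP   TOS byte: 0x{dscp_to_tos(DSCP_PTP):02X}")
print(f"Media TOS byte: 0x{dscp_to_tos(DSCP_MEDIA):02X}")
```

Marking alone does nothing unless the routers and switches are configured to trust these values and map them to the appropriate priority queues, which is why the conversation with the IT department matters.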

The packet-switched IP design of AES67 allows audio channels to be scaled up or down with ease, with data packets dynamically routed across networks of varying size and topology. Audio streams can be distributed throughout local or wide-area broadcast facilities without the need for traditional baseband infrastructure. In the past, moving audio between digital domain boundaries, such as from an OB truck to a studio, often required conversion to baseband and resampling to mitigate timing discrepancies. This process increased system complexity and risked signal degradation. By contrast, AES67’s IP-based design maintains consistent timing and minimizes conversion overhead, improving overall reliability.

As an open, royalty-free standard, AES67 helps prevent vendor lock-in and promotes long-term system flexibility. Its adoption within ST 2110-30 strengthens interoperability across the broadcast chain and provides a proven, standards-based foundation for the continuing transition to fully IP-based production environments.
