Network Architecture for Studio Video Over IP

In an article in the August issue titled “Ethernet Basics for Studio Video Over IP,” I gave an overview of studio video over IP (SVIP) in the uncompressed domain using Ethernet. That article covered Ethernet basics such as subnets, multicasting, virtual local area networks, bandwidth considerations, and the Open Systems Interconnection model. Now we’ll look at the essential components of the elementary audio, video, and data streams in an Ethernet network, and I’ll present some approaches to network design meant to help you get started building an organized Ethernet architecture.

It Starts With Multicast

Unicast refers to a one-to-one transmission from one point in the network to another; that is, one sender and one receiver, each identified by a network address. Unicast is common, but this simple point-to-point connection isn’t practical in broadcast because, by their nature, broadcast plants rely on many different receivers or endpoints listening to the same source.

Multicast, on the other hand, lets a single video or audio transmitter reach many different video or audio receivers: the source sends each packet once to a group address, and the network copies it to every receiver that has joined that group. It’s ideal for a modern broadcast environment because it acts like an SDI router, connecting a single source to multiple destinations.

Making multicast work requires first connecting the transmitters to the receivers. There are many ways to make that connection, but for the sake of simplicity, the most important elements are:

A common subnet — In the simplest design, the transmitter and receiver are connected within the same subnet. Many refer to this foundational connection as the “source port,” since the multicast traffic emitted into the network must originate from a physical port that has its own IP address.

IGMP — Receivers join a multicast stream using the Internet Group Management Protocol (IGMP). A receiver sends an IGMP membership report (a “join”) for the multicast group it wants, and the Ethernet switch, by snooping those reports, records which of its ports want which groups. The transmitter simply sends to the group address and never needs to know who is listening. What’s important with IGMP is that the Ethernet switch manages these connections: by periodically checking which receivers are still listening and pruning ports whose receivers have left or stopped answering, the switch forwards each multicast stream only to the endpoints that asked for it.
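To show what a join looks like from the receiver’s side, here is a minimal sketch using only Python’s standard socket, struct, and ipaddress modules, assuming an IPv4 host. The helper names are hypothetical, and the group 239.1.1.1 and port 5000 used in the usage note are illustrations loosely following the 239.1.1.X:5000 pattern in Figure 1. Requesting IP_ADD_MEMBERSHIP is what causes the host to emit the IGMP membership report that the switch snoops.

```python
import ipaddress
import socket
import struct


def make_mreq(group: str, iface_ip: str = "0.0.0.0") -> bytes:
    """Pack a struct ip_mreq: multicast group address, then local interface.

    "0.0.0.0" (INADDR_ANY) lets the operating system pick the interface.
    """
    if not ipaddress.ip_address(group).is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface_ip))


def open_receiver(group: str, port: int, iface_ip: str = "0.0.0.0") -> socket.socket:
    """Hypothetical helper: bind a UDP socket to a port and join a group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Joining makes the host send an IGMP membership report; an IGMP-snooping
    # switch records it and starts forwarding the group to this port.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    make_mreq(group, iface_ip))
    return sock
```

A receiver would call open_receiver("239.1.1.1", 5000) and then read datagrams with recvfrom(). Closing the socket drops the membership, after which the host can send an IGMP leave so the switch stops forwarding that group to the port.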

Packets: The Foundation of All Ethernet Data Streams

UDP for transport — The User Datagram Protocol (UDP) is the most common means of transporting video in Ethernet environments. UDP lacks functions such as error correction, sequencing, duplicate elimination, flow control, and congestion control, but that simplicity is what makes it so common: because it needs no connection setup or management, it suits continuous, time-critical data such as essence video. On a standard Ethernet link, the maximum payload of a frame (the MTU) is 1,500 bytes. The IPv4 header (20 bytes) and the UDP header (8 bytes) take 28 bytes of that, leaving 1,472 bytes for the datagram’s payload; keeping each datagram at or below that size avoids fragmentation.
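The header arithmetic above can be written out as a small per-packet byte budget. This sketch assumes IPv4 without header options and the standard 1,500-byte MTU; it also subtracts the 12-byte fixed RTP header (as defined in RFC 3550) to show how much room is left for media in each packet. The constant and function names are illustrative.

```python
# Per-datagram byte budget on a standard Ethernet link (IPv4, no IP options).
MTU = 1500         # largest IP packet carried in one Ethernet frame
IPV4_HEADER = 20   # IPv4 header without options
UDP_HEADER = 8
RTP_HEADER = 12    # fixed RTP header, no CSRC list or extension


def max_udp_payload(mtu: int = MTU) -> int:
    """Largest UDP payload that fits in one frame without IP fragmentation."""
    return mtu - IPV4_HEADER - UDP_HEADER


def max_media_per_rtp_packet(mtu: int = MTU) -> int:
    """Bytes left for media once the RTP header sits inside the UDP payload."""
    return max_udp_payload(mtu) - RTP_HEADER


def fits_without_fragmentation(media_bytes: int, mtu: int = MTU) -> bool:
    return media_bytes <= max_media_per_rtp_packet(mtu)
```

With the defaults, max_udp_payload() is 1,472 bytes and max_media_per_rtp_packet() is 1,460 bytes.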

UDP has its downfall, however: the UDP header carries no sequence number, so a receiver cannot tell whether datagrams arrived in order or whether any went missing. When it comes to video, that’s problematic, because the picture is rebuilt from packets that must be in the correct order and played out at a steady number of frames per second. Thankfully, there’s RTP.

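RTP for sequencing — The Real-time Transport Protocol (RTP) is ideal for video because it handles the vital task of sequencing the packets: every RTP header carries a sequence number that lets the receiver restore the original order and detect lost packets. Even better, RTP adds only a small 12-byte header, and each RTP packet rides inside a single UDP datagram; in MPEG transport-stream systems, for example, seven 188-byte TS packets fit into one RTP payload. Furthermore, RTP packets are time-stamped, and timecode can exist as a separate data stream rather than as a marker placed on the video.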
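To make the sequencing concrete, here is a minimal sketch of the 12-byte fixed RTP header from RFC 3550 and of putting packets back in order by sequence number. It uses only Python’s struct module; the class and function names are hypothetical, and it ignores CSRC lists and header extensions. Because the sequence number is only 16 bits, the comparison has to allow for wraparound from 65535 back to 0.

```python
import struct
from typing import Iterable, List, NamedTuple

# Fixed RTP header (RFC 3550): 1+1+2+4+4 = 12 bytes, network byte order.
RTP_FIXED_HEADER = struct.Struct("!BBHII")


class RtpHeader(NamedTuple):
    payload_type: int  # 7 bits
    marker: bool
    sequence: int      # 16 bits; wraps from 65535 back to 0
    timestamp: int     # 32 bits of media-clock ticks
    ssrc: int          # 32-bit identifier of the stream's source


def pack_header(h: RtpHeader) -> bytes:
    first = 2 << 6  # version 2; no padding, extension, or CSRC entries
    second = (int(h.marker) << 7) | (h.payload_type & 0x7F)
    return RTP_FIXED_HEADER.pack(first, second, h.sequence & 0xFFFF,
                                 h.timestamp & 0xFFFFFFFF, h.ssrc & 0xFFFFFFFF)


def unpack_header(packet: bytes) -> RtpHeader:
    first, second, seq, ts, ssrc = RTP_FIXED_HEADER.unpack_from(packet)
    if first >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    return RtpHeader(second & 0x7F, bool(second >> 7), seq, ts, ssrc)


def seq_delta(a: int, b: int) -> int:
    """Signed distance from sequence number a to b, allowing for wraparound."""
    d = (b - a) & 0xFFFF
    return d - 0x10000 if d >= 0x8000 else d


def reorder(packets: Iterable[bytes]) -> List[bytes]:
    """Return one stream's packets in sending order.

    Sequence numbers are compared relative to the first packet received, so
    a run such as 65534, 65535, 0, 1 sorts correctly across the wrap.
    """
    pkts = list(packets)
    if not pkts:
        return []
    base = unpack_header(pkts[0]).sequence
    return sorted(pkts, key=lambda p: seq_delta(base, unpack_header(p).sequence))
```

In practice a receiver sorts only a small jitter buffer and waits a bounded time for late packets; a gap in the sequence that never fills indicates a lost packet.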

PTP for time-stamping — The Precision Time Protocol (PTP, IEEE 1588) provides the time reference used to stamp the RTP packets. Its use in the uncompressed-video-over-IP domain is a bit complex, but the basic concept is straightforward: each transmitting device locks its local clock to the PTP messages on the network, which are distributed from a grandmaster clock, and stamps the RTP packets it emits to the Ethernet network using that shared time. In this manner, PTP serves as the synchronization source for every device. PTP messages are very small compared with the media-carrying RTP packets, and the two kinds of packets coexist on the network for different purposes.
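At the heart of how a device locks to PTP is the delay request-response exchange defined in IEEE 1588: four timestamps let the device estimate both the network delay and its own clock offset. The sketch below is a simplification that assumes a symmetric network path and leaves out the correction fields and filtering that real PTP implementations apply; the names are hypothetical.

```python
from typing import NamedTuple


class PtpExchange(NamedTuple):
    """Timestamps from one Sync / Delay_Req exchange, in nanoseconds."""
    t1: int  # master sends Sync (master's clock)
    t2: int  # slave receives Sync (slave's clock)
    t3: int  # slave sends Delay_Req (slave's clock)
    t4: int  # master receives Delay_Req (master's clock)


def mean_path_delay_ns(x: PtpExchange) -> float:
    """One-way network delay, assuming the path is symmetric."""
    return ((x.t2 - x.t1) + (x.t4 - x.t3)) / 2


def offset_from_master_ns(x: PtpExchange) -> float:
    """How far the slave's clock runs ahead of the master's."""
    return ((x.t2 - x.t1) - (x.t4 - x.t3)) / 2
```

For example, with a 500 ns path and a slave clock running 1,000 ns ahead, the exchange could read t1 = 0, t2 = 1,500, t3 = 2,000, t4 = 1,500; the functions then return a delay of 500 ns and an offset of 1,000 ns, which the device would remove from its clock.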

PTP isn’t used in all video Ethernet transmissions; for instance, the SMPTE 2022 family uses clock rates based on a common SDI clock of 27 MHz. But PTP, as a pure Ethernet timing method, is the ideal synchronization centerpiece of the new SMPTE ST 2110 standard — itself built from the ground up on Ethernet.
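As a rough illustration of how a shared PTP time turns into RTP timestamps, this sketch assumes the ST 2110 convention that a stream’s media clock counts from the PTP epoch, with video using a 90 kHz RTP clock and 48 kHz audio using its sample rate, truncated to the 32-bit RTP field. The function name is hypothetical, and the sketch ignores exactly where within a frame the timestamp is sampled.

```python
NS_PER_S = 1_000_000_000
VIDEO_CLOCK_HZ = 90_000  # RTP clock rate for ST 2110-20 video
AUDIO_CLOCK_HZ = 48_000  # RTP clock for 48 kHz ST 2110-30 audio (the sample rate)


def rtp_timestamp(ptp_ns: int, clock_hz: int) -> int:
    """Media-clock ticks elapsed since the PTP epoch, kept to 32 bits.

    Integer arithmetic avoids floating-point rounding on large PTP times.
    """
    return (ptp_ns * clock_hz // NS_PER_S) % (1 << 32)
```

Because every sender derives its timestamps from the same PTP time, a receiver can relate a video packet’s 90 kHz timestamp and an audio packet’s 48 kHz timestamp to one common timeline, even though they arrive in separate streams.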

All these packets in various combinations form audio, video, and ancillary data streams — some of the threads that flow through the fabric of a SMPTE ST 2110 Ethernet network. Understanding how those components work together goes a long way toward understanding new video- and audio-over-IP topologies — and how to approach your own SVIP network architecture.

Designing an SVIP Network

An Ethernet-based architecture is often described as a fabric. In that analogy, all the connections in an Ethernet design are woven together just as tiny threads of fiber are woven to create textiles. A fabric can take many different shapes, but its underlying structure is consistent and, if woven correctly, forms a strong composite. Likewise, an Ethernet fabric is a composite of similar and dissimilar data flows that together define what the network does.

Keeping all the threads organized is the secret to any well-managed Ethernet fabric, and that requires planning. Figure 1 offers an example of how various components can be woven like threads to create a strong Ethernet fabric for a simple video network.

Figure 1. While real-world networks might be much more complex, this example is useful for showing how an Ethernet network could be fundamentally constructed. Note that the Management subnet is at and all of the essence sources emit from the subnet. The multicast now ‘rides’ within this subnet at 239.1.1.X:5000.

When creating your own Ethernet fabric, consider these approaches:

Manage video on 10 Gigabit switches and audio on 1 Gigabit switches. With SMPTE ST 2110, video, audio, and data travel as separate streams, so it’s possible to save bandwidth by sending each type through a different size of pipe. Video demands 10 Gb/s pipes, while audio and data require much less bandwidth (100 Mb/s and 1 Gb/s, respectively). Keep in mind that 10 Gb/s ports can be rather expensive, so budget their bandwidth carefully, and when aggregating payloads make sure that six HD signals fit within one port: at the nominal HD-SDI rate of 1.485 Gb/s, six signals total about 8.9 Gb/s.

Group signals according to workflow. Given that SDI workflows have tied routers together for years, keeping signals in groups according to workflow is something to consider. Take the example of two studios, in which the IP source port addresses and multicast addresses for each must be kept together logically in order to distinguish the studios from each other. Studio A might use a subnet of with for multicasts, but Studio B might use and This scheme gives us a general sense of addresses for both studios. It doesn’t mean we can’t route the signals between subnets or gather the signals into a common pool. It simply gives us some architectural context for where the signals belong in a common address scheme.

Keep device control and management separate. It’s important to keep device control and management addresses and their associated signaling away from the “business end” of the audio, video, and data essence flows. In other words, separate the device control network from the signal network. They should exist on separate 1 Gb/s Ethernet switches and be wired separately.

Plan for security. Good network design is the basis of tight security. Keep signals in manageable subnets so that every signal flow is connected deliberately, and account for every port and every connection within the Ethernet fabric.
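Several of these approaches can be checked mechanically before any cabling is done. The sketch below is hypothetical throughout: the studio address ranges, the management subnet, and the function names are illustrations, and HD streams are sized conservatively at the nominal 1.485 Gb/s HD-SDI rate. It uses Python’s standard ipaddress module to confirm that each studio’s source and multicast ranges stay distinct and clear of the management subnet, and a simple set difference to list flows on the fabric that were never planned.

```python
import ipaddress
from typing import Dict, Iterable, List, Tuple

# Hypothetical address plan: these ranges are illustrations only.
PLAN: Dict[str, Dict[str, str]] = {
    "Studio A": {"sources": "10.1.1.0/24", "multicast": "239.1.1.0/24"},
    "Studio B": {"sources": "10.1.2.0/24", "multicast": "239.1.2.0/24"},
}
MANAGEMENT = "192.168.100.0/24"  # hypothetical control/management subnet
HD_SDI_GBPS = 1.485  # nominal HD-SDI rate, a conservative size per HD stream


def streams_per_port(port_gbps: float, stream_gbps: float,
                     headroom: float = 0.0) -> int:
    """Whole streams that fit in a port, keeping a fraction of it free."""
    return int(port_gbps * (1.0 - headroom) // stream_gbps)


def plan_problems(plan: Dict[str, Dict[str, str]], management: str) -> List[str]:
    """List overlaps between studios and collisions with the management subnet."""
    mgmt = ipaddress.ip_network(management)
    problems: List[str] = []
    nets = []
    for studio, ranges in plan.items():
        src = ipaddress.ip_network(ranges["sources"])
        mcast = ipaddress.ip_network(ranges["multicast"])
        if not mcast.is_multicast:
            problems.append(f"{studio}: {mcast} is not a multicast range")
        if src.overlaps(mgmt):
            problems.append(f"{studio}: sources overlap the management subnet")
        nets += [(studio, src), (studio, mcast)]
    # Compare every pair of ranges that belong to different studios.
    for i, (s1, n1) in enumerate(nets):
        for s2, n2 in nets[i + 1:]:
            if s1 != s2 and n1.overlaps(n2):
                problems.append(f"{s1} and {s2} overlap: {n1} / {n2}")
    return problems


Flow = Tuple[str, str]  # (source IP, multicast group)


def unexpected_flows(observed: Iterable[Flow], allowed: Iterable[Flow]) -> List[Flow]:
    """Flows seen on the fabric that are not in the planned inventory."""
    return sorted(set(observed) - set(allowed))
```

Sizing streams at the HD-SDI rate is deliberately conservative: uncompressed 1080-line HD active video at 29.97 frames/s with 10-bit 4:2:2 sampling works out to roughly 1.24 Gb/s before packet headers, so real ST 2110 streams leave some headroom in the same port.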

In Summary

As professional video over IP continues to evolve, early network designers will face plenty of challenges. No matter how you approach your own network design, the most important thing is to keep it organized. As the network gains complexity, an organizational structure that is well-planned from the beginning will help make life easier for those who must manipulate all the video, audio, data, and control subnets.

Editor note: Part 1 of Mr. Barella’s Ethernet tutorial is “Ethernet Basics for Studio Video Over IP.”

Scott Barella is chief technology officer for Utah Scientific and a member of the board of directors for the Alliance for IP Media Solutions (AIMS), where he also serves as deputy chairman of the Technical Working Group.