Network Architecture for Studio Video Over IP

In an article in the August issue titled “Ethernet Basics for Studio Video Over IP,” I gave an overview of studio video over IP (SVIP) in the uncompressed domain using Ethernet. That article covered Ethernet basics such as subnets, multicasting, virtual local area networks, bandwidth considerations, and the Open Systems Interconnection model. Now we’ll look at the essential components of elementary audio, video, and data streams in an Ethernet network, and I’ll present some approaches to network design meant to help you get started building an organized Ethernet architecture.

It Starts With Multicast

Unicast refers to a one-to-one transmission from one point in the network to another; that is, one sender and one receiver, each identified by a network address. Unicast is common, but this simple point-to-point connection isn’t practical in broadcast because, by their nature, broadcast plants rely on many different receivers or endpoints listening to the same source.

Multicast, on the other hand, is when a single video or audio transmitter connects to many different video or audio receivers. It’s ideal for a modern broadcast environment because it acts like an SDI router, connecting a single source to multiple destinations.

Making multicast work requires first connecting the transmitters to the receivers. There are many ways to make that connection, but for the sake of simplicity, the most important elements are:

A common subnet — The transmitter and receiver must be connected within the same subnet. Many refer to this foundational connection as the “source port,” since the multicast traffic being emitted into the network must originate from an IP port using an IP address.

IGMP — The protocol for joining receivers to a multicast stream is called Internet Group Management Protocol (IGMP). Receivers use IGMP messages to tell the network which multicast groups they want to join or leave, and the Ethernet switch listens to those messages so that it forwards each stream only to the ports that have asked for it. The transmitter does not take part in this exchange; it simply emits to the multicast address. What’s important with IGMP is that the Ethernet switch manages this connection. By monitoring which receivers are listening and breaking the connection with receivers that are no longer accepting packets, the Ethernet switch can manage multicast traffic to network endpoints.
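To make the join step concrete, here is a minimal Python sketch of what a receiver might do. The group address 239.1.1.1 and port 5000 used in the usage note are illustrative values borrowed from the 239.1.1.X:5000 range in Figure 1, and the function names are hypothetical. Setting the standard IP_ADD_MEMBERSHIP socket option is what prompts the host to send an IGMP membership report that the switch can act on.

```python
import ipaddress
import socket


def is_multicast(group: str) -> bool:
    """Return True if the address lies in the IPv4 multicast range 224.0.0.0/4."""
    return ipaddress.IPv4Address(group).is_multicast


def build_mreq(group: str, local_ip: str = "0.0.0.0") -> bytes:
    """Pack the 8-byte membership request: group address, then local interface."""
    return socket.inet_aton(group) + socket.inet_aton(local_ip)


def join_group(group: str, port: int, local_ip: str = "0.0.0.0") -> socket.socket:
    """Open a UDP socket and ask the operating system to join the group.

    IP_ADD_MEMBERSHIP causes the host to send an IGMP membership report.
    """
    if not is_multicast(group):
        raise ValueError(f"{group} is not a multicast address")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    build_mreq(group, local_ip))
    return sock
```

A receiver would call something like join_group("239.1.1.1", 5000) and then read datagrams with sock.recvfrom(). Closing the socket drops the membership, which is how the switch learns that the receiver is no longer listening.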

Packets: The Foundation of All Ethernet Data Streams

UDP for transport — The User Datagram Protocol (UDP) is the most common means of transporting video in Ethernet environments. UDP lacks functions such as error correction, sequencing, duplicate elimination, flow control, and congestion control, but that simplicity is what makes it so common. Since UDP datagrams don’t require any direct connection management, they’re versatile for data such as essence video. In Ethernet, the maximum amount of data in a standard frame is 1,500 bytes. With 28 bytes reserved for headers (20 for IPv4 and 8 for UDP), that leaves 1,472 bytes for video data, just enough to avoid fragmentation.
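The payload arithmetic above can be checked with a few lines of Python. This is only a sketch: it assumes a standard 1,500-byte Ethernet payload and an IPv4 header without options, and the function names are made up for illustration.

```python
ETHERNET_MTU = 1500   # bytes of payload in a standard Ethernet frame
IPV4_HEADER = 20      # bytes, assuming an IPv4 header without options
UDP_HEADER = 8        # bytes


def udp_payload_budget(mtu: int = ETHERNET_MTU) -> int:
    """Bytes left for media data once the IP and UDP headers are accounted for."""
    return mtu - IPV4_HEADER - UDP_HEADER


def datagrams_needed(total_bytes: int, mtu: int = ETHERNET_MTU) -> int:
    """Number of unfragmented datagrams needed to carry total_bytes of data."""
    budget = udp_payload_budget(mtu)
    return -(-total_bytes // budget)  # ceiling division
```

Anything larger than the budget has to be split across several datagrams by the sender; otherwise the IP layer would fragment it.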

UDP has its downside, however, because its datagrams carry no sequence numbers. When it comes to video, that’s problematic because video gets played out according to the number of frames per second. It’s critical that the frames are in the correct order, and UDP alone is incapable of sequencing. Thankfully, there’s RTP.

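RTP for sequencing — Real-time Transport Protocol (RTP) is ideal for video because it handles the vital task of sequencing the packets. Each RTP packet is carried inside a single UDP datagram, and the RTP header adds a sequence number that lets the receiver put packets back in order and detect losses. (In MPEG transport stream applications, seven 188-byte transport stream packets fit within one RTP packet.) Furthermore, RTP packets can be time-stamped, with the timecode existing as a separate data stream rather than a marker placed on the video.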
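As a rough illustration of how sequencing works, the sketch below packs and parses the 12-byte fixed RTP header and sorts received packets by sequence number. The helper names and the dynamic payload type 96 are assumptions made for the example, and the simple sort ignores the wraparound of the 16-bit sequence counter that a real receiver must handle.

```python
import struct

# Fixed RTP header: flags, marker/payload type, sequence, timestamp, SSRC
RTP_HEADER = struct.Struct("!BBHII")


def pack_rtp(seq: int, timestamp: int, ssrc: int, payload: bytes,
             payload_type: int = 96) -> bytes:
    """Build an RTP packet with version 2 and no padding, extension, or CSRCs."""
    first_byte = 2 << 6
    header = RTP_HEADER.pack(first_byte, payload_type & 0x7F,
                             seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc)
    return header + payload


def parse_rtp(packet: bytes) -> dict:
    """Split a packet into its header fields and payload."""
    first, second, seq, timestamp, ssrc = RTP_HEADER.unpack_from(packet)
    return {
        "version": first >> 6,
        "payload_type": second & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[RTP_HEADER.size:],
    }


def reorder(packets: list) -> list:
    """Return parsed packets in sequence order (wraparound not handled)."""
    return sorted((parse_rtp(p) for p in packets), key=lambda p: p["seq"])
```

Because the sequence number travels in the RTP header rather than in UDP, a receiver can restore the order even when the network delivers datagrams out of order.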

PTP for time-stamping — Precision Time Protocol (PTP) is the method used to time-stamp the RTP packets, and while its use in the uncompressed-video-over-IP domain is a bit complex, the basic concept is straightforward. The idea is that the transmitting device reads the PTP packets on the network and stamps the RTP packets as they are emitted to the Ethernet network. In this manner, PTP packets serve as a synchronization source. PTP packets are very small compared to the RTP packets carried in UDP datagrams, and the two exist together on the network for different purposes.

PTP isn’t used in all video Ethernet transmissions; for instance, the SMPTE 2022 family uses clock rates based on a common SDI clock of 27 MHz. But PTP, as a pure Ethernet timing method, is the ideal synchronization centerpiece of the new SMPTE ST 2110 standard — itself built from the ground up on Ethernet.
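The stamping idea can be sketched as a conversion from PTP time to a 32-bit RTP timestamp. The example assumes a 90 kHz media clock for video and a timestamp that simply wraps at 32 bits; the constant and function names are illustrative, and a real ST 2110 sender has more rules to follow than this shows.

```python
NANOS_PER_SECOND = 1_000_000_000
VIDEO_RTP_CLOCK_HZ = 90_000   # assumed media clock rate for video


def rtp_timestamp(ptp_seconds: int, ptp_nanoseconds: int,
                  clock_hz: int = VIDEO_RTP_CLOCK_HZ) -> int:
    """Convert a PTP time (seconds + nanoseconds) to a 32-bit RTP timestamp."""
    total_ns = ptp_seconds * NANOS_PER_SECOND + ptp_nanoseconds
    ticks = total_ns * clock_hz // NANOS_PER_SECOND   # media clock ticks so far
    return ticks % 2**32                              # RTP timestamps are 32 bits
```

Because every sender derives its timestamps from the same PTP time, two streams stamped this way can be aligned at the receiver without a separate reference signal.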

All these packets in various combinations form audio, video, and ancillary data streams — some of the threads that flow through the fabric of a SMPTE ST 2110 Ethernet network. Understanding how those components work together goes a long way toward understanding new video- and audio-over-IP topologies — and how to approach your own SVIP network architecture.

Designing an SVIP Network

An Ethernet-based architecture is often described as a fabric. In that analogy, all the connections in an Ethernet design are woven together just as tiny threads of fiber are woven to create textiles. In the end, a fabric can take many different shapes, but its underlying structure is consistent and, if woven correctly, builds a strong composite. Likewise, Ethernet fabric is the composition of similar and dissimilar data flows that define a set of functions in and around themselves.

Keeping all the threads organized is the secret to any well-managed Ethernet fabric, and that requires planning. Figure 1 offers an example of how various components can be woven like threads to create a strong Ethernet fabric for a simple video network.

Figure 1. While real-world networks might be much more complex, this example is useful for showing how an Ethernet network could be fundamentally constructed. Note the Management subnet and the subnet from which all of the essence sources emit. The multicast now ‘rides’ within the essence subnet at 239.1.1.X:5000.

When creating your own Ethernet fabric, consider these approaches:

Manage video on 10 Gigabit switches and audio on 1 Gigabit switches. With SMPTE ST 2110, it’s possible to save bandwidth by transporting different types of data through different sizes of pipes. Video demands 10 Gb/s pipes, while audio and data require much less bandwidth (100 Mb/s and 1 Gb/s, respectively). Keep in mind that 10 Gb/s ports can be rather expensive, so use bandwidth accordingly, and ensure that six HD signals can fit within one port when aggregating payloads.
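A quick budget check shows why six HD signals is a sensible target for a 10 Gb/s port. The sketch assumes the nominal HD-SDI rate of 1.485 Gb/s per signal as a conservative figure; actual stream rates depend on format and packet overhead, so treat the numbers as a planning estimate only.

```python
HD_SIGNAL_MBPS = 1485     # nominal HD-SDI rate, used as a conservative estimate
PORT_10G_MBPS = 10_000


def port_load_mbps(signal_count: int, per_signal_mbps: int = HD_SIGNAL_MBPS) -> int:
    """Total bandwidth consumed by signal_count identical signals."""
    return signal_count * per_signal_mbps


def fits(signal_count: int, port_mbps: int = PORT_10G_MBPS,
         per_signal_mbps: int = HD_SIGNAL_MBPS) -> bool:
    """True if the aggregate load stays within the port's capacity."""
    return port_load_mbps(signal_count, per_signal_mbps) <= port_mbps


def max_signals(port_mbps: int = PORT_10G_MBPS,
                per_signal_mbps: int = HD_SIGNAL_MBPS) -> int:
    """Largest number of identical signals that fit in one port."""
    return port_mbps // per_signal_mbps
```

Under these assumptions six signals use 8,910 Mb/s, leaving about 1 Gb/s of headroom, while a seventh would exceed the port.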

Group signals according to workflow. Given that SDI workflows have tied routers together for years, keeping signals in groups according to workflow is something to consider. Take the example of two studios, in which the IP source port addresses and multicast addresses for each must be kept together logically in order to distinguish the studios from each other. Studio A might use one subnet with its own range of multicast addresses, while Studio B uses a different subnet and a different multicast range. This scheme gives us a general sense of addresses for both studios. It doesn’t mean we can’t route the signals between subnets or gather the signals into a common pool. It simply gives us some architectural context for where the signals belong in a common address scheme.

Keep device control and management separate. It’s important to keep device control and management addresses and their associated signaling away from the “business end” of the audio, video, and data essence flows. In other words, separate the device control network from the signal network. They should exist on separate 1 Gb/s Ethernet switches and be wired separately.

Plan for security. Good network design is imperative for tight security, so as a security measure, keep signals in manageable subnets so that signal flows are carefully connected. Every port and every connection within the Ethernet fabric must be accounted for.
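One way to keep such a plan honest is to write it down as data and check it. The sketch below uses Python’s ipaddress module. Every subnet in it is a hypothetical placeholder except the 239.1.1.X multicast range, which echoes Figure 1, and the names PLAN, owner_of, and overlapping_pairs are invented for the example.

```python
import ipaddress
from itertools import combinations

# Hypothetical address plan: one management subnet, plus a source subnet
# and a multicast range for each studio.
PLAN = {
    "management": "192.168.10.0/24",
    "studio-a-sources": "10.1.1.0/24",
    "studio-b-sources": "10.1.2.0/24",
    "studio-a-multicast": "239.1.1.0/24",
    "studio-b-multicast": "239.1.2.0/24",
}


def owner_of(address: str, plan: dict = PLAN):
    """Return the name of the plan entry containing the address, or None."""
    ip = ipaddress.ip_address(address)
    for name, cidr in plan.items():
        if ip in ipaddress.ip_network(cidr):
            return name
    return None


def overlapping_pairs(plan: dict = PLAN) -> list:
    """List every pair of plan entries whose address ranges overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in plan.items()}
    return [(a, b) for a, b in combinations(nets, 2)
            if nets[a].overlaps(nets[b])]
```

An address that owner_of cannot place is a connection that has not been accounted for, and a non-empty result from overlapping_pairs means that two groups, such as management and essence, are no longer cleanly separated.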

In Summary

As professional video over IP continues to evolve, early network designers will face plenty of challenges. No matter how you approach your own network design, the most important thing is to keep it organized. As the network gains complexity, an organizational structure that is well-planned from the beginning will help make life easier for those who must manipulate all the video, audio, data, and control subnets.

Editor’s note: Part 1 of Mr. Barella’s Ethernet tutorial is “Ethernet Basics for Studio Video Over IP.”

Scott Barella is chief technology officer for Utah Scientific and a member of the board of directors for the Alliance for IP Media Solutions (AIMS), where he also serves as deputy chairman of the Technical Working Group.
