Designing IP Broadcast Systems: Remote Control

Why mixing video and audio UDP/IP streams with time-sensitive TCP/IP flows can cause many challenges for remote control applications such as a camera OCP, as the switches may be configured to prioritize the UDP feeds, or vice versa.

Not all IP flows are the same: protocols such as TCP and UDP may negatively influence each other, resulting in video and audio breakup and sporadic control.

Although it may not seem immediately obvious, IP streams have attributes that influence how they traverse networks, which in turn has an impact on the quality of service.

Video and audio signals consume massive and continuous bandwidth: a baseband progressive HD video signal requires a continuous data rate of around 3Gb/s. IT networks are used to delivering large bursts of data, but generally not for long periods of time. The asynchronous nature of terminal devices, such as servers, storage, and database applications, means they regularly send and receive short bursts of data, and networks are tuned to facilitate this.

Few applications, if any, in datacenter-type computing environments make synchronous demands on the servers and underlying network. There may well be a demand for low latency, such as in high-frequency trading, but the need to impose both low latency and time-invariant behavior on the network and the terminal devices simultaneously seems to be wholly exclusive to broadcasting. This is mainly due to the historic method of operation: video and audio signals are accurately sampled, and the time relationship between samples, both within and between media streams, must be maintained, otherwise the pictures will stutter and the audio will break up into a cacophony of undesirable squeaks and pops.

Computer networks are highly efficient as they take advantage of the gaps between data bursts to employ statistical multiplexing. Generally, a 10Gbps Ethernet link from a server will not carry 10Gbps continuously. Instead, it will exhibit peaks and troughs where the average is significantly lower than 10Gbps, even though transients may approach the full link rate. This is entirely different from broadcasting, where a 3G-SDI video circuit always utilizes its full data bandwidth (2.97Gbps), even if there is no video being transported. This makes SDI and AES inefficient in terms of data rate utilization, but has the major advantage of guaranteed data integrity and predictable latency.

It’s worth delving a little deeper into what we mean when we speak of computer network efficiency. Figure 1 shows a typical layer-2 switch with ten servers connected to it, each with a 10Gbps link connection. In this example, the switch has a 40Gbps port connecting it to the next switch or router in the network. If all ten servers simultaneously generate data streams of 10Gbps that are sent to the 40Gbps port, then the total data bandwidth required would be 10 * 10Gbps = 100Gbps. This is far more than the capability of the 40Gbps link.

Figure 1 – Computer networks take advantage of the assumption that statistical peaks and troughs generated by terminal devices such as servers exist and are therefore able to moderate the capacity of link connections. This assumption doesn’t lend itself well to broadcasting as uncompressed video and audio streams are continuous.

The obvious question is why not just increase the bandwidth of the 40Gbps link to 100Gbps? This would certainly allow all the servers to simultaneously send 10Gbps of data to the port. However, the major challenge with this solution is cost, as a switch with one or more 100Gbps ports is significantly more expensive than a 40Gbps switch, not only in the capital outlay but also in providing technical support, power, and air conditioning. Furthermore, each server is unlikely to be continuously sending and receiving 10Gbps, so employing a 100Gbps switch is an unnecessary cost. Statistically, each server will be sending and receiving much lower average data rates than the capacity of the link it is attached to. The genius of the network architect is not just knowing how to connect the servers and configure the switch or router, but also knowing what capacity to specify for the port, in this hypothetical case, 40Gbps.
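
As a rough illustration of this sizing judgement, the short Python sketch below compares the worst-case and statistically expected load on the 40Gbps uplink from the Figure 1 example. The 30% average per-server utilization is purely an assumed figure for illustration, not something taken from the article.

```python
# Rough sizing sketch for the Figure 1 example. The 30% average utilization
# per server is an assumed figure for illustration only.

NUM_SERVERS = 10
SERVER_LINK_GBPS = 10
UPLINK_GBPS = 40
ASSUMED_AVG_UTILIZATION = 0.30  # hypothetical average per-server load

worst_case_gbps = NUM_SERVERS * SERVER_LINK_GBPS               # 100 Gbps if all burst at once
expected_avg_gbps = worst_case_gbps * ASSUMED_AVG_UTILIZATION  # 30 Gbps on average
oversubscription = worst_case_gbps / UPLINK_GBPS               # 2.5:1

print(f"Worst case offered load: {worst_case_gbps} Gbps")
print(f"Assumed average load   : {expected_avg_gbps:.0f} Gbps (fits within the {UPLINK_GBPS} Gbps uplink)")
print(f"Oversubscription ratio : {oversubscription:.1f}:1")
```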

That said, it is entirely possible that each of the ten servers simultaneously and instantaneously generates 10Gbps of data for 500ms. The instantaneous demand on the egress link port would be 100Gbps, equivalent to 50Gbps averaged over a one second window, and in this example the 40Gbps port would be overloaded. Switch and router manufacturers deal with this by providing buffers in their devices. Depending on the design, these may be on the input side of the port, the output, or a combination of the two. Therefore, if all ten servers burst data at the same time, the buffers temporarily hold the short bursts of data and then write them to the output port in a timely fashion, thus providing statistical multiplexing and reducing the risk of packet loss.
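
To give a feel for why buffering only absorbs short bursts, the sketch below estimates how much memory would be needed to ride out the 500ms burst described above. The figures follow the article's example; real switch buffer sizes vary widely by product.

```python
# Estimate the buffer needed to absorb a simultaneous 500 ms burst
# from ten 10 Gbps servers into a 40 Gbps egress port (Figure 1 example).

BURST_GBPS = 10 * 10       # all ten servers bursting at once = 100 Gbps
EGRESS_GBPS = 40
BURST_SECONDS = 0.5

excess_gbps = BURST_GBPS - EGRESS_GBPS      # 60 Gbps cannot be forwarded immediately
buffer_gbits = excess_gbps * BURST_SECONDS  # 30 Gbit must be held in memory
buffer_gbytes = buffer_gbits / 8            # 3.75 GB

print(f"Excess during burst: {excess_gbps} Gbps")
print(f"Buffer required    : {buffer_gbits:.0f} Gbit (~{buffer_gbytes:.2f} GB)")
# Typical switch buffers are far smaller than this, which is why sustained
# simultaneous bursts lead to dropped packets rather than buffered ones.
```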

This leads to another question: how does the network architect guarantee that 40Gbps is the correct specification for the link in question? The simple answer is that they only need to use approximations. The principal reason is that Transmission Control Protocol (TCP) resends any lost packets, thus providing an error-free connection. In essence, network architects assume some IP packets will be lost, either due to data corruption in the physical medium or because switch buffers overflow. If a switch buffer becomes full, the switch deals with the situation by simply dropping all subsequent input packets until there is sufficient space in the buffer to resume writing.

When we speak of non-blocking switches in broadcast, we’re referring to the fact that the output port and link, and the switch memory, have sufficient capacity to accept all the IP packets forwarded to them. This capacity can be huge: a link forwarding one hundred 3G HD video feeds needs at least 300Gbps.

Although TCP is ubiquitous within computer systems, it guarantees data integrity at the expense of latency. Not only that, but the latency is variable and unpredictable. Entire books have been written on TCP, so we’re unlikely to cover it in a short article, but two things need to be understood: networks are expected to lose IP packets, and TCP is the method used to resend lost packets, albeit with variable and unpredictable latency. Hence the reason we don’t like using TCP in broadcast studios and instead use UDP (User Datagram Protocol) over IP for video and audio media flows.

UDP builds on top of IP to provide more granular addressing within servers through its port numbering system: the IP address targets a specific server or NIC, while the UDP port allows an individual software application running on that server to be targeted for reception of the packet. There is no re-send strategy built into UDP, so if a packet is dropped in a switch buffer it is lost forever; essentially, UDP is fire-and-forget. This is another reason studio broadcast networks are so expensive, as broadcasters must use the best switches and routers possible to reduce the risk of dropped packets.

Video and audio streams tend to use UDP as its major advantage is very low and generally predictable latency. As there is no re-send strategy, the receiver doesn’t need to signal retransmissions to the sender or re-request lost packets.
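
The fire-and-forget nature of UDP is easy to see at the socket level. The minimal Python sketch below sends a single media-style datagram and never waits for any acknowledgement; the address, port, and payload are hypothetical placeholders, and real media flows would use a full RTP/SMPTE ST 2110 stack rather than raw datagrams.

```python
import socket

# Hypothetical destination for an uncompressed media flow (placeholder values).
MEDIA_DEST = ("203.0.113.10", 50000)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # UDP socket

payload = b"\x00" * 1400  # stand-in for one media packet's worth of samples

# sendto() simply hands the datagram to the network: there is no acknowledgement,
# no retransmission, and no feedback if a switch buffer drops it downstream.
sock.sendto(payload, MEDIA_DEST)
sock.close()
```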

Remote control applications, such as a camera OCP where the control panel used by the shading engineer sends messages to the camera to open and close the iris, rely on messaging protocols where latency is a key issue. Within the studio this is usually not a problem, as the network has to be so over-engineered that a few control messages flying around have little influence on the network as a whole. However, when connecting an OCP in a central control room to a remote camera many miles away, the signals are generally transported using TCP. This guarantees data integrity, which is extremely important when minute changes to the iris are being actioned, but at the expense of latency. In other words, the response of the OCP iris control may be sporadic, which is obviously unacceptable.
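
A control channel of this kind typically looks something like the hedged sketch below: a TCP connection carrying small, latency-sensitive messages. The host, port, and command format are invented for illustration; the point is that TCP guarantees delivery of each iris command, but any retransmission delay is hidden from the application and simply appears as a late response at the panel.

```python
import socket

# Hypothetical remote camera head address and a made-up iris command format.
CAMERA_ADDR = ("198.51.100.20", 9000)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)    # TCP socket
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # send small messages immediately
sock.connect(CAMERA_ADDR)

# Each small adjustment from the OCP becomes a short TCP message.
# If a segment is lost on the wide area network, TCP resends it transparently,
# so the command still arrives intact but later than expected.
sock.sendall(b"IRIS +0.1\n")
sock.close()
```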

The main challenge broadcasters face with remote operations is that they often employ third-party networks to connect the remote studio to the centralized control facility. Even with the best SLAs, the devil is in the detail, and if a circuit is guaranteed to be 10Gbps then note should be taken of the period of time this actually applies to. For example, it may be an average of 10Gbps over ten seconds, where the available data rate peaks at 40Gbps for two seconds but is limited to 2.5Gbps for the remaining eight seconds. The data rate is indeed 10Gbps averaged over ten seconds, but it drops to 2.5Gbps for eight seconds out of the ten. As video and audio streams demand continuous data rates, any drop in bandwidth could be catastrophic, and this is one of the reasons broadcasters use CDNs.
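
It is worth doing the SLA arithmetic explicitly, as averages hide exactly the behaviour that hurts continuous media. The sketch below reproduces the hypothetical profile above: 40Gbps for two seconds followed by 2.5Gbps for eight seconds still averages 10Gbps over the ten-second window.

```python
# Check the hypothetical SLA profile: average rate over a ten second window.

profile = [
    (40.0, 2),  # Gbps for 2 seconds (burst)
    (2.5, 8),   # Gbps for 8 seconds (throttled)
]

total_gbits = sum(rate * seconds for rate, seconds in profile)  # 80 + 20 = 100 Gbit
window_seconds = sum(seconds for _, seconds in profile)         # 10 s
average_gbps = total_gbits / window_seconds                     # 10 Gbps

print(f"Average over {window_seconds}s window: {average_gbps:.1f} Gbps")
# A continuous 3 Gbps video flow would break up during the eight seconds when
# the link is limited to 2.5 Gbps, even though the 10 Gbps average holds true.
```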

Mixing video and audio UDP/IP streams with time-sensitive TCP/IP flows can cause many challenges, as the switches may be configured to prioritize the UDP feeds over the TCP flows, or vice versa. If the TCP/IP control flows are prioritized over the UDP/IP media, then when the network link becomes full it’s entirely possible that moving the OCP iris control could make the picture break up. In other words, the TCP and UDP flows may negatively impact each other, and as networks tend to be black boxes, how they’re configured is not easy to determine.
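
One common way flows end up being prioritized differently is DSCP marking at the sender, which switches may or may not be configured to honour. The Linux-style sketch below shows how a sender could mark a UDP media socket with the Expedited Forwarding code point; whether that actually wins out over the TCP control traffic depends entirely on how the third-party network is configured, which is exactly the black-box problem described above.

```python
import socket

# DSCP Expedited Forwarding (EF, decimal 46) sits in the top six bits of the
# IP TOS byte, so the value written to IP_TOS is 46 << 2 = 0xB8.
DSCP_EF_TOS = 46 << 2

media_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
media_sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, DSCP_EF_TOS)

# The marking is only a request: a switch or a provider's network may honour it,
# re-mark it, or ignore it, so the relative priority of the UDP media and the
# TCP control flows cannot be assumed without knowing the network configuration.
media_sock.close()
```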

One solution to this conundrum is to route the video and audio UDP flows over entirely different networks from the remote control TCP flows, so that they have little chance of interacting with each other. But this assumes that the network provider isn’t using any common links and is truly providing diverse connectivity – and that cannot be assumed.
