Video Over IP - Making It Work - Part 3

Point-to-point connections dedicated to delivering video and audio signals have dominated the broadcast industry since the 1930s. But migrating to IP is compelling engineers to think differently about packetized signal distribution. In this article we investigate the potential sources of congestion and the effects of buffering.

Latency has been slowly creeping into broadcasting since the first frame synchronizers were used to time remote cameras into studios. As long as the audio was correctly delayed and the lip-sync relationship maintained, broadcasters weren't too concerned about latency, as the times involved were very small. But migrating to IP has delivered many new challenges, and we must be vigilant when dealing with latency and jitter.

By the very nature of IP, data is divided into packets, each moved through the network independently of the others. Many devices in a network buffer data and, as they do, introduce unpredictable delay, resulting in jitter and potential loss of data.

Not all devices within a network will operate at the same Ethernet speeds. A camera may use a 10GbE connection, an audio processor may use 100Base-T, and an IP multiviewer might use a 100GbE connection. Assuming we're using a centralized switch topology, any 100Base-T devices will need to be converted to fiber to facilitate connection to the Ethernet switch.

Changing Data Rates

If we assume a scenario where a well-behaved camera with a 10GbE connection is creating evenly gapped video, audio, and metadata streams, then converting the HDR metadata content, with an average data-rate of 5Mbit/s, from 10GbE to 100Base-T is seemingly straightforward, as solutions such as media-converters provide this. Effectively, the media-converter is providing two functions: it is gearing the clock speed from 10Gbit/s to 100Mbit/s and changing the physical layer from fiber to twisted copper pair.

To change clock speed, the complete received Ethernet frame must be loaded into a buffer before it can be clocked out at the lower rate. And the opposite is true when converting from 100Base-T to 10GbE.
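
As a rough illustration of that store-and-forward step, the sketch below compares the time taken to clock the same frame in at 10Gbit/s and out at 100Mbit/s. The frame size and link rates are illustrative numbers, not taken from any specific device.

```python
# Minimal sketch of the store-and-forward step inside a media-converter.
# Assumes one whole Ethernet frame is buffered before retransmission.

FAST_LINK_BPS = 10_000_000_000   # 10GbE ingress
SLOW_LINK_BPS = 100_000_000      # 100Base-T egress
FRAME_BYTES = 1500               # typical maximum Ethernet payload

def serialization_time(frame_bytes: int, link_bps: int) -> float:
    """Time to clock one frame onto the wire, in seconds."""
    return frame_bytes * 8 / link_bps

t_in = serialization_time(FRAME_BYTES, FAST_LINK_BPS)   # ~1.2 microseconds
t_out = serialization_time(FRAME_BYTES, SLOW_LINK_BPS)  # ~120 microseconds

# The converter receives the frame 100x faster than it can send it, so
# back-to-back 10GbE frames must queue (or be dropped) while the
# 100Base-T side slowly drains.
print(f"in: {t_in * 1e6:.1f} us, out: {t_out * 1e6:.1f} us")
```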

The Detail is in the Burstiness

All buffers have a fixed length, and if frames from the 10GbE connection are being written into the buffer faster than data is being read out for the 100Base-T connection, then buffer overrun occurs. That is, a received frame has nowhere to go, so it gets dropped.

Average measurements in networks do not tell us much about what is going on. The real area of interest is in the tails of the distribution bell curve.

Diagram 1 – Each green and blue block shows the relative time duration to send consecutive frames with the same average data rate on 10GbE and 100Base-T connections. A media-converter will receive the frame into a buffer at 10Gbit/s, and then send it out at 100Mbit/s assuming the long-term average is less than 100Mbit/s. Evenly gapped data is easily converted to a 100Base-T connection from 10GbE without packet loss. But burst data shows frames 2 and 3 are lost on the 100Base-T connection if a small buffer is used. If a large buffer is used then all frames from the 10GbE connection will be correctly sent on the 100Base-T connection, assuming the long-term average is less than the capacity of the link.

Although the overall data-rate may be within the specification of the PC, media-converter, and 100Base-T Ethernet link, frames could still be lost due to burstiness and jitter, as they may not be processed quickly enough.
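
To make the effect in diagram 1 concrete, here is a toy simulation of the converter's buffer. The burst pattern, frame size, and buffer depths are illustrative assumptions, not measurements; a frame is modelled as leaving the buffer when it begins transmission on the slow link.

```python
# Toy event simulation of the buffer in diagram 1: frames arrive off a
# 10GbE link and drain onto 100Base-T.

FRAME_BYTES = 1500
DRAIN_TIME = FRAME_BYTES * 8 / 100_000_000   # 120 us per frame on 100Base-T

def simulate(arrival_times, buffer_frames):
    """Return the number of frames dropped for a buffer of given depth."""
    queue = 0          # frames currently waiting in the buffer
    next_free = 0.0    # time the egress link finishes its current frame
    drops = 0
    for t in sorted(arrival_times):
        # Start transmitting any frames whose turn came before this arrival.
        while queue and next_free <= t:
            queue -= 1
            next_free += DRAIN_TIME
        if next_free <= t:
            next_free = t            # link was idle; restart the clock
        if queue >= buffer_frames:
            drops += 1               # buffer overrun: frame has nowhere to go
        else:
            queue += 1
    return drops

# Four frames arriving as a back-to-back burst, 1.2 us apart at 10GbE rates.
burst = [0.0, 1.2e-6, 2.4e-6, 3.6e-6]
print(simulate(burst, buffer_frames=1))  # small buffer: two frames dropped
print(simulate(burst, buffer_frames=4))  # deep buffer: nothing dropped
```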

Measuring the burstiness of a connection is notoriously difficult and must be achieved using a hardware device. Network Interface Cards (NICs) will buffer data as soon as it arrives on the physical connection, removing the temporal relationship between the frame and the wire. A hardware monitoring unit will tag the frame with an accurate clock as it is received off the wire and before it is written to the buffer. The time-tag can be used to determine the exact burstiness of the data and provide some meaningful buffer management.
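
The sketch below shows the kind of analysis those hardware time-tags enable; the capture format (timestamp, frame size) and the one-millisecond window are assumptions for illustration. A software timestamp taken after the NIC's internal buffering would hide exactly the gaps we want to see.

```python
def burstiness(tagged_frames, window_s=1e-3):
    """Given (timestamp_s, frame_bytes) pairs captured off the wire,
    return (min gap, max gap, peak windowed data-rate in bit/s)."""
    times = [t for t, _ in tagged_frames]
    gaps = [b - a for a, b in zip(times, times[1:])]
    peak_bps = 0.0
    for start, _ in tagged_frames:
        # Sum the bytes landing in a short window starting at each frame.
        window_bytes = sum(size for t, size in tagged_frames
                           if start <= t < start + window_s)
        peak_bps = max(peak_bps, window_bytes * 8 / window_s)
    return min(gaps), max(gaps), peak_bps

# An evenly gapped 5 Mbit/s stream: one 1500-byte frame every 2.4 ms.
# A bursty stream with the same average would report a far higher peak.
even = [(i * 2.4e-3, 1500) for i in range(100)]
print(burstiness(even))
```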

Buffers to the Rescue

IT solutions regularly deal with burstiness and tend to average out the data using buffers. In the scenario described in diagram 1, a buffer that can store at least three frames would have resulted in no lost packets. However, latency would increase.
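
The latency cost is easy to put a number on. A back-of-envelope sketch, assuming illustrative 1500-byte frames on the 100Base-T side:

```python
# A frame arriving behind three queued 1500-byte frames on a 100 Mbit/s
# link waits three full serialization times before it can move.

FRAME_BYTES = 1500
LINK_BPS = 100_000_000
per_frame_s = FRAME_BYTES * 8 / LINK_BPS              # 120 us per frame
print(f"added latency: {3 * per_frame_s * 1e6:.0f} us")  # ~360 us
```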

SDI, AES, and analogue switching matrices have no buffering, so any input can be routed to any output with only a few nanoseconds of processing and propagation delay. But this is not always the case with Ethernet switches. In IT terms, a non-blocking switch is defined as one whose switch-fabric is capable of handling the theoretical total of all ports, such that any routing request to any free output port can be established successfully without interfering with other traffic. Smaller and budget switches do not always provide non-blocking operation.

Delivery Without Collision

Switch-fabric, or just “fabric”, is the function within the Ethernet switch used to transfer frames from the ingress port to the egress port. The fundamental role of the fabric is to achieve switching without collisions or loss of frames.

The buffer management within a switch is one of the principal factors that governs frame throughput. Ethernet connections fundamentally differ from video and audio as an Ethernet circuit may be carrying more than one media-essence stream, whereas a video or audio circuit is usually carrying just one.

For example, the 10GbE IP stream leaving the camera will have frames containing video, audio, and metadata. The video output might consist of multiple HD and SD feeds for program and monitoring. Return video, talkback, tally, and lens control streams will all be moving in the opposite direction simultaneously.

No Buffers is Best

Ideally, switches would not use buffers at all and would just be one big silicon block of shared memory on a single chip supporting a thousand simultaneous serial connections. Commercially this is impractical, as current technology only allows 100 to 200 simultaneous serial connections to a chip. Although building such a device may be possible, the manufacturing yield rate would be prohibitively low and the expense would be enormous.

Switches regularly move frames from many ingress ports to one egress port. A quad-split viewer would require multicast feeds from four different cameras, all presenting on different ingress ports on the switch. The fabric must individually move each frame from ingress ports 1 to 4, one for each camera, to the same egress port for the quad-split viewer. If all the frames arrive simultaneously at ports 1 to 4, the switch must store three of them and send each frame in turn, through the fabric, to the egress port, otherwise frames would be dropped.

Diagram 2 – Cameras 1 to 4 present their frames to ingress ports 1 to 4. Each of the frame streams is being multiplexed onto the egress port 10. Many frames arrive at the same time, or close to each other, and if the egress port P10 was not empty, frames would be dropped. Buffers B1 to B4 provide temporary storage for the frame streams from the cameras so that they may be correctly ordered and presented as a single stream to the Quad-Split on P10.
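
A minimal sketch of that behaviour, using the buffer and port names from diagram 2, might look like the following. The round-robin service order is an assumption; real fabrics use vendor-specific arbitration.

```python
from collections import deque

# One temporary buffer per ingress port, as in diagram 2.
ingress = {f"B{i}": deque() for i in range(1, 5)}

# Four camera frames arrive simultaneously on ports 1 to 4.
for i in range(1, 5):
    ingress[f"B{i}"].append(f"camera-{i}-frame")

egress_p10 = []
while any(ingress.values()):
    for buf in ingress.values():      # visit each ingress buffer in turn
        if buf:
            egress_p10.append(buf.popleft())

# Frames are serialized onto P10 one at a time; none are dropped.
print(egress_p10)
```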

To understand why switches are blocking or non-blocking requires a better understanding of buffer strategies.

Placing input buffers on a switch seems the most obvious choice. If a FIFO (First In, First Out) buffer is associated with each ingress port, then frames presented to the port will be immediately written to the FIFO memory. Assuming neither the long-term average of the link nor the egress port capacity is breached, the FIFO will be able to deal with short-term burstiness and not drop any packets.

Algorithms in the switch read the destination MAC address of the frame at the head of the FIFO and determine which egress port to move the frame to through the fabric.

Head of Line Blocking

A fundamental problem occurs with this method. The Ethernet stream presented to the ingress port will contain frames with destination MAC addresses for different egress ports. For example, each of the camera streams on ports 1 to 4 would also contain audio frames, which might be destined for port 11 going to the sound console. These frames will be queued behind the video frames and must wait until the video frames have been moved to their respective egress ports before the audio frames can be moved. If the video egress port is operating at high capacity, then the audio frames will be blocked.

This phenomenon is referred to as Head-of-Line Blocking (HOLB). FIFOs are cheap to implement and used in small switches, and many applications in IT do not require the expense of non-blocking architectures as they rely on higher-level protocols such as TCP to compensate for any dropped packets.
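
The failure mode is easy to demonstrate. In the sketch below, the frame and port names follow the example above, and the busy/idle port states and one-frame-at-a-time service model are illustrative assumptions.

```python
from collections import deque

# A single ingress FIFO: a video frame for P10 ahead of an audio
# frame for P11.
fifo = deque([("V1", "P10"), ("A1", "P11")])
egress_busy = {"P10": True, "P11": False}   # video port congested

# A strict FIFO may only ever move its head-of-queue frame.
frame, port = fifo[0]
if egress_busy[port]:
    # V1 cannot move, so A1 is blocked behind it even though P11 is idle.
    print(f"{frame} blocked on {port}; {fifo[1][0]} waits despite P11 being free")
else:
    fifo.popleft()
```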

Diagram 3 – Ports 1 to 4 are receiving Ethernet frames for video (Vn) from cameras 1 to 4 with associated HDR metadata (Mn). As Buffers 1 to 4 are FIFOs, M1 and M4 cannot be moved to P20, and hence processed by the HDR processor, until frames V1 and V4 have been moved to P10 for the quad-split. Even though P20 is empty, and the link has the capacity to send frames to the HDR processor, it cannot, until V1 and V4 have been moved. In this scenario, the M1 and M4 frames are blocked. HDR processing becomes dependent on the quad-split monitoring; this is unacceptable behavior and will result in HDR-to-video sync errors or dropped packets.

Broadcasters use UDP as they cannot afford the latency that TCP introduces. In using UDP we've removed the protection TCP provides, which matters because switches using ingress FIFOs drop packets during times of high congestion. Non-blocking switches use much more advanced buffer management solutions such as VOQ (Virtual Output Queues).

Intelligent Buffering

VOQ is an intelligent memory buffer. Each ingress port has one VOQ buffer for each of the egress ports. For example, if the switch has 16 ports, each ingress port will have 16 VOQ buffers associated with it. When a frame is presented to the ingress port, the VOQ algorithm determines the destination MAC address of the frame and moves it into the VOQ buffer for that frame's egress port. When the egress port becomes empty, the VOQ algorithm will decide which of the ingress VOQ buffers to move the next frame from.
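
A minimal sketch of the VOQ structure is shown below. The 16-port figure follows the example above; the lowest-numbered-first arbiter is a simplifying assumption, as production schedulers use round-robin or more elaborate matching algorithms.

```python
from collections import deque

N_PORTS = 16
# One queue per (ingress, egress) pair: 16 x 16 = 256 small queues.
voq = {(i, e): deque() for i in range(N_PORTS) for e in range(N_PORTS)}

def enqueue(ingress: int, egress: int, frame: bytes) -> None:
    """Classify by destination (derived from the frame's destination MAC
    address in a real switch) and park the frame in the matching VOQ."""
    voq[(ingress, egress)].append(frame)

def schedule(egress: int):
    """When an egress port goes idle, drain the first non-empty VOQ feeding
    it; here simply the lowest-numbered ingress. A QoS-aware variant would
    weight this choice, e.g. always serving a high-priority PTP queue first."""
    for ingress in range(N_PORTS):
        queue = voq[(ingress, egress)]
        if queue:
            return queue.popleft()
    return None
```

Because each ingress sorts its frames by destination before they queue, a busy video port can never trap audio or metadata frames bound elsewhere, which is exactly the blocking shown in diagram 3.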

Due to the complexity and high-bandwidth storage management needed to make VOQ work efficiently, it is only found in expensive, low-latency, high-end switches. VOQ also removes HOLB while allowing network administrators the option of applying QoS (Quality of Service). VOQ algorithms prioritize certain frames by applying priority weighting to buffers. This could be useful for prioritizing PTP traffic to and from cameras.

Timing Constraints

Calculating network capacities is not as straightforward as it may first seem. Burstiness can cause excessive packet jitter and even dropped packets. The ST 2110 suite (specifically ST 2110-21) places tight timing constraints on stream burstiness, and if they are not obeyed, downstream equipment may not be able to correctly display video and audio.
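
As a crude illustration of a pacing check, the sketch below flags packets that arrive closer together than an even-spacing tolerance. The packet count, frame rate, and tolerance are illustrative assumptions, and the real standard defines a formal leaky-bucket sender model rather than this simple threshold.

```python
PACKETS_PER_FRAME = 4000   # illustrative figure for an HD ST 2110-20 stream
FRAME_RATE = 50.0
ideal_gap = 1.0 / (FRAME_RATE * PACKETS_PER_FRAME)   # 5 us between packets

def early_packets(timestamps, tolerance=0.5):
    """Return indices of packets arriving at less than `tolerance` x the
    ideal even spacing, i.e. the bursts a downstream buffer must absorb."""
    return [i for i, (a, b) in enumerate(zip(timestamps, timestamps[1:]), 1)
            if (b - a) < ideal_gap * tolerance]
```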
