BCE Going Deeper - Part 3 - Debugging IP

At the start of 2013, BCE at RTL City was a hole in the ground in Luxembourg. Less than four years later it was on air, broadcasting 35 different channels across Europe and Singapore. Costas Colombus is BCE’s Technology Projects and Support Director and gave The Broadcast Bridge a unique insight into how they made this mammoth installation work, describing the issues they met and how they overcame them along the way.

In this third article in the series we look at the challenges that occur in IP networks, how to detect them, and the network tools needed to fix them.

BCE decided to use a fully redundant system built on routers from two vendors: Juniper and Arista. During the installation phase, Costas detected a problem common to both vendors' IP routers: video feeds would randomly break up for no obvious reason.

To introduce video into the network, BCE’s principal technology contractor, SAM, provided video gateways: devices that convert HD-SDI to ST 2022 IP and aggregate many sources onto one 40Gb/s fiber. Video disturbances in one of the gateway sources could manifest themselves as dropped packets in the IP domain.

Data mining tools from the IT industry helped log events and home in on the source of the problems. BCE chose Paessler’s PRTG network monitoring solution to give a deep analysis of the network, including lost packets and data rates. Network bandwidth monitoring is achieved either with remote hardware devices or by extracting statistics using SNMP and sFlow.
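As a rough illustration of how data rates and lost packets can be derived from SNMP statistics, the sketch below compares two polls of an interface's counters. It is not a description of how PRTG works internally. The counter names mirror the standard IF-MIB objects ifHCInOctets (64-bit) and ifInDiscards (32-bit), while the function names, the sample values and the 40Gb/s link speed are assumptions made up for the example.

```python
# Illustrative sketch: turning two SNMP counter polls into a data rate and a
# discard count. Sample values and the 40 Gb/s link speed are invented.


def counter_delta(previous: int, current: int, bits: int) -> int:
    """Return the increase between two counter readings, allowing for one wrap."""
    return (current - previous) % (1 << bits)


def interface_stats(prev: dict, curr: dict, link_speed_bps: float) -> dict:
    """Derive average receive rate, utilization and discards between two polls."""
    interval_s = curr["time_s"] - prev["time_s"]
    if interval_s <= 0:
        raise ValueError("polls must be in increasing time order")
    # ifHCInOctets is a 64-bit counter, ifInDiscards a 32-bit counter.
    octets = counter_delta(prev["in_octets"], curr["in_octets"], bits=64)
    discards = counter_delta(prev["in_discards"], curr["in_discards"], bits=32)
    rate_bps = octets * 8 / interval_s
    return {
        "rate_bps": rate_bps,
        "utilization": rate_bps / link_speed_bps,
        "discards": discards,
    }


# Two hypothetical polls taken 10 seconds apart; the discard counter wraps
# around its 32-bit limit between them.
poll_a = {"time_s": 0, "in_octets": 1_000_000_000, "in_discards": 4_294_967_290}
poll_b = {"time_s": 10, "in_octets": 13_500_000_000, "in_discards": 5}

stats = interface_stats(poll_a, poll_b, link_speed_bps=40e9)
print(stats)
```

Because the counters only ever increase until they wrap, the rate is always calculated from the difference between two polls, never from a single reading.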

But why is flow monitoring and control so important? Intuitively, we might think that adding a high-speed link between two networks would improve transmission speeds. However, making every link in a network as fast as possible would increase costs disproportionately and negate the savings gained from using commercial off-the-shelf (COTS) products. Furthermore, Braess's paradox demonstrates that adding high-speed links between routers doesn’t always increase speeds and could even decrease them. This is counterintuitive and is the subject of a later article.
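The paradox is easiest to see with numbers. The sketch below uses the standard textbook four-node example rather than anything from BCE's network, and it assumes every unit of traffic independently picks whichever route currently has the lowest delay, which is a simplification of how real routing protocols behave. The node names, delay figures and function names are all illustrative.

```python
# Textbook Braess network, used purely as an illustration.
# Traffic enters at S and leaves at E via intermediate nodes A and B.
#   S->A and B->E: delay grows with load (load / 100)
#   A->E and S->B: fixed delay of 45
#   A->B: optional "high-speed" shortcut with zero delay


def route_delays(flows: dict) -> dict:
    """Return the end-to-end delay of each route for a given traffic split."""
    top = flows.get("S-A-E", 0)
    bottom = flows.get("S-B-E", 0)
    shortcut = flows.get("S-A-B-E", 0)
    delay_sa = (top + shortcut) / 100     # shared by the top and shortcut routes
    delay_be = (bottom + shortcut) / 100  # shared by the bottom and shortcut routes
    return {
        "S-A-E": delay_sa + 45,
        "S-B-E": 45 + delay_be,
        "S-A-B-E": delay_sa + delay_be,
    }


def is_equilibrium(flows: dict, available_routes: list) -> bool:
    """True if no traffic could lower its delay by moving to another route."""
    delays = route_delays(flows)
    best = min(delays[r] for r in available_routes)
    return all(
        delays[r] <= best + 1e-9
        for r in available_routes
        if flows.get(r, 0) > 0
    )


without_shortcut = ["S-A-E", "S-B-E"]
with_shortcut = without_shortcut + ["S-A-B-E"]

before = {"S-A-E": 2000, "S-B-E": 2000}  # 4000 units split evenly
after = {"S-A-B-E": 4000}                # everyone takes the shortcut

print(route_delays(before)["S-A-E"], is_equilibrium(before, without_shortcut))
print(is_equilibrium(before, with_shortcut))
print(route_delays(after)["S-A-B-E"], is_equilibrium(after, with_shortcut))
```

Without the shortcut the even split is stable and every unit of traffic sees a delay of 65. Once the shortcut exists that split is no longer stable, and the arrangement that is stable leaves every unit of traffic with a delay of 80.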

Diagram 1 – These two magnified cross-section photographs show a fiber: the one on the left is clean, and the one on the right is the same channel after it has been touched very briefly.

Arista and Juniper utilize sFlow to continuously sample and monitor application-level traffic at wire speed, simultaneously on all interfaces of their routers. sFlow distinguishes between monitoring on the wire and monitoring at the protocol level, an important distinction as the two can often give different measurements depending on the protocol being used. For example, UDP will typically give higher protocol-level throughput than TCP, even if the wire speed is the same, because it carries a smaller header and does not wait for acknowledgements.
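The sketch below illustrates both ideas: scaling a set of sampled frames up to an estimate of total traffic, and the gap between the rate on the wire and the rate seen by the protocol payload. It assumes 1-in-N packet sampling, IPv4/UDP traffic without VLAN tags, and that each sample reports the original Ethernet frame length including the frame check sequence. The function name and all sample values are invented for the example.

```python
# Illustrative sketch: estimating wire-level and payload-level rates from
# 1-in-N sampled frames. All numbers below are invented.

ETH_PREAMBLE_AND_GAP = 8 + 12         # bytes on the wire that are not part of the frame
ETH_IP_UDP_HEADERS = 14 + 4 + 20 + 8  # Ethernet header + FCS + IPv4 + UDP


def estimate_rates(sampled_frame_lengths: list, sampling_rate: int,
                   interval_s: float) -> dict:
    """Scale sampled frames by the sampling rate to estimate total traffic."""
    wire_bytes = sum(n + ETH_PREAMBLE_AND_GAP for n in sampled_frame_lengths)
    payload_bytes = sum(max(n - ETH_IP_UDP_HEADERS, 0) for n in sampled_frame_lengths)
    return {
        "frames": len(sampled_frame_lengths) * sampling_rate,
        "wire_bps": wire_bytes * sampling_rate * 8 / interval_s,
        "payload_bps": payload_bytes * sampling_rate * 8 / interval_s,
    }


# 500 sampled frames of 1480 bytes each, sampled 1-in-2000 over one second.
rates = estimate_rates([1480] * 500, sampling_rate=2000, interval_s=1.0)
print(rates)
print(f"payload efficiency: {rates['payload_bps'] / rates['wire_bps']:.1%}")
```

The estimate is statistical, so its accuracy depends on how many samples fall within the measurement interval.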

The sFlow specification claims that sampling and monitoring do not affect router performance. Oscilloscopes use high impedance probes to monitor audio and video systems, so we can be sure the signal is not being loaded or influenced by the measuring device. The same assumption cannot always be made in networks, as network interface cards on servers and PCs buffer incoming and outgoing packets, resulting in critical timing information being lost.

BCE developed their own monitoring software to present the information in a coherent form. Different vendors provide their own monitoring solutions, and a common system was needed to assist BCE’s maintenance teams in diagnosing issues quickly and proactively. The amount of sampled data available is overwhelming and identifying which attributes to monitor is a full-time job. False negatives waste precious time and can be severely detrimental to the smooth operation of a network.

The deep data mining and analysis used by BCE is often only found in high performance systems such as those used by Google and Amazon due to the data speeds involved and the level of understanding needed by the engineering teams. It also allows automated systems to intelligently detect patterns of behavior that are inconsistent with normal operation and flag potential issues to maintenance teams before a failure occurs. BCE’s network operations centers have fine-tuned this requirement and monitor video, audio and metadata flows 24/7, and continuously record data for later analysis should an intermittent problem develop.
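One simple way to flag behavior that is inconsistent with normal operation is to compare each new measurement against a rolling baseline of recent ones. The sketch below is an assumption about how such a check could look, not a description of BCE's software. The function name, window size, threshold and sample data are all hypothetical.

```python
# Illustrative sketch: flag measurements that sit far above a rolling baseline.
from statistics import mean, pstdev


def flag_anomalies(series: list, window: int = 5, threshold: float = 3.0,
                   min_sigma: float = 1.0) -> list:
    """Return indexes whose value exceeds baseline mean + threshold * sigma."""
    flagged = []
    for i in range(window, len(series)):
        baseline = series[i - window:i]
        # A floor on sigma stops a very quiet baseline from flagging tiny changes.
        sigma = max(pstdev(baseline), min_sigma)
        if series[i] > mean(baseline) + threshold * sigma:
            flagged.append(i)
    return flagged


# Hypothetical dropped-packet counts per minute on one port.
drops_per_minute = [2, 3, 1, 2, 2, 3, 40, 2, 3]
print(flag_anomalies(drops_per_minute))
```

A production system would look at many attributes at once, but the principle of learning what normal looks like and reporting departures from it is the same.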

Using these network monitoring tools connected to the real-time, wire-speed monitoring systems in the Arista and Juniper routers, BCE’s maintenance teams were able to record and analyze network speed measurements and reported errors. They noticed tens of thousands of network packet losses each day on many of the router QSFP ports.

The amount of historic data gathered allowed engineers to focus on the QSFPs. They proved that switching from white-label units to manufacturer-approved units reduced the dropped packets from tens of thousands a day to tens of packets a day, and sometimes even zero.

Diagram 2 – BCE’s bespoke software showing video over IP monitoring.

BCE found that another contributing factor to packet loss can be dirty fiber interfaces, as highlighted in The Broadcast Bridge Essential Guide on Fiber Optics in Production. Dust and grit are the enemies of fiber and this was particularly evident for BCE as their building work continued. Even though construction was taking place far away from the fiber installation, the smallest amount of dust could contaminate the interface. To rectify this, BCE dedicated one specific team to clean and make fiber connections to guarantee the consistency of work.

Historically, engineers working in the SDI domain would only need to deal with the physical and application layers of the OSI 7-layer model, but as we migrate to IP the need to understand the other five layers soon becomes apparent.

In this series of three articles (Debugging IP; Cable, Standards and ITIL; and Choosing Routers) we’ve witnessed first-hand the importance of working closely with IT engineers to make IP-media systems operate effectively and reliably. One person cannot possibly hope to understand all aspects of the OSI 7-layer model to the depth needed for IP migration, so collaboration between broadcast and IT engineers is key to solving problems, even those that manufacturers are unaware of.
