Video Over IP - Making It Work - Part 1

For the first time in the history of live television, we can abstract the video, audio, and metadata streams away from the underlying hardware. This innovation presents unprecedented opportunities, empowering broadcasters to build flexible, scalable, and highly efficient workflows.

SDI is a synchronous system and is intrinsically tied to the underlying hardware to maintain clock and data accuracy. Consequently, extracting the video from the stream is complex and restrictive as it requires special hardware interfaces dedicated to SDI distribution and unique to broadcasting.

SDI Lip-sync Errors

Maintaining SDI audio-video synchronization has its own challenges, especially when the audio is distributed independently of the SDI video. Even embedded audio can lose lip-sync when the signal is processed by a frame synchronizer. Audio and video are often separated by the synchronizer and if the correct delay has not been applied to the audio, then lip-sync errors can easily occur.

IP generally uses asynchronous networks, such as Ethernet, to distribute packetized data. This is true of video, audio, and metadata. Packets leaving a camera are transferred across an Ethernet fiber to a switch. At the point where packets leave the camera, the traditional SDI timing is lost.


Initially, SMPTE provided the ST2022 suite of specifications. ST2022-6 packetizes blocks of SDI data into UDP datagrams for distribution over an Ethernet network. It maintains the TRS (timing reference signal) information from the SDI stream, so the packets are easily reconstructed by the receiver into the original video, audio, and metadata streams.

Although ST2022-6 is reliable and is in use in many installations throughout the world, it is wasteful of precious bandwidth: it maintains the line and field sync information and doesn’t take full advantage of the opportunities IP networks offer. However, ST2022 was an effective and safe first step into IP for broadcasters until ST2110 became available.


SMPTE’s ST2110 family of specifications was released in the fall of 2017 and is the real game-changer, providing full utilization of IP networks. When packetizing video frames, ST2110 removes the TRS so redundant timing data is no longer distributed in the packet. Active video is encapsulated in a datagram, which in turn is appended with a unique timestamp accurate to a few nanoseconds. Receivers can reconstruct the video frame using these timestamps.
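ST2110 derives each packet’s RTP timestamp from PTP time using a fixed media clock (90 kHz for video). A minimal sketch of that mapping, assuming integer-nanosecond PTP time — the function name and sample values below are my own illustration, not reference code from the standard:

```python
# Illustrative sketch (not normative ST2110 code): deriving the 32-bit
# RTP timestamp carried in each datagram from an absolute PTP time,
# using the 90 kHz media clock ST2110 specifies for video.

RTP_VIDEO_CLOCK_HZ = 90_000     # ST2110 video media clock
NS_PER_S = 1_000_000_000

def rtp_timestamp(ptp_ns: int, clock_hz: int = RTP_VIDEO_CLOCK_HZ) -> int:
    """Map an absolute PTP time (integer nanoseconds since the PTP epoch)
    onto the 32-bit RTP timestamp field, which wraps modulo 2**32 ticks."""
    return (ptp_ns * clock_hz // NS_PER_S) % (2 ** 32)

# Two frames of 50 Hz video are 90000 / 50 = 1800 ticks apart:
t0 = rtp_timestamp(1_000_000 * NS_PER_S)               # some PTP instant
t1 = rtp_timestamp(1_000_000 * NS_PER_S + 20_000_000)  # 20 ms (one frame) later
```

Because every sender derives its timestamps from the same PTP reference, a receiver can treat packets carrying the same timestamp as strictly co-timed, regardless of which device produced them.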

Diagram 1 – PTP synchronizes broadcast equipment in IP systems, replacing SPGs in traditional SDI infrastructures.

To maintain the demanding levels of timing accuracy needed to ensure the optimal viewer experience, SMPTE has mandated the use of the IEEE 1588 Precision Time Protocol version 2 (PTP). Industry has used PTP for many years to synchronize machinery on precision manufacturing production lines.

In a true IP system, PTP replaces the traditional black-and-burst sync pulse generator (SPG). But SPGs are still needed in hybrid systems where SDI and IP co-exist, or to support legacy broadcast equipment.

PTP Replaces SPGs

PTP timestamps can be thought of as a continuous counter incrementing every nanosecond. The absolute value is referenced to the epoch: midnight on 1st January 1970. In other words, the timestamp is the number of nanoseconds that have elapsed since the epoch. When a video, audio, or metadata packet is created, a PTP timestamp is appended to it, so the receiver knows exactly when the packet was created and can reconstruct the frame of video or samples of audio, or reference the metadata.
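As a rough sketch of the idea — the names and structure here are my own, not from any SMPTE specification — a sender might stamp each packet like this. Note that a real PTP clock counts TAI time, whereas the ordinary system clock used below is UTC-based:

```python
import time
from dataclasses import dataclass

@dataclass
class MediaPacket:
    payload: bytes
    origin_ns: int  # PTP-style timestamp: nanoseconds since the 1970 epoch

def capture(payload: bytes) -> MediaPacket:
    # Illustration only: time.time_ns() is UTC-based, while a true PTP
    # clock counts TAI seconds, currently offset from UTC by the
    # accumulated leap seconds.
    return MediaPacket(payload, time.time_ns())

pkt = capture(b"\x00" * 1440)             # e.g. one video payload datagram
age_ns = time.time_ns() - pkt.origin_ns   # receiver knows exactly when it originated
```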

Providing PTP timestamps creates enormous opportunity for broadcasters. No longer are we constrained by the underlying timing dictated by the synchronous networks of SDI, AES, and MADI. We no longer think in terms of video line and frame timing but can now think in terms of “events” or “grains”. Each video frame or audio sample is a grain with a unique timestamp. It is possible to collect many different grains, delivered over different protocols or networks, and bring them together using PTP as an absolute time-of-origin reference.
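The grain idea can be sketched as a hypothetical alignment routine — the names and the tolerance value below are my own illustration — that groups grains from different essence streams by their PTP origin time:

```python
from dataclasses import dataclass

@dataclass
class Grain:
    stream: str     # e.g. "video", "audio", "metadata"
    origin_ns: int  # PTP timestamp at the moment of creation
    payload: bytes

def align(grains, tolerance_ns: int = 500):
    """Group grains whose PTP origin times fall in the same
    tolerance-sized bucket. (Bucketing is a simplification: grains
    straddling a bucket boundary would need a smarter comparison.)"""
    groups = {}
    for g in sorted(grains, key=lambda g: g.origin_ns):
        groups.setdefault(g.origin_ns // tolerance_ns, []).append(g)
    return list(groups.values())

grains = [
    Grain("audio", 1_000_000_050, b"samples"),
    Grain("video", 1_000_000_100, b"frame"),
    Grain("video", 1_020_000_100, b"frame"),  # next video frame, 20 ms on
]
groups = align(grains)  # co-timed audio+video pair up; the next frame stands alone
```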

PTP uses a master-slave architecture to achieve a synchronous time reference throughout a LAN and WAN. All master clocks require an oscillator, but they vary in their accuracy. For example, a GPS referenced clock is more accurate than an NTP (network time protocol) referenced clock. To maintain synchronous timing, all slave clocks within the network will synchronize to the master.

No More SPG Changeovers

The Best Master Clock Algorithm (BMCA), part of the IEEE 1588 PTP specification, negates the need for the A-B SPG changeover used by broadcasters in SDI systems. Several masters may exist in one network to provide redundancy, and the BMCA allows devices to identify the most accurate time source.

Consider a network with two master clocks, master-A and master-B, both GPS locked. The system administrator needs to set their priorities to differ: master-A to priority “0” and master-B to priority “1”. (In practice this is the PTP priority2 field, which the BMCA compares only after clock quality, so that a master whose clock degrades can still be outranked.) If master-B is powered up before master-A, it will listen for Announce messages on the network; finding none, it will assume itself to be Grand Master and start periodically sending its own Announce messages. The Announce message contains a great deal of information about the clock, including its accuracy and its priority, in this case “1”.

When master-A comes online, it too starts listening and receives master-B’s Announce messages. Master-A determines that its own priority of “0” is higher, and starts to send its own Announce messages containing a priority of “0”. Master-B receives these messages, accepts that master-A has the higher priority, stops sending Announce messages, and goes into listening mode.

All slave clocks on the network will receive these messages and automatically select master-A as their new time source.

If master-A were to lose its GPS reference, its accuracy would be degraded, and it would update its Announce messages with this information (at this point it is still Grand Master). Master-B, continually receiving these messages, would determine that it now has the more accurate clock, assume Grand Master status, and start sending its own Announce messages. Master-A would receive these, acknowledge that master-B is now more accurate, stop sending Announce messages, and go into receive mode.

All slave clocks on the network will receive these messages and automatically select master-B as their new time source.
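The two-master election above can be sketched in a few lines. This is a deliberately simplified model of the BMCA — the real algorithm also compares clockAccuracy, variance, and clock identity, and the numeric values here are illustrative. Because priority1 is compared before clock quality, the accuracy-driven failover described above relies on the masters sharing the same priority1 and differing in priority2:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Announce:
    clock_id: str
    priority1: int    # administrator override; lower wins, compared first
    clock_class: int  # clock quality; lower is better (GPS-locked ≈ 6)
    priority2: int    # tie-break among equally good clocks; lower wins

def best_master(announces):
    """Simplified best-master ranking. IEEE 1588 compares priority1,
    then clock quality (class/accuracy/variance), then priority2, then
    clock identity; only three of those fields are modelled here."""
    return min(announces, key=lambda a: (a.priority1, a.clock_class, a.priority2))

# Both masters GPS-locked: equal clock class, so priority2 elects master-A.
a = Announce("master-A", priority1=128, clock_class=6, priority2=0)
b = Announce("master-B", priority1=128, clock_class=6, priority2=1)

# master-A loses GPS and announces a degraded clock class (52 is an
# illustrative degraded value): master-B, still locked, now outranks it
# despite its lower-ranked priority2.
a_degraded = Announce("master-A", priority1=128, clock_class=52, priority2=0)
```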

Use PTP-Enabled Switches

Although PTP master clocks tend to be hardware devices, endpoints such as cameras, sound consoles, and playout servers can all use software solutions to sync to the master. The quality of the end device’s network interface card (NIC) is important, as the buffers within the NIC influence the delay of the PTP messages and affect how well the device locks to the PTP master.

To maintain PTP accuracy and keep timing jitter low, PTP-enabled switches must be used wherever possible. IEEE 1588 provides a mechanism (the correction field) to update PTP messages with the delay incurred in the switch, enabling slave devices to take the time spent in the switch into account.
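This correction works alongside PTP’s standard delay measurement. A simplified sketch of how a slave might combine the four Sync/Delay_Req timestamps with a switch-reported residence time — the function and scenario are illustrative, not IEEE 1588 reference code:

```python
def slave_offset_ns(t1: int, t2: int, t3: int, t4: int,
                    correction_ns: int = 0) -> int:
    """Classic PTP offset estimate, in nanoseconds.
    t1: master sends Sync      t2: slave receives Sync
    t3: slave sends Delay_Req  t4: master receives Delay_Req
    A PTP-aware (transparent clock) switch accumulates its residence
    time in the correction field, so the slave can subtract queueing
    delay it could not otherwise see. For simplicity, only the Sync
    path is corrected here and a symmetric wire delay is assumed."""
    master_to_slave = (t2 - t1) - correction_ns
    slave_to_master = t4 - t3
    return (master_to_slave - slave_to_master) // 2

# Slave clock runs 1000 ns ahead; wire delay is 500 ns each way; the
# switch held the Sync message for 200 ns and reported it in the
# correction field, so the slave still recovers the true 1000 ns offset.
offset = slave_offset_ns(t1=0, t2=1700, t3=2000, t4=1500, correction_ns=200)
```

Without the correction, the 200 ns of queueing delay would be misread as wire delay and split into the offset estimate, which is exactly the jitter a non-PTP-aware switch introduces.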

Not all switches are PTP aware. If non-aware switches are used, PTP messages may be randomly delayed, and the slave devices syncing to the master will experience clock jitter and offset from the master time. To compensate, delay may need to be added to the received video, audio, or metadata streams, resulting in unacceptable latency for operational staff.

Improved Frame Accurate Metadata

High Dynamic Range (HDR) is taking the broadcasting industry by storm. But to provide a fully immersive experience for viewers, frame-accurate metadata must be created, processed, and broadcast to viewers. Compliant TVs use this frame-accurate data to dynamically configure their screens to provide the best viewing experience possible.

In SDI systems, creating and maintaining frame-accurate metadata from a camera all the way through the production and transmission chain is a complex and challenging task. A system would be awash with SDI embedders and de-embedders, along with multiple interface systems to convert, delay, and package the data accordingly.

Diagram 2 – Using ST2110 allows broadcasters to process video, audio, and metadata independently of each other to provide flexible workflows.

IP networks, specifically using ST2110, allow broadcasters to create frame-accurate metadata and maintain its timing relationship to video and audio throughout the studio, production, and transmission chains.

Thinking of video, audio, and metadata as event-timed data packets allows broadcasters to process the data streams anywhere they like. This opens the door to new and more efficient working practices: on-prem and off-prem datacenters and cloud infrastructures suddenly become available to creatively process streams, assuming the network pipes are fast enough.

IP Provides Efficient Workflows

Creating multilingual subtitles traditionally requires language specialists to work at a broadcaster’s studio. A Scandinavian broadcaster based in London might require native speakers from Sweden, Norway, and Denmark to work at the studio, clearly a massive expense.

Using ST2110 over IP, the subtitling linguists could work from their respective homes with just a simple internet connection. A compressed proxy stream of the broadcast would be sent to them with the original PTP timestamps, and they could use an ordinary PC or Mac to subtitle the program. The software would stamp each subtitle with the PTP timestamp of the associated video.

REMI (remote integration model), “at-home”, or backhaul outside broadcasts are easily achieved with ST2110. Rather than send a complete crew and production team to a stadium for a sports event, broadcasters dispatch only the essential camera and sound operators. Video, audio, and any associated metadata are streamed back to the studio over IP circuits provided by telcos, allowing the whole production to take place at the studio.

Diagram 3 – Using remote-OB techniques, broadcasters can now service many sports events from one studio to save on crew and deployment costs.

This has an obvious advantage in terms of accommodation and subsistence costs for crews. Furthermore, by switching the IP circuits from successive stadiums into the studio, one studio crew can cover several football matches or events.

Imaginative Workflows

Through the adoption of ST2110, broadcasters can unleash the power and opportunities IP networks provide for modern media distribution and broadcasting. Never in the history of television have we had the freedom to process, distribute, and monitor frame accurate video, audio, and metadata independently of each other and the broadcast infrastructure. Now, we are only limited by our imagination.
