Video Over IP - Making It Work - Part 2

Timing is the most fundamental aspect of broadcast television. From the early days of tube cameras to modern OLED displays, timing continues to play a crucial role in maintaining an optimal viewer experience. Picture stutter and breakup are just a few of the symptoms of synchronization and timing errors.

Broadcast systems have relied on synchronization since the first television services were transmitted in the 1930s. Line, field, and frame pulses were used to keep cathode ray television screens synchronous with the image created by the scanning tube camera.

Accurate lip-sync was assumed as there were no appreciable delays created by processing equipment to destroy the timing relationship. But this all changed with the advent of signal digitization.

Processing Lip-Sync Errors

In the early days of digital processing, the adoption of frame synchronizers introduced the first lip-sync errors. However, these were confined to the locality of the frame synchronizer, so they could be easily fixed using a digital audio delay. The same lip-sync errors followed with compression, with the amount of delay simply increasing.

Where video and audio were distributed together, embedded SDI was used to maintain the video-audio timing relationship. The underlying technology of all video and audio digital distribution at this time relied on bit-clocks being embedded in the transport signal itself, thus establishing and maintaining sample synchronization between the sender and receiver.

Diagram 1 – if the monitor is not phase and frequency locked to the camera, then frames will either be lost or duplicated. If excessive jitter occurs, both loss and duplication will happen causing the picture to be unstable.

Packet switched networks, such as IP, destroy this relationship. It’s the fundamental price we pay for the flexible, scalable, and highly efficient workflows that IP offers.

Unlocked Oscillators Cause Instability

If the monitor oscillator were running slightly fast with respect to the camera, there would not be enough video samples entering the monitor and the displayed picture would become unstable. The opposite is true if the monitor's oscillator were running slow; there would be too many samples and the monitor would not be able to display them all.

A difference of just 30Hz on an SDI connection between the camera and monitor oscillators could result in one sample per frame being lost. If this occurred during frame blanking, the picture would become unstable and unusable.
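The arithmetic behind this figure can be sketched in a few lines. The 30 frames per second rate and the 27 MHz sample clock used below are assumptions chosen for illustration, as are the function names; the article does not state which SDI format is meant.

```python
def samples_slipped_per_frame(freq_offset_hz: float, frame_rate_hz: float) -> float:
    """Samples gained or lost in each frame when two clocks differ by freq_offset_hz."""
    return freq_offset_hz / frame_rate_hz


def offset_ppm(freq_offset_hz: float, nominal_clock_hz: float) -> float:
    """Express the same frequency difference in parts per million of the nominal clock."""
    return freq_offset_hz / nominal_clock_hz * 1e6


# Assumed values: 30 frames per second and a 27 MHz sample clock.
slip = samples_slipped_per_frame(30.0, 30.0)
ppm = offset_ppm(30.0, 27_000_000.0)
```

Under these assumptions a 30Hz difference amounts to one sample per frame, yet it is only about 1.1 parts per million of the clock frequency, which shows how tight the tolerance is.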

Relying on free-running oscillators is guaranteed to make broadcast systems catastrophically fail.

Enemy Jitter

Clock jitter is the enemy of distributed synchronous systems. Short-term jitter causes the frequency of the slave oscillator to change quickly, and long-term jitter causes drift.

Digital master-slave clocks are analogous to the phase locked loop oscillators used to synchronize color subcarrier frequencies in NTSC and PAL analogue transmissions. A feedback mechanism compares the television oscillator with the incoming signal from the broadcaster. This creates an error voltage which is used to vary the frequency-controlled oscillator. Filters dampen oscillations in the error voltage to keep the system stable.

Although digital filters do not change the frequency of their local oscillator, they do create a periodic pulse to form the slave clock synchronized to the master. Counters vary the mark-to-space ratio of the slave clock to correct its phase and frequency to obtain convergence.

Diagram 2 – digital oscillators achieve lock by changing counter values to provide a cyclical waveform. Feedback filters are used to dampen the rate of change of the values so that the system is stable and converges to the desired frequency. This diagram provides an NTSC frame pulse; changing each of the mark-space counter values to 720,000 will provide a PAL frame pulse at 25Hz.
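The counter arithmetic in the caption can be checked with a short sketch. The reference clock frequency is not given in the text, so it is derived here from the PAL figures, and the NTSC counter value is then calculated on the assumption that the same clock is used. The function names are illustrative only.

```python
from fractions import Fraction


def implied_clock_hz(frame_rate_hz: Fraction, mark_count: int, space_count: int) -> Fraction:
    """Reference clock needed for one mark plus one space period to span one frame."""
    return frame_rate_hz * (mark_count + space_count)


def frame_pulse_hz(clock_hz: Fraction, mark_count: int, space_count: int) -> Fraction:
    """Frame pulse rate produced by dividing the clock by the two counter values."""
    return Fraction(clock_hz) / (mark_count + space_count)


# PAL values quoted in the caption: 720,000 counts each for mark and space at 25Hz.
clock = implied_clock_hz(Fraction(25), 720_000, 720_000)

# Assumption: the same clock is used for NTSC, giving 600,600 counts per half period.
ntsc_rate = frame_pulse_hz(clock, 600_600, 600_600)
```

With the PAL values the implied clock is 36 MHz, and with that clock counter values of 600,600 give exactly 30000/1001 Hz, the NTSC frame rate.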

Digital filters are used extensively in feedback mechanisms for slave clocks to reduce both long-term and short-term jitter. But there is a price to pay; if the filter time constants are too long then the slave will take considerable time to sync-up and may never reach convergence. And if the filter time constants are too short, then the slave clocks will behave erratically resulting in unstable pictures and distorted sound.
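The trade-off can be illustrated with a minimal sketch in which a first-order low-pass filter stands in for the feedback filter. This is a simplification chosen for illustration, not the filter any particular device uses, and the `alpha` values are arbitrary. A small `alpha` corresponds to a long time constant.

```python
def smooth(measurements, alpha, initial=0.0):
    """First-order low-pass filter: each output moves a fraction alpha toward the input."""
    value = initial
    filtered = []
    for sample in measurements:
        value += alpha * (sample - value)
        filtered.append(value)
    return filtered


# A sudden, constant offset of 1.0: how far has each filter converged after 100 samples?
step = [1.0] * 100
slow_step = smooth(step, alpha=0.01)[-1]   # long time constant, still far from 1.0
fast_step = smooth(step, alpha=0.5)[-1]    # short time constant, effectively converged

# Pure jitter alternating between +1 and -1: how much reaches the slave clock?
jitter = [1.0, -1.0] * 50
slow_jitter = max(abs(v) for v in smooth(jitter, alpha=0.01))
fast_jitter = max(abs(v) for v in smooth(jitter, alpha=0.5))
```

The slow filter rejects almost all of the jitter but has covered only about two thirds of the step after 100 samples, while the fast filter converges almost immediately and lets a large part of the jitter through.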

Buffering Masks Jitter

One more tool is available to help overcome jittery synchronization, and that is buffering. Writing input samples into a buffer gives some breathing space for the output side to read the data at a constant rate. If the long-term frequencies are synchronized and correct, the sender and receiver will exchange video and audio data reliably. However, if the buffer windows are too long, the video, audio, and metadata will be significantly delayed, causing lip-sync issues or unacceptable operational delay.
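A rough sketch of the cost of buffering follows. It assumes a simple model in which the buffer must hold enough whole packets to cover the peak arrival jitter; the function names and the example figures are hypothetical.

```python
import math


def buffer_packets_needed(peak_jitter_ms: float, packet_time_ms: float) -> int:
    """Whole packets that must be held so the reader never runs dry (simple model)."""
    return math.ceil(peak_jitter_ms / packet_time_ms)


def added_latency_ms(peak_jitter_ms: float, packet_time_ms: float) -> float:
    """Delay the buffer adds, which is the price paid for masking the jitter."""
    return buffer_packets_needed(peak_jitter_ms, packet_time_ms) * packet_time_ms


# Hypothetical figures: 2.5 ms of peak jitter on a stream carrying 1 ms packets.
packets = buffer_packets_needed(2.5, 1.0)
latency = added_latency_ms(2.5, 1.0)
```

In this example three packets must be buffered, adding 3 ms of delay. The same reasoning shows why large jitter forces long buffers and, with them, the lip-sync and operational delays described above.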

SMPTE adopted the IEEE 1588-2008 standard, otherwise known as PTP (Precision Time Protocol), to synchronize clocks between devices on IP networks for its ST 2110 specification. SMPTE borrowed this standard from industry, which had already been grappling with the issue of precision synchronization for many years.

Epoch is Key

Using well-designed Ethernet networks, it is possible to achieve sub-microsecond accuracy using PTP. Coherent-phase alignment between devices is achieved using the Epoch time: a unique timestamp value available every nanosecond, referenced to the beginning of the Epoch.

Initially, the accuracy of the PTP network relies on an ordinary clock assuming the role of the Grand Master clock. The highest accuracy clocks available are atomic, maintaining an accuracy of 1ns, but these are generally not available to most broadcasters, so they lock to GPS satellites with onboard atomic clocks. The most reliable PTP-GPS clocks can achieve accuracies of < +/- 40ns.

The PTP Grand Master transmits “sync” messages approximately once every second. The send frequency varies depending on the type of network used and the speed of pull-in required in the slave devices. These messages include much data, but critically include the number of seconds and nanoseconds passed since the start of the Epoch.

Diagram 3 – PTP clock synchronization.

SMPTE’s ST 2059-1:2015 “Generation and Alignment of Interface Signals to the SMPTE Epoch” defines the reference to be midnight, January 1st 1970, International Atomic Time (TAI). Audio samples, video frames, and metadata each carry a timestamp value that represents the number of seconds and nanoseconds that have elapsed since this time.

Using this elapsed time, we can determine the day, month, year, hour, minute, second, and fraction of a second, anywhere from January 1st 1970 to now, with an accuracy dependent on our Grand Master.
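As a minimal sketch of this conversion, the function below turns an Epoch timestamp into a UTC date and time. It assumes the TAI to UTC difference of 37 seconds that has applied since January 1st 2017, so it is only valid for timestamps after that date and until another leap second is announced. The function name is illustrative.

```python
from datetime import datetime, timedelta, timezone

# Assumption: TAI has been 37 seconds ahead of UTC since 2017-01-01.
TAI_MINUS_UTC_SECONDS = 37


def epoch_timestamp_to_utc(seconds: int, nanoseconds: int) -> datetime:
    """Convert seconds and nanoseconds since the Epoch (TAI) to a UTC datetime.

    Python datetimes resolve to microseconds, so finer detail is truncated.
    """
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(
        seconds=seconds - TAI_MINUS_UTC_SECONDS,
        microseconds=nanoseconds // 1000,
    )


stamp = epoch_timestamp_to_utc(1_700_000_037, 500_000_000)
```

Here a timestamp of 1,700,000,037 seconds and 500,000,000 nanoseconds corresponds to 22:13:20.5 UTC on November 14th 2023.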

Low Jitter for High Accuracy

The aim of network synchronization is to make the Grand Master and all attached slave clocks contain values that describe the same number of seconds and nanoseconds elapsed since the Epoch at any one time. The accuracy with which this can be determined is based on the resolution of the master and slave counters, and the amount of jitter the network introduces into the timing messages.

Although simply sending time messages from the master to the slaves will provide some synchronization, it is a naïve approach as the distance and delay between the slave and master is unknown. To correct this, three more messages are exchanged between the Grand Master and slave. Diagram 3 demonstrates how the protocol works.
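The calculation the slave performs from this exchange can be sketched as follows, using the conventional timestamp names t1 to t4 (Sync sent by the master, Sync received by the slave, Delay_Req sent by the slave, Delay_Req received by the master). The function name and the example values are illustrative, and the sketch assumes the path delay is the same in both directions.

```python
def ptp_offset_and_delay(t1: int, t2: int, t3: int, t4: int):
    """Return (slave offset from master, mean path delay) in the timestamps' units.

    t1 and t4 are read from the master clock, t2 and t3 from the slave clock.
    """
    master_to_slave = t2 - t1   # path delay plus slave offset
    slave_to_master = t4 - t3   # path delay minus slave offset
    mean_path_delay = (master_to_slave + slave_to_master) / 2
    offset = (master_to_slave - slave_to_master) / 2
    return offset, mean_path_delay


# Example in nanoseconds: slave runs 500 ns ahead, path delay is 1000 ns each way.
offset, delay = ptp_offset_and_delay(t1=0, t2=1500, t3=5000, t4=5500)
```

The example recovers an offset of 500 ns and a path delay of 1000 ns. The slave then steers its clock to drive the offset toward zero.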

When a slave first synchronizes to a master, its time values will be wildly different from the master's, and a period of synchronization starts. Convergence may take anywhere from a few seconds to ten minutes, depending on the configuration adopted and the network design.

The “1588 Default profile”, designed for general applications, uses slower message rates and might lead to long lock times. In the broadcast industry, devices need to lock to PTP more quickly to allow devices (e.g. cameras) to be exchanged quickly in a live production. The AES67 Media Profile and the SMPTE 2059-2 profile use faster message rates between master and slave, enabling lock times of a few seconds.

Software PTP Causes Jitter

If the PTP stack is implemented completely in software, without hardware-assisted network interfaces, then operating systems, IP stacks, and CPU response times all conspire to reduce the accuracy of the timestamps in both the master and the slave. Consequently, for broadcast applications, we will always use a hardware-assisted PTP generator. If accurate time-of-day timecode lock is required, or the master and slaves are geographically separated, the PTP generator will be locked to a GPS source.

Accurate slave devices, such as cameras, should provide hardware circuits in the Ethernet network interface card to extract and insert timestamps in the UDP/IP datagrams as close as possible to the physical layer of the network. Using this method removes timing errors and jitter created by the operating system, IP stack, and CPU response time.

Certain COTS (commercial off-the-shelf) network interface cards also offer hardware-based timestamping for use in server chassis, offering accurate synchronization for software applications running on standard servers.

Networks Must be Symmetrical

As diagram 3 demonstrates, networks must be symmetrical for the protocol to work correctly. That is, the time taken to send an IP packet from master to slave, must be the same as the time taken to send it back.

This is a reasonable assumption in well-designed Ethernet networks. However, it is not necessarily valid in resilient routed networks, as the datagrams may take different send and receive routes. The master-to-slave and slave-to-master delays, measured using the Sync and Delay_Req/Delay_Resp exchanges, are averaged to determine the overall network delay. If one is longer than the other, the average will be skewed, possibly randomly, and excessive jitter will occur at the slave device.
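The effect of asymmetry can be sketched with a small simulation. It builds the four timestamps of one exchange from a known true offset and two one-way delays, then applies the usual calculation that assumes the delays are equal. The names and values are hypothetical.

```python
def estimated_offset_ns(true_offset_ns: int, delay_ms_ns: int, delay_sm_ns: int) -> float:
    """Offset a slave would calculate for the given one-way delays (nanoseconds).

    delay_ms_ns is the master-to-slave delay, delay_sm_ns the slave-to-master delay.
    """
    t1 = 0                                      # Sync leaves the master (master time)
    t2 = t1 + delay_ms_ns + true_offset_ns      # Sync arrives (slave time)
    t3 = t2 + 1_000                             # Delay_Req leaves (slave time), arbitrary gap
    t4 = (t3 - true_offset_ns) + delay_sm_ns    # Delay_Req arrives (master time)
    return ((t2 - t1) - (t4 - t3)) / 2


# A perfectly aligned slave on a symmetrical network sees no error.
symmetric = estimated_offset_ns(0, 1_000, 1_000)

# The same slave with a return path that is 2000 ns longer.
asymmetric = estimated_offset_ns(0, 1_000, 3_000)
```

With a symmetrical path the calculated offset is zero. With the longer return path the slave calculates an offset of -1000 ns, half the asymmetry, even though its clock was correct. If the routes change from one exchange to the next, this error changes too and appears as jitter.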

In a broadcast network, there may be hundreds of slave devices, such as cameras and switchers, all receiving their timing information from one master. From diagram 3, we can see that each slave must send a Delay_Req message back to the master in response to the Sync and Follow_Up messages. This has the potential to overload the Grand Master CPU, causing time inaccuracy and jitter.

Additionally, standard switches might queue PTP messages to prioritize other messages, resulting in a less-than-ideal delay measurement.

QoS

PTP messages need to be prioritized over all other messages, as correct timing is crucial for all other media (audio, video, control) to function correctly. Quality of service queues configured in the network devices help enforce this.

Usually, switches capable of IEEE 802.1p can sort all incoming packets into a maximum of eight priority queues (priority values 0 to 7) before they leave the switch on another port. Like boarding a plane with priority permission, packets with Class of Service (CoS) markings can traverse the switch faster than others.

In IP media networks, DSCP (Differentiated Services Code Point) markers in the IP header can be used to mark PTP packets created in the master or slave. Switches will then sort up to 64 different DSCP tags into the priority queues they offer.

Diagram 4 – Mapping of DSCP to CoS according to recommendations in AES67:2018. PTP traffic marked with DSCP 46 will use a higher queue (e.g. 5) than the audio or video traffic, marked with 34.
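The markings in the caption can be explored with a short sketch. The mapping used here, taking the top three bits of the DSCP value as the queue number, is a common convention and is an assumption; the actual mapping is whatever has been configured on the switch. The function names are illustrative.

```python
PTP_DSCP = 46     # value quoted above for PTP traffic
MEDIA_DSCP = 34   # value quoted above for audio or video traffic


def tos_byte(dscp: int) -> int:
    """Value for the IP header's type-of-service byte: DSCP occupies its top six bits."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0 to 63")
    return dscp << 2


def conventional_cos(dscp: int) -> int:
    """Assumed mapping: the top three bits of the DSCP select the priority queue."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0 to 63")
    return dscp >> 3


ptp_queue = conventional_cos(PTP_DSCP)
media_queue = conventional_cos(MEDIA_DSCP)
```

Under this convention PTP lands in queue 5 and media in queue 4, matching the example in the caption. On platforms that support it, a sender would typically apply the marking with `socket.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos_byte(PTP_DSCP))`.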

Using higher DSCP tags for PTP than for audio helps achieve accurate synchronization in audio networks. Here, switch-introduced jitter is not very high because the bandwidth utilization of audio over IP is not significant.

Boundary Clocks Keep Jitter Low

To overcome network jitter in Video over IP applications, where data rates of multiple gigabits per second occur, boundary clocks are used in the network to distribute the message load away from the Grand Master. PTP allows each port to act as either a master or slave. One port on the boundary clock will be configured as a slave type, connected to the Grand Master, and the second port will become the master for all the slave devices connected to it.

The boundary clock must internally synchronize its own clock to the Grand Master to enable accurate jitter-free synchronization of the slave devices connected to it.

Diagram 5 – Operation of boundary-clocks to move protocol load from the Grand Master. In this case, ordinary clock ‘A’ is the Grand Master and ordinary clock ‘B’ is in slave mode. If ‘A’ were to degrade or be disconnected from the network, clock ‘B’ would become the Grand Master, as either it would no longer see “announce” messages from ‘A’, or ‘A’ would announce that its clock had been degraded. All boundary-clocks would see these “announce” messages and switch over to clock ‘B’.
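The switchover described in the caption can be sketched as a comparison of the attributes carried in the "announce" messages. IEEE 1588-2008 compares priority1, clock class, clock accuracy, variance, priority2, and clock identity in that order, with lower values winning. The sketch below covers only that comparison and omits the topology checks of the full algorithm; the class name and all attribute values are illustrative.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AnnouncedClock:
    name: str
    priority1: int
    clock_class: int
    clock_accuracy: int
    variance: int
    priority2: int
    identity: int

    def rank(self):
        """Comparison key: earlier fields dominate, lower values are better."""
        return (self.priority1, self.clock_class, self.clock_accuracy,
                self.variance, self.priority2, self.identity)


def best_master(clocks):
    """Pick the clock that every device should agree is the Grand Master."""
    return min(clocks, key=AnnouncedClock.rank)


# Illustrative values only; A and B differ just in their identity.
clock_a = AnnouncedClock("A", 128, 6, 0x21, 0x4E5D, 128, 0x0001)
clock_b = AnnouncedClock("B", 128, 6, 0x21, 0x4E5D, 128, 0x0002)

normal = best_master([clock_a, clock_b]).name                      # A wins the tie-break
degraded_a = replace(clock_a, clock_class=7)                       # A announces a worse class
after_degrade = best_master([degraded_a, clock_b]).name            # B takes over
after_disconnect = best_master([clock_b]).name                     # A no longer announces
```

Because every device runs the same comparison on the same announced values, they all reach the same decision without any central coordination.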

Switch Must be PTP Aware

Where boundary clocks are either not required or not accessible, then PTP provides transparent clocks. These are used in PTP-enabled switches to take into consideration the packet delay incurred within the switch to reduce the risk of excessive jitter.

Although transparent clocks are essential to keep jitter low, they do not provide timing information other than updating the correction fields in the sync messages with the delay incurred inside the switch. Therefore, they do not take any load off the master clock.
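A minimal sketch of this behavior follows. Each switch adds the time the message spent inside it (its residence time) to a running correction value, and the slave subtracts the total when calculating the path delay. The function names and figures are hypothetical, and plain nanoseconds are used for readability.

```python
def forward_through_transparent_clock(correction_ns: int, ingress_ns: int, egress_ns: int) -> int:
    """Add this switch's residence time to the correction carried with the message."""
    return correction_ns + (egress_ns - ingress_ns)


def corrected_master_to_slave_delay(t1_ns: int, t2_ns: int, correction_ns: int) -> int:
    """Delay seen by the slave once the time spent queued in switches is removed."""
    return (t2_ns - t1_ns) - correction_ns


# Hypothetical path through two switches with 800 ns and 1200 ns of queuing.
correction = 0
correction = forward_through_transparent_clock(correction, ingress_ns=1_000, egress_ns=1_800)
correction = forward_through_transparent_clock(correction, ingress_ns=2_500, egress_ns=3_700)

delay = corrected_master_to_slave_delay(t1_ns=0, t2_ns=5_000, correction_ns=correction)
```

In this example 2000 ns of variable queuing delay is removed, leaving 3000 ns of delay that the slave can treat as stable. The master still has to answer every slave directly, which is why the load on it is unchanged.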

Achieving accurate PTP timing is critical for making distributed television work over asynchronous IP systems. Jitter is our enemy and we must be constantly vigilant when designing networks. PTP frees us from the rigid constraints of SDI, AES, and MADI, empowering broadcasters to deliver new and more efficient workflows.

Other related articles posted on The Broadcast Bridge.
