IP Explored - ST2110 and ST2022

As broadcasters accelerate IP migration we must move from a position of theory to one of practical application. Whether we're building a greenfield site or transitioning through a hybrid solution, simply replacing SDI components with analogous IP equivalents will not achieve the full goals of COTS (Commercial Off-The-Shelf) infrastructure and the benefits associated with it.

Migrating to IP provides broadcasters with infrastructure flexibility and scalability. Traditional SDI solutions have stood the test of time, but they are rigid. Moving from SD to HD required large parts of the SDI infrastructure to be replaced, as 270Mbit/s SD systems are not compatible with 1.485Gbit/s HD systems. Early adopters of HD could not move to progressive HD without further changes to infrastructure. And the emerging UHD, 4K, and 8K formats are difficult to support with SDI technology.

IP is format agnostic and allows us to mix multiple television standards on one network. If enough capacity is available, then SD, HD, 25fps, and 29.97fps formats can be simultaneously transported through the same network. It's even possible to mix 4K and 8K formats in the same network.
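To make the capacity question concrete, here is a minimal sketch in Python of an aggregate-bandwidth check; the nominal SDI-class rates, the example stream mix, and the 20% headroom figure are illustrative assumptions rather than values from any standard.

```python
# Illustrative capacity check for mixing formats on one IP network.
# Rates are nominal SDI-class figures; real ST2110 flows differ.
STREAM_RATES_GBPS = {
    "SD": 0.270,    # 270 Mbit/s SD-SDI
    "HD": 1.485,    # 1.485 Gbit/s HD-SDI
    "3G": 2.970,    # 3G-SDI (1080p50/60)
    "UHD": 12.0,    # 12G-SDI class UHD/4K
}

def fits_on_link(streams, link_gbps=25.0, headroom=0.8):
    """Sum the stream rates and keep 20% headroom for other traffic."""
    total = sum(STREAM_RATES_GBPS[s] for s in streams)
    return total, total <= link_gbps * headroom

mix = ["SD"] * 4 + ["HD"] * 6 + ["UHD"]
total, ok = fits_on_link(mix)
print(f"Aggregate {total:.2f} Gbit/s: {'fits' if ok else 'exceeds'} the link")
```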

ST2022 Released

SMPTE are great proponents of IP and have been working hard to deliver IP standardization; in 2007 they released the ST2022 group of specifications. As this was the first step into IP for many broadcasters, SMPTE tried to keep the infrastructure requirements as simple as possible. Few broadcasters have the privilege of greenfield site installations, and SMPTE acknowledged many would be migrating slowly and carefully.

Packetize SDI

ST2022-6 specifies the encapsulation of SDI into IP packets using a hierarchy of internet standards. SDI streams are divided into packets of 1376 octets. Each of these packets is wrapped in an RTP (Real-time Transport Protocol) packet, then in a UDP (User Datagram Protocol) datagram, and then in the payload of an IP packet. A marker bit is used to signify the last IP packet in a video frame to assist downstream decoding.
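As a rough illustration of that hierarchy, the sketch below builds minimal RTP packets around 1376-octet SDI chunks and sets the marker bit on the last packet of a frame. It is a sketch only: the payload type, SSRC, and zeroed timestamp are placeholder assumptions, ST2022-6's own payload header is omitted, and the UDP and IP layers would wrap the result in turn.

```python
import struct

PAYLOAD_SIZE = 1376  # octets of SDI data per packet, per ST2022-6

def rtp_packet(sdi_chunk, seq, timestamp, ssrc, marker=False, pt=98):
    """Wrap one 1376-octet SDI chunk in a minimal 12-byte RTP header."""
    assert len(sdi_chunk) == PAYLOAD_SIZE
    byte0 = 0x80                 # V=2, P=0, X=0, CC=0
    byte1 = (marker << 7) | pt   # M bit flags the last packet of a frame
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc)
    return header + sdi_chunk

def packetize(sdi_frame, ssrc=0x2022):
    """Split one frame of SDI data into RTP packets, marking the last."""
    chunks = [sdi_frame[i:i + PAYLOAD_SIZE]
              for i in range(0, len(sdi_frame), PAYLOAD_SIZE)]
    for seq, chunk in enumerate(chunks):
        last = seq == len(chunks) - 1
        yield rtp_packet(chunk.ljust(PAYLOAD_SIZE, b"\0"),
                         seq, 0, ssrc, marker=last)
```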

Diagram 1 – For ST2022-6, the SDI stream is split into 1376 octet packets, then encapsulated by an RTP packet, then a UDP packet, and finally an IP packet. This process continues for the duration of the SDI stream at 1376 octet intervals.

Although ST2022-6 works well, it is primitive and wasteful of bandwidth. The entire SDI signal, including the TRS (Timing Reference Signals), is wrapped into the IP packet hierarchy. Just over 76% of the original SD-SDI signal is active video; the rest is auxiliary data and TRS.
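That figure can be verified with simple arithmetic; the check below assumes the 625-line SD raster, where 720 of 864 samples per line and 576 of 625 lines are active.

```python
# Worked check of the "just over 76%" figure for 625-line SD.
total_samples = 864 * 625    # full raster, including blanking and TRS
active_samples = 720 * 576   # active picture only
print(f"Active video: {active_samples / total_samples:.1%}")  # -> 76.8%
```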

ST2110 Improvements

ST2110 addressed these inefficiencies by removing the timing relationship between the video and audio essence, and the underlying hardware transport stream. SDI, MADI, and AES all rely on clocking information encoded within the data signal, using methods such as bi-phase modulation, to keep video and audio synchronous between devices.

Ethernet is the preferred medium for network distribution in broadcasting as it is the IT industry's de-facto standard for office and business systems. It spans layers 1 and 2 of the ISO seven-layer model, though networking documentation often refers to it simply as layer 2, and it distributes data using frames.

Remove TRS and Save Bandwidth

As Ethernet is asynchronous, there is no longer an underlying common clock to which audio and video samples can be referenced. Although ST2022-6 didn't have an encoded clock, the system worked because the SDI packets were encapsulated in RTP packets, and these were sufficiently accurate to rebuild the SDI stream since it had a well-defined bit rate and the TRS was available.

By removing the TRS, ST2110 instantly reduces bit rates by 16% to 40% depending on the broadcast format used. But timing is the most important aspect of broadcast television, and we still need to synchronize lines, fields, frames, audio samples, and metadata.
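As a sanity check on that range, the sketch below compares active-only payloads against full rasters for a few common formats; the figures are the standard total and active sample counts for each raster.

```python
# Approximate saving from carrying only active video (ST2110)
# versus the full SDI raster (ST2022-6).
formats = {
    "1080-line HD": (2200 * 1125, 1920 * 1080),
    "720p50":       (1980 * 750,  1280 * 720),
    "625-line SD":  (864 * 625,   720 * 576),
}
for name, (total, active) in formats.items():
    print(f"{name}: {1 - active / total:.1%} saved")
# -> ~16.2%, ~37.9%, and ~23.2%, within the quoted 16% to 40% range
```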

Line, field, and frame sync pulses are a relic of the past, maintained for backwards compatibility with cathode-ray-tube cameras and television sets. Although set-top boxes still insert these pulses into a video stream to maintain backwards compatibility, we no longer need them in studios or the transmission system.

Diagram 2 – For ST2110, video, audio, and metadata are stamped with a unique PTP timestamp referenced to the Epoch, allowing essence streams to be processed independently of each other regardless of where they are handled.

To achieve synchronization, ST2110 uses the IEEE 1588-2008 protocol, commonly known as PTP (Precision Time Protocol), as the basis of its timing. PTP is a counter representing the number of nanoseconds that have elapsed since the Epoch, which occurred at midnight on January 1st, 1970. Each ST2110 packet, whether it's video, audio, or metadata, is stamped with a PTP-derived value in the RTP header, which is encapsulated by the UDP datagram and then the IP packet.

In the case of a camera, each packet of video is stamped with the PTP value determined at the beginning of that frame of video. Downstream equipment receiving the stream will be able to accurately rebuild the video frame and display it with sub-microsecond accuracy.
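A simplified sketch of how such a stamp can be derived: ST2110 media clocks count from the epoch, with video flows using a 90 kHz RTP clock and audio typically 48 kHz, and the RTP timestamp wrapping modulo 2^32. The example instant below is arbitrary.

```python
# Derive an RTP timestamp from PTP time (simplified).
def rtp_timestamp(ptp_ns: int, clock_rate_hz: int = 90_000) -> int:
    ticks_since_epoch = ptp_ns * clock_rate_hz // 1_000_000_000
    return ticks_since_epoch % (1 << 32)  # RTP timestamps are 32-bit

# Two devices stamping the same instant produce the same value,
# which is what lets downstream gear realign separate essence flows.
instant_ns = 1_700_000_000 * 1_000_000_000  # an arbitrary PTP instant
print(rtp_timestamp(instant_ns))            # video (90 kHz clock)
print(rtp_timestamp(instant_ns, 48_000))    # audio (48 kHz clock)
```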

Timestamp Samples

The same is true of audio. Audio packets are assembled into samples and groups using a format similar to AES67. Each group header is stamped with the PTP value at the time it was created. And metadata uses the same method.
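For a sense of the packet granularity this implies, a quick calculation under common AES67-style profile assumptions (48 kHz sampling with 1 ms or 125 µs packet times):

```python
# Samples per channel carried in one audio packet at 48 kHz.
SAMPLE_RATE_HZ = 48_000
for packet_time_us in (1_000, 125):  # common 1 ms and 125 us profiles
    samples = SAMPLE_RATE_HZ * packet_time_us // 1_000_000
    print(f"{packet_time_us} us packet -> {samples} samples per channel")
```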

For ST2110 to effectively use PTP, all cameras, vision switchers, monitors, microphones, sound consoles, etc. must be referenced to the same PTP master time reference.

As each packet of video, audio, and metadata is independently referenced to a global PTP clock, they can all be processed independently of each other and then combined with sub-microsecond accuracy during transmission.

PTP Grandmaster

A network supporting PTP can use many distributed master clocks, but only one is nominated to be the Grandmaster. The BMC (Best Master Clock) algorithm runs on each master clock within a network to determine which clock should be nominated as the Grandmaster. Criteria such as GPS lock and the priority value set by the network administrator determine the most accurate clock.

The BMC builds timing redundancy into the network, so if the Grandmaster were to fail, or lose GPS lock, the next most accurate clock would take over and be nominated as Grandmaster.
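The election can be sketched as an ordered, field-by-field comparison in which lower values win and ties fall through to the next field. This is a simplified model of the IEEE 1588 dataset comparison, and the example values are illustrative; clockClass 6 conventionally indicates a clock locked to a primary reference such as GPS.

```python
from dataclasses import dataclass

@dataclass
class ClockDataset:
    """Subset of the announce-message fields the BMC algorithm compares."""
    priority1: int    # administrator-set preference (lower wins)
    clock_class: int  # 6 = locked to a primary reference (e.g. GPS)
    accuracy: int     # encoded clock accuracy
    variance: int     # offsetScaledLogVariance, a stability estimate
    priority2: int    # secondary administrator preference
    identity: int     # unique ID, the final tie-breaker

    def rank(self):
        # Ordered comparison; ties fall through to the next field.
        return (self.priority1, self.clock_class, self.accuracy,
                self.variance, self.priority2, self.identity)

def elect_grandmaster(clocks):
    """Every node running the same comparison picks the same winner."""
    return min(clocks, key=lambda c: c.rank())

gps_locked = ClockDataset(128, 6, 0x21, 0x4E5D, 128, 1)
free_running = ClockDataset(128, 248, 0xFE, 0xFFFF, 128, 2)
print(elect_grandmaster([gps_locked, free_running]).identity)  # -> 1
```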

Distributed clocks throughout the network act as intermediaries to keep the network load low on the Grandmaster, thus helping to maintain accurate timing.

ST2110 delivers boundless opportunities. Video can be processed in the studio at the same time HDR (High Dynamic Range) metadata is processed in the public cloud or an on-prem datacenter. A whole plethora of SaaS (Software as a Service) applications will become available, facilitating the prevalence of pay-as-you-go pricing models.

Processor Demands

Servers offer the possibility of purely software monitoring solutions; however, there are many applications hosted on computers, all fighting for CPU resource. Operating systems, input-output routines, and storage devices all conspire to make the response times of software unpredictable.

Dedicated hardware solutions can be thought of as massively parallel processing systems. Each function, whether it's the screen drivers, keyboard input, or Ethernet controller, has its own hardware resource dedicated to that operation with no other tasks slowing it down.

Diagram 3 – Hardware devices provide dedicated user interface controls to give tactile feedback when operating the equipment, especially in highly pressured live broadcasts.

ST2110 constantly streams media without flow control from the receiver. Therefore, monitoring systems must be capable of receiving Ethernet frames without dropping a single bit of data. Hardware solutions are highly predictable and ideally suited to monitoring. However, as virtualized solutions progress and deliver better CPU resource, they will eventually start to take the place of hardware-only systems.
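To put that monitoring burden in rough numbers, a sketch assuming a 1080p30, 10-bit 4:2:2 active-video flow and a typical 1200-byte RTP payload (both illustrative figures):

```python
# Rough receive load for a software monitor with no flow control.
BITS_PER_PIXEL = 20                   # 4:2:2 sampling at 10 bits
video_bps = 1920 * 1080 * BITS_PER_PIXEL * 30
payload_bytes = 1200                  # assumed typical RTP payload
pps = video_bps / (payload_bytes * 8)
print(f"{video_bps / 1e9:.2f} Gbit/s -> ~{pps:,.0f} packets/s, "
      f"one every {1e6 / pps:.1f} us")
```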

SaaS Delivers Real Pay-As-You-Go

A production may only need a standards-converter for two days after the final edit to process all the country versions. In SaaS it’s easy to purchase this for the time required. And if you have your own cloud infrastructure you can spin up services as and when you need them.

Although many services are moving into software, hardware monitoring is still highly relevant in broadcast infrastructures. This will change as Telco IP circuits improve in speed and reduce in cost, and as virtualization delivers higher performance.

Tactile Controls Win

Hardware user interfaces are important in operational environments, and the value of being able to press dedicated buttons or rotate controls to change parameters cannot be overstated. Touch screens have their place, but when trying to diagnose a fault or understand a problem, being able to operate a tactile control improves our perception of reliability.

Migrating to IP delivers outstanding benefits for broadcasters wanting to future-proof their infrastructures, whilst being able to take advantage of the ability to scale systems and quickly meet the needs of operational demands. ST2022-6 has paved the way for many to realize the potential for IP. And with the introduction of ST2110, the opportunities to improve workflows, create greater efficiencies, and build flexible, scalable infrastructures will continue for many years to come.
