PTP Explained - Part 3 - Operational Supervision Of PTP Network Services
In the previous two parts of this four-part series, we covered the basic principles of PTP and explained how time transfer can be made highly reliable using both the inherent methods IEEE1588 provides and various complementary redundancy technologies. In this part, we look deeper into monitoring PTP systems.
Regardless of the level of fault tolerance a PTP infrastructure is provisioned with, it is still crucial to observe the behavior of the complete PTP network as closely as possible. This is absolutely mandatory during the deployment and commissioning phase of any new system, yet it should be continued during normal operation to a justifiable degree. But why is monitoring so essential for PTP and what data should be observed?
Basic PTP Monitoring Requirements
One of the indisputable advantages of PTP is its ability to select a common time reference on its own via the Best Master Clock Algorithm (BMCA). It is worth noting that every BMCA selection round engages all devices, i.e. every PTP-enabled port of every end device and Boundary Clock alike. Although the conditions for state changes are defined very strictly, leaving no room for ambiguous interpretation by implementors, interoperability and compliance with the IEEE1588 standard under all operating conditions should not be taken for granted. Continuous monitoring of every BMCA state change is even more justified once its dependency on timeouts is factored in, i.e. the absence of Announce messages over a configurable period of time.
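The timeout dependency can be made concrete with a small sketch. The Python below is illustrative only and not taken from any PTP stack: the class name `AnnounceWatchdog` and its default values are assumptions, and the timeout is modelled in simplified form as the announce receipt timeout count multiplied by the Announce interval.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnnounceWatchdog:
    """Tracks Announce arrivals on one port (hypothetical helper)."""
    log_announce_interval: int = 1     # assumed: 2**1 = 2 s between Announce messages
    announce_receipt_timeout: int = 3  # assumed: number of intervals that may pass
    last_seen: Optional[float] = None  # time of the last Announce, in seconds

    @property
    def timeout_seconds(self) -> float:
        # Simplified model: timeout count multiplied by the Announce interval.
        return self.announce_receipt_timeout * 2.0 ** self.log_announce_interval

    def on_announce(self, now: float) -> None:
        """Record the arrival time of an Announce message."""
        self.last_seen = now

    def expired(self, now: float) -> bool:
        """True once no Announce has arrived for longer than the timeout."""
        if self.last_seen is None:
            return False  # nothing received yet, nothing to time out
        return now - self.last_seen > self.timeout_seconds
```

A monitoring script could call `expired()` periodically and log the moment it first returns true, which gives a timestamped record to compare against the state changes reported by the devices themselves.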
State changes are not the only thing to keep a close eye on, because accuracy can be affected by a number of other parameters. In contrast to an SDI-based environment, monitoring should not be limited to checking the quality of the time sources. It should include as many devices as possible and employ both in-band and out-of-band measurement techniques.
It is good design practice for any mission-critical installation to deploy more than one accurate time source. However, it should be kept in mind that only one Grandmaster will be active at a time, with all others remaining in hot stand-by and merely listening to PTP traffic. To avoid unwelcome surprises during a master failure, the quality of all PTP Grandmasters in the network needs to be continuously verified as a first mandatory step.
The synchronization status of all Slaves (or at least of Slaves which play an important role within an All-IP Studio, such as cameras, switchers and mixers) should be periodically monitored together with the status of all PTP-aware network devices. Where PTP is deployed within a network without any PTP support, the network load requires both careful planning and continuous monitoring, as it will affect the accuracy. With both in place, transient load peaks causing high Packet Delay Variation (PDV) can either be avoided completely or their effect on the accuracy can be mitigated. As a basic requirement, such peaks need to be logged and the personnel in charge of the network and broadcast operations alerted.
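As a rough illustration of logging load peaks, the following Python sketch flags delay samples whose variation exceeds a threshold. It assumes that one-way or mean path delay samples in nanoseconds are already available from some measurement source. The function name `find_pdv_peaks`, the use of the minimum observed delay as the reference floor, and the threshold are all assumptions made for this example.

```python
from typing import List, Tuple


def find_pdv_peaks(delays_ns: List[int], threshold_ns: int) -> List[Tuple[int, int]]:
    """Return (sample index, variation in ns) for every sample whose delay
    exceeds the lowest observed delay by more than threshold_ns."""
    if not delays_ns:
        return []
    floor = min(delays_ns)  # lowest delay seen, taken as the unloaded network
    return [
        (index, delay - floor)
        for index, delay in enumerate(delays_ns)
        if delay - floor > threshold_ns
    ]
```

Each returned entry could be written to a log together with a timestamp and used to trigger an alert.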
Modern networks provide a high level of redundancy and are able to cope with partial failures such as a broken network connection. Traffic will be re-routed automatically applying protocols such as Rapid Spanning Tree and/or an Interior Gateway Protocol. Such events do impact PTP because the transmission time for PTP messages will suddenly change. Therefore, such events need to be monitored in the same way as load peaks.
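A re-routing event typically shows up as a lasting step in the measured path delay rather than as a short peak. The sketch below is one simple way to spot such a step by comparing the medians of two adjacent windows. The function name, window size and minimum step are assumptions chosen for illustration, and the returned index is only approximate (it can precede the actual change by up to about half a window).

```python
from statistics import median
from typing import List, Optional


def detect_delay_step(path_delays_ns: List[int],
                      window: int = 5,
                      min_step_ns: int = 200) -> Optional[int]:
    """Return the first index at which the median of the following `window`
    samples differs from the median of the preceding `window` samples by
    more than min_step_ns, or None if no such index exists."""
    for i in range(window, len(path_delays_ns) - window + 1):
        before = median(path_delays_ns[i - window:i])
        after = median(path_delays_ns[i:i + window])
        if abs(after - before) > min_step_ns:
            return i
    return None
```

Using medians rather than means keeps a single outlier caused by a load peak from being mistaken for a path change.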
PTP Monitoring Techniques
Having established what information we need to gather from as many devices as possible, we need to plan how to do it. PTP specifies a set of management messages for querying the status of nodes as well as for setting specific PTP parameters. They are well suited to querying all nodes within a PTP network, but they should be complemented by additional monitoring measures for two reasons. Firstly, PTP Management may increase the PTP network load significantly if all nodes within a large network are queried at short intervals. Secondly, PTP Management messages yield only a snapshot of the current state. They do not provide data about past events together with the respective time information, which is what would allow information gathered from different devices in the network to be correlated.
Therefore, the PTP Management mechanism has to be complemented by other techniques. Aside from monitoring the presence and contents of the Announce messages, PTP event messages (Sync, Delay_Request, and Delay_Response) need to be accounted for as well. Within PTP, a Master failure is detected only via the absence of Announce messages. If a Slave does not receive PTP event messages, it will remain in its state without triggering the BMCA, yet its local clock will start drifting away. Such a situation may well occur due to a malfunctioning network device, be it PTP-aware or unaware, but it will remain undetected by the PTP Slave. Some PTP device manufacturers provide access to extended statistical data such as packet counters via custom PTP management messages or via standard network monitoring protocols such as SNMP (Simple Network Management Protocol). At the network level, PTP traffic can be monitored and analysed easily using freely available tools like Wireshark or PTP Track Hound, the latter being specifically dedicated to PTP traffic.
The current offset of a PTP Slave can be accessed simply via a corresponding PTP Management message. However, this data may be insufficient to assess the synchronization quality of the device in question, because it reflects only the Slave’s point of view. Any well-designed PTP servo loop will keep the mean value of the offset very close to zero. It does so by assuming a symmetric transmission delay within the network. It has no way of detecting asymmetries and thus cannot account for them.
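Where a device exposes per-message packet counters, the silent failure described above can be detected by comparing two successive counter readings. The Python sketch below assumes the counters have already been retrieved (for example via SNMP or a vendor-specific interface) into plain dictionaries. The key names `announce`, `sync` and `delay_resp` and the function name are hypothetical and do not correspond to any particular vendor's data model.

```python
from typing import Dict, List


def stalled_timing_messages(previous: Dict[str, int],
                            current: Dict[str, int]) -> List[str]:
    """Return the timing message types whose receive counters have stopped
    increasing while Announce messages are still arriving."""
    announce_still_arriving = current["announce"] > previous["announce"]
    if not announce_still_arriving:
        # The Announce timeout and the BMCA cover this case themselves.
        return []
    return [
        name
        for name in ("sync", "delay_resp")
        if current[name] <= previous[name]
    ]
```

A non-empty result means the Slave keeps its state because Announce messages arrive, while the messages it needs to discipline its clock are missing.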
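The effect of the symmetry assumption can be shown with the usual four-timestamp offset calculation. The following Python sketch simulates one exchange with a known true offset and known one-way delays, then computes what the Slave would measure. The function names and the fixed 100 ns turnaround time are assumptions for this example, and all values are in nanoseconds.

```python
def slave_measured_offset(t1: int, t2: int, t3: int, t4: int) -> float:
    """Offset as computed by a Slave that assumes equal delay in both directions.
    t1: Sync sent (Master clock), t2: Sync received (Slave clock),
    t3: Delay_Request sent (Slave clock), t4: Delay_Request received (Master clock)."""
    return ((t2 - t1) - (t4 - t3)) / 2


def simulate_exchange(true_offset: int, delay_master_to_slave: int,
                      delay_slave_to_master: int) -> float:
    """Build the four timestamps for one exchange and return the offset the
    Slave would measure. The Slave clock reads Master time plus true_offset."""
    t1 = 0
    t2 = t1 + delay_master_to_slave + true_offset
    t3 = t2 + 100  # assumed turnaround time on the Slave
    t4 = (t3 - true_offset) + delay_slave_to_master
    return slave_measured_offset(t1, t2, t3, t4)
```

With equal delays the measured offset matches the true one. With 600 ns in one direction and 400 ns in the other, a perfectly aligned Slave (true offset 0) measures an offset of 100 ns, i.e. half the asymmetry, and its servo would steer the clock away from the correct time to cancel it.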
A simple and straightforward way to verify symmetric delay or account for asymmetries is to measure the offset of the PTP Slave to its Master externally. This can be accomplished by comparing signals generated by both devices with appropriate measurement equipment. This approach is equivalent to comparing video sync signals with a vectorscope. Some devices can measure the offset of an external input against their internal PTP-synchronized clock.
It is well understood that proposing out-of-band measurements as an important monitoring tool runs counter to a fundamental principle of PTP, namely that it is deployed on a single communication medium together with all other user-specific traffic. However, out-of-band measurement should be considered as a viable evaluation tool, at least during the initial deployment of PTP.
For mission-critical applications, nodes providing out-of-band offset measurement capabilities should be placed at distinct points within a large network, thus further enhancing observability.
Extended PTP Monitoring
Several PTP vendors led by Meinberg have proposed an enhancement to the PTP standard which greatly improves its monitoring capabilities: the Netsync Monitor. It can be added either to an existing PTP node (preferably a Grandmaster) or deployed on a separate monitoring device.
Support for this monitoring extension requires only minimal software changes to the PTP stack. The monitoring system initiates and maintains a two-way time transfer similar to the original PTP mechanism and thus can utilize the existing hardware for scanning and timestamping PTP packets without any alteration whatsoever. Regardless of the communication mechanism selected for standard PTP traffic, all monitoring messages are exchanged in unicast.
The transfer is started by the monitoring system sending a Delay_Request message. This message is extended with a special TLV (Type Length Value) field designating it as a monitoring message. The receiver processes this message by capturing the ingress timestamp and returning this information to the monitoring system via a Delay_Response message extended by a corresponding TLV. The data contained in the request TLV also triggers the device to generate one additional Sync message, again extended by a TLV. This message is sent to the monitoring system, which has now gathered the same four timestamps that every Slave uses to calculate the offset of its local clock. The monitoring system, however, will not use this data to adjust its own clock, as it is already synchronized using an alternate time feed, e.g. a GNSS source. Instead, it is able to analyse the offset of the device independently. It should be noted that more than one monitoring device may be added to a PTP network. The data gathered by these monitoring systems can be used to evaluate the synchronization accuracy of all devices supporting this extension. Furthermore, it can reveal offsets caused by asymmetries.
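The calculation on the monitoring side can be sketched as follows. This is an illustration of the arithmetic only, not of the actual TLV formats or of any vendor implementation: the class and field names are hypothetical, timestamps are in nanoseconds, and the sign convention (positive means the device clock is ahead of the monitor's reference) is an assumption made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitorExchange:
    """The four timestamps of one monitoring exchange (hypothetical names)."""
    request_egress_monitor: int  # Delay_Request leaves the monitor (monitor clock)
    request_ingress_device: int  # Delay_Request arrives at the device (device clock)
    sync_egress_device: int      # extra Sync leaves the device (device clock)
    sync_ingress_monitor: int    # extra Sync arrives at the monitor (monitor clock)


def device_offset_ns(x: MonitorExchange) -> float:
    """Offset of the device clock relative to the monitor's reference,
    assuming equal delay in both directions."""
    monitor_to_device = x.request_ingress_device - x.request_egress_monitor
    device_to_monitor = x.sync_ingress_monitor - x.sync_egress_device
    return (monitor_to_device - device_to_monitor) / 2


def mean_path_delay_ns(x: MonitorExchange) -> float:
    """Mean one-way delay between the monitor and the device."""
    monitor_to_device = x.request_ingress_device - x.request_egress_monitor
    device_to_monitor = x.sync_ingress_monitor - x.sync_egress_device
    return (monitor_to_device + device_to_monitor) / 2
```

For example, a device running 300 ns ahead of the monitor over a symmetric 1000 ns path would produce the timestamps 0, 1300, 2000 and 2700, from which the functions recover an offset of 300 ns and a mean path delay of 1000 ns. Because the monitor's reference is independent of the device's own Master path, a persistent non-zero result for a Slave that reports itself as locked hints at an asymmetry.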
Time transfer using PTP is a small, yet crucial part of the All-IP Studio. If designed with special attention to providing a sufficient level of fault tolerance, PTP will maintain accurate time throughout the network, requiring little to no user interaction at all. Continuous monitoring of all vital PTP parameters should, however, not be limited to the deployment and commissioning stages. Both in-band and out-of-band measurement and monitoring techniques should be employed whenever manageable.