Audio Global Viewpoint – October 2020
Adoption of IP is providing broadcasters with real and unprecedented opportunities to reconsider many of their workflows and operating procedures. Due to the unique way in which video and audio essence has been abstracted away from the underlying timing parameters, we now have the freedom to reconsider our attitude towards timing, and specifically latency.
One of my main concerns with migrating to IP is that the acronym SDI is simply being replaced with the acronym IP. Although ST2022 merely serialized SDI into IP, syncs and all, it was a good first step into the world of IP for many before the big prize of ST2110 came along. ST2110 has the potential to let us remove the rigid synchronization model we’ve been stuck with for the past eighty years.
To fully utilize the power of ST2110, we must not only think in terms of PTP but, more importantly, abstract our notions of timing away from the traditional synchronous line and frame sync model. In my mind, merely approaching IP as a direct replacement for SDI is a very limiting mindset that will stop us from achieving IP’s full potential.
PTP was born out of the need to synchronize assembly robots during the manufacture of vehicle chassis on automated production lines. Stepper motors were employed to rotate the chassis to allow the welding robots the access they required. If the stepper motors were not synchronized, the whole chassis could twist and distort, rendering it useless.
As with many aspects of broadcasting, we are learning much from other industries. Production lines didn’t have sync pulse generators with timing embedded in the control signals; instead, they used event processing. Rather than attempting to guarantee that a signal would reach its destination within a prescribed and well-defined time, each robot was loaded with a sequence of instructions that were activated at specific times, based on a common PTP reference. Although this introduced some latency, the overall result was a highly reliable automated system.
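The event-processing model described above can be sketched roughly as follows. This is a minimal illustration, not any real PLC or robot API: the class name and methods are invented for the sketch, and `time.monotonic()` stands in for what would be a PTP-disciplined clock in practice.

```python
import heapq
import time

class EventScheduler:
    """Executes pre-loaded instructions at absolute times on a shared clock.

    In a real production line the clock would be disciplined by PTP;
    here time.monotonic() is used purely for illustration.
    """

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self._queue = []  # min-heap of (activation_time, action)

    def load(self, activation_time, action):
        heapq.heappush(self._queue, (activation_time, action))

    def run(self):
        while self._queue:
            activation_time, action = heapq.heappop(self._queue)
            # Wait until the shared clock reaches the scheduled instant,
            # then fire the instruction regardless of when it was loaded.
            delay = activation_time - self.clock()
            if delay > 0:
                time.sleep(delay)
            action()

# Two "robots" loaded with the same schedule against the same clock act
# in lock-step without any timing carried in the control signals.
events = []
sched = EventScheduler()
t0 = time.monotonic()
sched.load(t0 + 0.01, lambda: events.append("rotate chassis"))
sched.load(t0 + 0.02, lambda: events.append("weld seam"))
sched.run()
print(events)  # ['rotate chassis', 'weld seam']
```

The point is that correctness comes from the shared time reference, not from the delivery time of any individual message, which is exactly the trade the article describes: a little latency in exchange for reliability.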
If we start thinking about video and audio in this manner, then latency becomes a manageable design limit rather than a race to ridiculously low values. The more latency we allow ourselves, the more reliable our infrastructures become.
But how much latency is acceptable? I believe it is far more than we’ve allowed ourselves to think. If we consider a live contribution satellite link with standards conversion between the USA and Europe, the delay can easily reach a second. And much modern digital video processing equipment has an inherent latency of two, three, or more frames.
If we assume that video and audio are synchronized in terms of lip-sync, then I would hypothesize that we could easily afford 250 milliseconds of latency in a broadcast studio, that is, from light entering the camera to the image being displayed on the studio-out multiviewer. The key is that the latency must be consistent and predictable. I accept that if we put monitors fed from the camera and the studio-out side by side, we would see a delay between them, but in the real-world operation of a TV studio, where this rarely occurs, would we notice 250 milliseconds of delay? I suspect not.
We might be able to expand this further. I do know that if we start treating video and audio over IP as a truly asynchronous system, we will be able to build much more reliability and flexibility into our IP infrastructures, especially if we allow ourselves some realistic freedom in latency. That is, we should aim for a fixed latency of 250 milliseconds rather than something unachievable and unmeasurable such as “low latency”.
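The fixed-latency idea can be sketched as a simple playout rule, assuming each frame carries a PTP-derived capture timestamp (as an ST2110 RTP timestamp would provide). The function name, the 250 ms figure from the article, and the arrival times are all illustrative, not a real receiver implementation:

```python
FIXED_LATENCY = 0.250  # seconds: the proposed glass-to-glass budget

def playout_time(capture_timestamp, fixed_latency=FIXED_LATENCY):
    """Return the absolute time at which a frame should be displayed.

    capture_timestamp is the PTP time at which light entered the camera.
    Every device displays the frame at capture + fixed_latency, so the
    latency is constant and predictable regardless of how network delay
    varies from packet to packet.
    """
    return capture_timestamp + fixed_latency

# A frame captured at t = 100.000 s plays at t = 100.250 s whether it
# arrived after 2 ms or 200 ms of network delay; jitter is absorbed by
# the fixed budget rather than exposed to the viewer.
arrivals = [100.002, 100.030, 100.200]  # hypothetical arrival times
deadline = playout_time(100.000)
assert all(a <= deadline for a in arrivals)
print(deadline)  # 100.25
```

The design choice here mirrors the article’s argument: a fixed, measurable 250 ms is an engineering target you can verify, whereas “low latency” is neither.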