Delivering Timing For Live Cloud Productions - Part 1

Video and audio signals represent synchronous sampled systems that demand high timing accuracy from their distribution and processing infrastructures. Although this has caused many challenges for broadcasters working in traditional hardware systems, the challenges are magnified exponentially when we process video, audio and metadata in software.

This article was first published as part of Essential Guide: Delivering Timing For Live Cloud Productions.

Viewer Expectations

Every so often in the development of an industry we have the opportunity to look at the technology and revisit how we deliver our products. In terms of television, our products are highly immersive programs that entertain, educate, and inform our audiences. Therefore, the technology should serve to deliver the product, not the other way around.

From the viewer’s perspective, how a television program is constructed or delivered is of little concern. We don’t really care how a letter is mailed across our respective countries, and the same is true for program delivery. It’s fair to say that viewers have expectations, which manifest themselves as constraints on the system, but the actual technical detail of how a letter posted in London reaches a home in Sydney is largely irrelevant.

Our ever-demanding expectations of the postal service very quickly influence the type of technology that is employed. For example, many moons ago it would have been acceptable to send a letter by ship, which would take six weeks to traverse the world. Now we expect our letter to travel by air mail and be delivered in a few days. And when we look at exactly what a letter is, we realize it’s just a form of communication. In other words, it’s information that can be represented as an email, which only takes a few seconds to arrive.

Although delivery times for television have always been in the order of seconds, the analogy to mail delivery holds. Instead of the time-to-delivery changing, we now have viewers who expect to watch what they want, where they want, and how they want, with constant pressure to reduce their costs. It’s this expectation that has placed a constraint on the broadcast system, which in turn demands that we provide new technology. Viewers are making these demands and we need to find a new way of delivering for them. The great news is that we have a solution: IP, with its flexibility, scalability, and resilience. However, the devil is always in the detail.

Nanosecond Timing

We need to completely rethink timing, but before understanding why, it’s worth reviewing where we are and how we got here.

There are two points to remember. First, there are no moving pictures in television, just a series of still images played very quickly to give the illusion of motion. Second, we are still using the same timing constraints that were designed in the 1930s and 1960s to overcome the limitations of the technology of the time.

Both electronic cameras and televisions of the 1930s used vacuum tube technology, and they were truly scary, as EHTs (Extra High Tension) of more than 14kV and high current circuits were the norm. This was mainly due to the need to direct and project electron beams. Sync pulses were needed to shift the beam left, right, up and down the screen (or camera sensor), resulting in massive currents energizing the scanning coils. These coils are inductors, so the sync pulses had to be very long: long enough not only to move the beam back to the start of the line, but also to ensure that the change of current, and the back EMF (Electromotive Force) it generated, didn’t destroy the driver circuits.

Figure 1 – Horizontal line pulses were devised in the 1930s to synchronize electron beam scanning cameras and monitors, and color subcarriers were created in the 1960s to provide backwards compatibility for black-and-white TVs with the introduction of color. Neither has been needed for at least twenty years, and with the adoption of IP, we now have the opportunity for change to allow us to deliver a better immersive viewing experience.

Furthermore, when color started to appear in the mid 1960s the concept of color subcarrier was introduced to maintain backwards compatibility with existing television sets. This cemented the need for nanosecond timing tolerances to allow the QAM (Quadrature Amplitude Modulation) demodulators to decode the color in the televisions along with demodulating the audio. But our reliance on analog television has been reducing as digital television becomes mainstream.

All these systems were needed from the 1930s right up to about ten to fifteen years ago, when viewer expectations were relatively modest compared to today. But as digital transmissions progressed and flatscreen televisions and mobile devices started to appear, viewer demands increased dramatically to where we are now, meaning that we must constantly innovate to deliver for our viewers.

Remembering The Viewer

The HVS (Human Visual System) is greater than just our light transducers, otherwise known as our eyes. It encapsulates a whole behavioral psychology that has not only influenced television design but driven it. The eyes provide visual data and prompts for the brain, and the psychology of the brain provides our internal representation of the image. For example, we must simulate fluidity of motion, otherwise the HVS detects motion anomalies, such as flicker, that can be interpreted as a predatory attack. Even when the viewer is sitting in the comfort of their own home and knows they are safe, disturbances in the fluidity of motion can lead to stress, and that’s before we even start talking about the psychological effects of disturbances in sound.

From the perspective of our viewers, we must make sure the images are smooth and flicker free, as this adds to the immersive experience.

All this considered, we don’t need color subcarriers anymore. All we need is a reference to pixel 0 of the image and an idea of the frame rate we’re using. If we treat the image as a matrix consisting of 1920 x 1080 pixels (for HD), then it’s easy to see why we only need to reference the first pixel as every other pixel forms part of that matrix and can be easily determined.
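As a minimal sketch of that idea, assume pixels are numbered in raster order, left to right and top to bottom, starting from pixel 0. The function name and the raster-order numbering are illustrative assumptions, not part of any broadcast standard or library:

```python
WIDTH, HEIGHT = 1920, 1080  # HD image matrix from the text


def pixel_position(index, width=WIDTH, height=HEIGHT):
    """Return (row, column) of a pixel counted in raster order from pixel 0.

    Hypothetical helper: it only illustrates that once pixel 0 and the
    matrix dimensions are known, every other pixel can be located.
    """
    if not 0 <= index < width * height:
        raise ValueError("pixel index outside the frame")
    # Integer division gives the row, the remainder gives the column.
    return divmod(index, width)


print(pixel_position(0))                   # first pixel of the frame
print(pixel_position(WIDTH))               # first pixel of the second line
print(pixel_position(WIDTH * HEIGHT - 1))  # last pixel of the frame
```

No line or field sync is needed to recover the position, only the reference to pixel 0 and the agreed dimensions of the matrix.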

Maintaining Fluidity Of Motion

To keep images fluid for our viewers, they must be displayed with a consistent and predictable time base. If it’s erratic, speeds up, or slows down, then this will trigger the ancient structures in our brain which form the HVS, resulting in stress for the viewer and a loss of the immersive experience. But it’s important to remember that the image frames do not necessarily have to be transferred with a constant time base, just displayed that way.
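The distinction between transfer timing and display timing can be sketched with a simple playout buffer. This is a hedged illustration only: the 50Hz display rate, the arrival times, the 40ms buffer delay, and the function name are all assumptions chosen for the example.

```python
FRAME_PERIOD = 1 / 50  # seconds per frame, assuming a 50Hz display rate


def display_schedule(arrival_times, buffer_delay, frame_period=FRAME_PERIOD):
    """Assign each frame a display time on a constant time base.

    Frame n is displayed at first_arrival + buffer_delay + n * frame_period,
    regardless of how irregularly the frames arrived. Returns the display
    times and the indices of frames that arrived too late for their slot.
    """
    if not arrival_times:
        return [], []
    start = arrival_times[0] + buffer_delay
    display_times = [start + n * frame_period for n in range(len(arrival_times))]
    late = [n for n, (arrived, shown) in enumerate(zip(arrival_times, display_times))
            if arrived > shown]
    return display_times, late


# Hypothetical arrival times (seconds): bursty, not evenly spaced.
arrivals = [0.000, 0.031, 0.038, 0.072, 0.079, 0.118]

times, late = display_schedule(arrivals, buffer_delay=0.040)
print("display times:", [round(t, 3) for t in times])
print("late frames with 40ms buffer:", late)

_, late_unbuffered = display_schedule(arrivals, buffer_delay=0.0)
print("late frames with no buffer:", late_unbuffered)
```

With enough buffering, the irregular arrivals are hidden and the viewer sees evenly spaced frames. The cost is added latency, which is the trade-off any such design has to weigh.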

This is a very important step in the evolution of broadcast television, as we’re now moving away from the timing constraints imposed by the technology of the 1930s and 1960s, and hence the constraints on the viewing experience. We no longer need to worry about scanning coils and back EMF, but we do need to be concerned with maintaining the viewer’s immersive experience. And this gives us the freedom to think about timing differently. Instead of thinking in terms of what the technology can provide for the viewer, we need a shift in mindset and to ask the question “what does the viewer want and how do we deliver?”

Figure 2 – A) shows how a 74.25MHz HD pixel clock with a 150ppm tolerance deviates in frequency. B) The frequency change relative to a 74.25MHz HD pixel reference clock leads to either too many frames being generated, or too few. If the clock runs fast, video frames will need to be dropped, and if it runs slow, video frames will need to be duplicated. A single clock running fast at +150ppm against a nominal reference gains one frame in about 4.4 minutes. But the clock it’s running relative to could itself be running slow at -150ppm, hence the 4.4 minutes is divided by 2 to give the worst case of dropping or duplicating a video frame approximately every 2.2 minutes.
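The arithmetic behind these figures can be checked with a short calculation. The caption does not state the frame rate, so the sketch below assumes 25 frames per second (a 40ms frame period), which reproduces the 4.4 and 2.2 minute values. The function name is illustrative.

```python
def seconds_per_frame_slip(frame_rate_hz, offset_ppm):
    """Time for a clock offset (in ppm) to accumulate one frame of drift.

    A clock that is offset_ppm fast gains offset_ppm microseconds every
    second, so it gains one frame period after frame_period / offset.
    """
    frame_period = 1.0 / frame_rate_hz
    return frame_period / (offset_ppm * 1e-6)


FRAME_RATE = 25  # frames per second; an assumption, not stated in the caption

# One clock at +150ppm against an ideal reference.
one_clock = seconds_per_frame_slip(FRAME_RATE, 150)

# Worst case: one clock at +150ppm, the other at -150ppm (300ppm apart).
both_clocks = seconds_per_frame_slip(FRAME_RATE, 300)

print(f"one clock off by 150ppm: {one_clock / 60:.1f} minutes per frame slip")
print(f"clocks 300ppm apart:     {both_clocks / 60:.1f} minutes per frame slip")
```

At higher frame rates the frame period is shorter, so the same ppm offset produces a dropped or duplicated frame more often.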

When we say “constant frame rate”, what do we really mean? In practical terms it is impossible to reach exactly 50Hz or 60Hz, but we can generate frame rates at these frequencies within a certain tolerance, which is why we have sync pulse generators and flywheel oscillators that lock to the reference signals. At this point, it’s easy to disappear down a rabbit hole and start making our reference generator more and more accurate, decreasing the timing variance until we achieve nanosecond tolerance.

One reason for our strict timing is to synchronously switch between video and audio sources by making them frame synchronous. And again, we should ask what exactly we mean by “frame synchronous”. In the NTSC and PAL days we would tweak the SCH-phase to adjust the line timing into the production switcher to make the video sources line and frame accurate. This was necessary because the alternative was to use frame synchronizers, and they were hugely expensive. When digital switchers matured, they had line buffers built into every input so that the timing tolerance only needed to be plus or minus a few lines. Nobody has tweaked an SCH-phase in an SDI broadcast center for about ten years, but we still talk about nanosecond timing.
