Production & Post Global Viewpoint – November 2021

Workflow Resilience

IP is delivering untold opportunities for broadcasters, not least the freedom to improve the efficiency of workflows. But during the evaluation of our systems, we should be thinking about resilience as much as efficiency.

Part of the Agile and Lean mindset is the need to make small changes often. By implication, this is leading to a new method of writing software. Instead of developing software as a massive monolithic code base, which is difficult to test and maintain, Agile developers write relatively small functions with well-defined interfaces that lend themselves to unit testing.

A similar pattern exists with workflows. It wasn’t so long ago that designing Main-Failover workflows was the basic requirement of all live production systems, especially in playout. However, over a period of years, and often decades, antiquated working practices became set in stone and nobody could remember why they existed. In effect, we ended up building massively complex workflows whose interactions are difficult to understand and maintain.

As we migrate to IP, many broadcasters are using this as an opportunity to remove these workflows. But no matter how much analysis we do, sometimes it’s just a matter of biting the bullet and hitting the delete key. This is far from ideal, but the alternative is another decade of unnecessary complexity and unpredictable points of failure.

By using the Agile mindset not only can we improve efficiency, but we can also introduce the concept of making small changes often, along with unit testing, especially as many workflows are moving to private datacenters and the public cloud.

In my view, unit, or bench, testing is a massive win for system developers. If a designer knows a function works in isolation then, assuming the documentation and understanding of its operation are correct, the function should just work within a larger ecosystem. A utopian dream, you may say, but what is the alternative?

If we can’t make a function operate effectively on the metaphorical test bench with well-known parameters, then something is wrong and must be fixed. If it works on the bench but not in the larger system, then either our understanding of the function is wrong, its implementation is wrong, or our understanding of the interface is wrong. Whichever it is must be fixed before the installation is complete, or we run the risk of leaving massive problems for the future.
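As a minimal sketch of what putting a function on the bench might look like, consider a small feed-selection function for a Main-Failover playout chain. The function name `select_feed`, its parameters, and its behaviour are hypothetical and chosen purely for illustration; they do not describe any particular product.

```python
import unittest


def select_feed(main_healthy: bool, backup_healthy: bool) -> str:
    """Choose which feed goes to air (hypothetical interface, for illustration).

    The interface is deliberately small: two health flags in, one feed name out.
    """
    if main_healthy:
        return "main"
    if backup_healthy:
        return "backup"
    # Neither feed is usable, so fail loudly rather than guess.
    raise RuntimeError("no healthy feed available")


class SelectFeedBenchTest(unittest.TestCase):
    """Bench tests: the function is exercised alone with well-known parameters."""

    def test_prefers_main_when_both_are_healthy(self):
        self.assertEqual(select_feed(True, True), "main")

    def test_fails_over_to_backup(self):
        self.assertEqual(select_feed(False, True), "backup")

    def test_raises_when_nothing_is_healthy(self):
        with self.assertRaises(RuntimeError):
            select_feed(False, False)
```

Saved to a file, the tests can be run with `python -m unittest`. If they pass on the bench but the function misbehaves in the larger system, the fault is more likely to lie in how the function is understood or connected than in the function itself.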

Using the method of unit testing forces us into a discipline that will only help in the long term. Furthermore, we can easily identify any weaknesses or points of failure. We can easily build resilience because the functionality is well defined and, as with Agile development, each function will be self-documenting. In other words, how a system works should be obvious to a qualified outside observer.

From this standpoint, we can build graphs of interconnection that in themselves demonstrate weaknesses and strengths and allow us to provision backup systems without having to enable them until they are needed. With the correct infrastructure mindset, the whole system becomes self-documenting, making it likely that antiquated working practices become a thing of the past.
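One way to picture such a graph of interconnection is as a simple mapping from each function to the functions it feeds. The sketch below assumes a hypothetical workflow (the node names `ingest`, `encoder_a`, `encoder_b`, `switch` and `playout` are invented for illustration) and reports any function whose loss would cut the path from source to output. It removes one node at a time and re-checks reachability, which is adequate for small graphs but is not the only or the most efficient approach.

```python
from collections import deque


def reachable(graph, start, goal, removed=None):
    """Breadth-first search from start to goal, ignoring the 'removed' node."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph.get(node, ()):
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def single_points_of_failure(graph, source, sink):
    """Return the nodes whose removal disconnects source from sink."""
    nodes = set(graph) | {n for targets in graph.values() for n in targets}
    return sorted(
        n for n in nodes - {source, sink}
        if not reachable(graph, source, sink, removed=n)
    )


# Hypothetical workflow: two encoders in parallel, but only one switch.
WORKFLOW = {
    "ingest": ["encoder_a", "encoder_b"],
    "encoder_a": ["switch"],
    "encoder_b": ["switch"],
    "switch": ["playout"],
    "playout": [],
}

print(single_points_of_failure(WORKFLOW, "ingest", "playout"))
```

For this example graph the only weakness reported is the single switch. The encoders do not appear because each has a parallel path, which is the kind of strength and weakness the graph makes visible.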

Migrating to IP and increasing efficiency through Agile and Lean working practices isn’t just about stripping everything to the bone to save money. It’s an opportunity to build resilience and self-documenting systems that will stand the test of time and develop as the needs of the viewer, and hence the broadcaster, expand.