Broadcast Standards: Cloud Compute Workflow Pipelines

This is a detailed exploration of system and workflow principles, storage systems, queue management, how microservices enable active workflow designs, and how node graph systems can provide a friendly UI.

An IP based architecture facilitates the construction of workflow pipelines.  The visual effects industry is built on this approach, albeit using somewhat monolithic rendering applications. The problems facing broadcasters as they process increasing amounts of content are structurally very similar.

Stand-alone machines with watch-folders or microservice strategies divide the task into separate modular parts. The workflow will often be a linear pipeline with ingest and contribution as the input and a broadcast head-end or streaming service as the eventual output. Multiple pipelines can be used for different purposes.

Let’s examine some practical aspects that affect workflow design. Choices based on these ideas can significantly affect the performance later on. Capacity planning at the outset is very important.

Problems are challenging to resolve once the workflow is deployed. Maintenance is an expensive resource but can be significantly reduced with careful design.

Architectural Design & Implementation

Workflows can be implemented in a variety of different ways:

  • Classic passive driven workflow model.
  • Active microservice driven model.
  • Software defined workflows.

Regardless of the chosen methodology, there are important issues to consider when designing your workflow infrastructure:

  • Maintain consistent environments on all hosted nodes.
  • Use DevOps to maintain centralized configurations across multiple different platforms.
  • Choose production-capable codecs for audiovisual asset processing.
  • Plan capacity for extreme scenarios.
  • Choose storage mechanisms that are fast enough.
  • Apply structured network cabling regimes.
  • Manage concurrency and synchronous behavior.
  • Apply good queue management discipline.

Other important topics relating to workflow design that we might address at a later date include these issues:

  • Storage file system types.
  • Cross mounted shared folders.
  • Linking (aliases and shortcuts) to resources.
  • Access permissions.
  • The UNIX Filesystem Hierarchy Standard.
  • POSIX standards.

Choosing Suitable Codecs

Given the massive scale of modern storage systems and fast connectivity, large file sizes are less problematic than they once were. Using uncompressed formats for audio and video production can improve overall performance because the compression and decompression steps are eliminated.

The processing workflow involves careful choice of codecs for video and audio. Codec choice may dictate the choice of hardware, which in turn affects the scalability of the solution. Moving files around, by contrast, is platform agnostic.

Working With Video Edit Applications

When content is ingested, perhaps from a camera, the video file format and coding format are not necessarily optimal for video editing. Converting the file to an NLE-compatible format right away is useful because it eliminates repeated conversion every time the file is opened. Apple ProRes is a good choice on macOS systems (for example with Final Cut Pro). Avid DNxHR is good on Windows systems and most other platforms.

Converting to ProRes for delivery can be problematic if you edit primarily on Windows. Reading or converting proprietary files to other formats on non-native platforms is usually feasible, but creating them as output files is not. For example, Apple ProRes files created on non-macOS platforms most likely involve a reverse engineered encoder that is unlicensed and non-conformant.

For example, whilst ffmpeg is a fantastic tool, the ProRes files it creates are not exactly the same as those exported from Final Cut Pro. Any divergence from the standardized specification has the potential to cause downstream problems. Some expensive third-party cross-platform software does have properly licensed ProRes exporters available. If you need to make ProRes formatted files, then a macOS system is the best platform.

A good portable solution for production video coding is Avid DNxHR, the resolution-independent successor to DNxHD. It works at a range of bit depths and chroma subsampling variants and supports 4K and beyond. It is widely supported by many software tools and platforms.
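
As a rough sketch (ffmpeg exposes DNxHR through its dnxhd encoder; the file names here are hypothetical), a camera file could be transcoded for editing like this:

# Transcode camera footage to DNxHR HQ with uncompressed PCM audio.
# Input and output names are placeholders for your own media.
ffmpeg -i camera_capture.mp4 \
       -c:v dnxhd -profile:v dnxhr_hq -pix_fmt yuv422p \
       -c:a pcm_s16le \
       edit_ready.mov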

Working With Audio Edit Applications

Lossless coding is preferred when working with audio. There are plenty of free-to-use formats such as AIFF and WAV (uncompressed PCM) and FLAC (lossless compression). They are well supported across all platforms. Proprietary formats such as Apple Lossless are also good because they create smaller files, but they are most at home on a macOS platform. AAC and MP3 are lossy delivery codecs and should not be used for ingest or production. You might not hear the difference at first, but your tools will introduce unpleasant artefacts as the content is processed.
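
For example (a sketch with hypothetical file names), ffmpeg can losslessly compress ingest audio into a FLAC mezzanine copy:

# FLAC reduces the file size without losing any audio quality.
ffmpeg -i ingest_capture.wav -c:a flac mezzanine.flac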

Capacity Planning

Running out of storage space, network bandwidth or computing capacity is a serious problem if you have under-resourced your infrastructure.

Perhaps this is becoming less of a problem with cloud-compute based solutions but bursting beyond your contracted and budgeted limits may be like having a bank overdraft. It’s very expensive.

Implement monitoring processes that tell you how quickly your resources are being consumed. Develop metrics that relate average minutes of video to bytes of storage consumed on disk. This requires some statistical analysis of your system.
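
A minimal monitoring sketch along these lines might look like the following, assuming GNU coreutils and ffprobe are available and that /media/assets is a hypothetical asset root:

#!/bin/sh
# Rough estimate of bytes consumed per minute of video under a storage root.
ASSET_ROOT="/media/assets"                     # hypothetical asset location
total_bytes=$(du -sb "$ASSET_ROOT" | cut -f1)  # GNU du reports bytes with -b
total_seconds=0
for f in "$ASSET_ROOT"/*.mov; do
  d=$(ffprobe -v error -show_entries format=duration -of csv=p=0 "$f")
  total_seconds=$(echo "$total_seconds + $d" | bc)
done
minutes=$(echo "$total_seconds / 60" | bc)
echo "Approximate bytes per minute of video: $(echo "$total_bytes / $minutes" | bc)"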

The relationship between content size and duration will vary depending on your content. A collection of drama films will yield a different ratio to a collection of soccer matches, for example.

These factors will affect your capacity planning:

  • Storage size differences for different codecs.
  • Predicting the rate of resource consumption.
  • Predicting latency and performance leading to estimated durations for workflow tasks. This is affected by data transfers between nodes and how long it takes to retrieve items from storage.
  • Implementing extensible storage systems that can grow without service interruptions. RAID storage with hot-swappable disk drives is a very useful solution.
  • NAS (shared networked) vs. DAS (personal local).
  • Alternative RAID configurations regarding performance vs. capacity vs. reliability.
  • On premises or off premises installations.
  • Monolithic servers vs. cloud based virtualized CPUs or containers.
  • Cost of implementation.
  • Total cost of ownership.

Total Cost Of Ownership

Calculating the cost of deployment without taking running costs into account causes significant financial problems later on. Break the costings down to take all of these figures into account:

  • Plan and design the architecture.
  • Scope the design for future expansion.
  • Purchase the hardware or cloud service contract.
  • Understand the fixed running costs.
  • Understand any costs that depend on resource usage (file storage, Network bandwidth, Cloud CPU capacity).
  • Understand the cost of maintenance.

A cheap and rushed solution may in fact turn out to require high maintenance costs after deployment. Perhaps a more expensive and carefully constructed solution at the outset is cheaper in the long run.

Concurrency

Construct parallel paths through the workflow. These might perform quite different operations on the same content at the same time, for example:

  • Video format conversion.
  • Audio extraction.
  • Speech-recognition for subtitle transcripts.
  • Packaging for distribution.
  • Embedding timed text tracks.
  • Rendering burned-in captions.

The processing time for each of these might vary significantly. Inter-dependency on the timely arrival of resources can create what is sometimes described as a ‘race hazard’. If the outputs need to be combined, create buffer containers to deal with asynchronous arrivals, then restore synchronous operations when all the dependent parts have arrived. This will introduce some latency into the workflow pipeline.
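
A shell sketch of that buffering idea (the buffer path and the expected parts are hypothetical) simply waits until every dependent part has arrived before recombining:

#!/bin/sh
# Wait for all concurrent branches to deliver before the final packaging step.
BUFFER="/media/buffer/job-0042"      # hypothetical buffer container
wanted="video.mov audio.wav subtitles.vtt"
for part in $wanted; do
  until [ -f "$BUFFER/$part" ]; do
    sleep 10                         # polling adds latency; file system events would remove it
  done
done
echo "All parts present - synchronous operation can resume for packaging."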

Monitoring the behavior will reveal any bottlenecks caused by lengthier paths. Additional capacity can be provided to speed up the processing.

Concurrent processing is much easier to manage using microservices because they can scale on demand almost instantaneously.

Storage Access Modes & Performance

Processing nodes require significant amounts of storage for media assets. The way this is attached significantly affects performance. There are several alternatives:

  • DAS - Directly attached storage.
  • NAS - Network Attached Storage.
  • SAN - Storage Area Networks.

DAS - Directly Attached Storage

Directly attached storage includes embedded internal hard disk drives, SSDs and any externally attached storage connected via USB, Thunderbolt, etc.

Directly attached storage is generally assumed to be faster than network shared volumes, although this depends on how it is connected. Internal disk drives might use a SATA (Serial AT Attachment) interface, which superseded the earlier IDE interfaces and has long been the standard way to attach internal storage, although recent USB versions are faster. Thunderbolt 3 and 4 are faster than SATA, USB 4.0 is expected to match them for speed, and Thunderbolt 5 approaches the speed of a SAN connection.

NAS - Network Attached Storage

Network Attached Storage is shared across the network and is accessible to multiple users.

These protocols are commonly used:

Protocol   Description
SMB        The Server Message Block protocol is the most popular for file sharing at the moment. It is well supported everywhere and supersedes the older CIFS dialect.
NFS        The Network File System has been around for a very long time. It only uses one network port for connections (since NFSv4) and is widely supported on all platforms.
AFP        The Apple Filing Protocol will soon be retired. It may be useful for connecting to legacy systems.
CIFS       The Common Internet File System was an early dialect of SMB. It is obsolete now that modern SMB versions are available.
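
For example (a sketch with hypothetical server and share names), a Linux node would typically attach NAS storage like this:

# NFS share mounted onto a local mount point.
sudo mount -t nfs nas01:/export/media /mnt/media

# SMB share mounted using a credentials file kept out of the command history.
sudo mount -t cifs //nas01/media /mnt/media -o credentials=/etc/nas01.cred,vers=3.0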

NAS hardware devices support RAID configured multiple drives for data resilience and protection. The drives can also be presented individually in a JBOD (Just a Bunch Of Disks) array. The NAS controller supports any mix of configurations and drives can be added to the array or hot swapped when they fail. Removing drives from the array is generally not a good idea and rarely supported.

The drives are connected internally via a SATA interface. They operate in parallel which improves performance. The network interface capacity is likely to be the limiting factor. Adding memory to the NAS controller allows caching for better performance.

Lately, NAS servers are being equipped with SSDs instead of hard disks. The hard disk manufacturers compete with higher capacity drives and faster performance. An 8-bay NAS filled with 16TB drives can store around 100TB when it runs in SHR (Synology Hybrid RAID) mode. SHR-2 mode reduces this slightly but provides better protection because it tolerates two simultaneous drive failures. There may be limits on the size of the volumes you create depending on the file system.

SAN - Storage Area Networks

Storage Area Networks are more advanced. They use a fiber connected network, typically Fibre Channel, that is independent of the Ethernet network host computers use for IP connections. A special adapter (a host bus adapter, or HBA) is plugged into each host computer node to connect with the fiber network.

SAN connections are the fastest method of accessing storage. Optical Fiber Connected Ethernet is about the same speed but the storage traffic is mixed with everything else happening on the network.

The storage contains many more drives than a conventional NAS solution.  They could be connected in RAID arrangements but with additional layers of redundancy and managed by metadata. Multiple copies of the files might also be used to add more redundant protection. This improves robustness but reduces the overall capacity. The SAN controller keeps it organized and maintains the metadata.

Comparing Performance Between Storage Types

This is a list of alternative storage types and describes how fast they can transfer data. Their transfer speed may be compromised by the chosen connection method: 

Device                        Transfer speed   Use case
Slow consumer NAS             3 MB/sec         NAS
Fast consumer NAS             10 MB/sec        NAS
USB 2.0 Flash Drive           30 MB/sec        DAS
High end NAS                  110 MB/sec       NAS
Consumer hard drive           120 MB/sec       DAS
Enterprise hard drive         200 MB/sec       DAS
Simple RAID config            300 MB/sec       DAS/NAS
USB 3.2 Flash Drive (Write)   380 MB/sec       DAS
USB 3.2 Flash Drive (Read)    420 MB/sec       DAS
Embedded SSD                  500 MB/sec       DAS
Advanced RAID config          2 GB/sec         DAS/NAS

Comparing Performance Between Connection Types

This is a comparison of connection types. Their rated transfer speeds are shown in Megabytes and Gigabytes per second. Bear in mind that the connected devices may not deliver their content at the maximum speed of the connection. If they can deliver fast enough to saturate it, the connection may limit the throughput instead:

Connection method             Transfer speed   Use case
Ethernet (10M)                1.25 MB/sec      NAS
USB 1.1                       1.5 MB/sec       DAS
Fast Ethernet (100M)          12.5 MB/sec      NAS
USB 2.0                       60 MB/sec        DAS
Gigabit Ethernet (1G)         125 MB/sec       NAS
SATA I                        150 MB/sec       DAS
SATA II                       300 MB/sec       DAS
SATA III                      600 MB/sec       DAS
USB 3.0                       625 MB/sec       DAS
USB 3.2 Gen 1×1               625 MB/sec       DAS
10 Gigabit Ethernet (10G)     1.25 GB/sec      NAS
USB 3.2 Gen 2×1               1.25 GB/sec      DAS
USB 3.2 Gen 1×2               1.25 GB/sec      DAS
USB 3.2 Gen 2×2               2.5 GB/sec       DAS
Thunderbolt 3 & 4             5 GB/sec         DAS
USB 4.0                       5 GB/sec         DAS
Thunderbolt 5                 15 GB/sec        NAS
Fibre Channel (max speed)     16 GB/sec        SAN
Future Fibre Channel          32 GB/sec        SAN
Fiber Connected Ethernet      50 GB/sec        NAS
Far future Fibre Channel      64 GB/sec        SAN

Although USB connections are rated to deliver at very high speeds in the latest versions, most USB attached drives will never reach these speeds due to other factors. The storage media inside a USB Flash Drive cannot run fast enough to saturate the interface.

Carefully match the performance of the devices to the transfer mechanism so neither causes a bottleneck.
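
A quick sanity check of that pairing (a sketch; the target path is hypothetical and the oflag=direct/iflag=direct options assume GNU dd) is to time a large sequential write and read:

# Sequential write test: 1 GB of zeros, bypassing the page cache where supported.
dd if=/dev/zero of=/mnt/media/testfile bs=1M count=1024 oflag=direct

# Sequential read test of the same file, then clean up.
dd if=/mnt/media/testfile of=/dev/null bs=1M iflag=direct
rm /mnt/media/testfile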

Structured Cabling Performance

The quality of Ethernet cables makes a significant difference. These are described as UTP, which stands for Unshielded Twisted Pair cables. CAT3 and above can have a total cable length of 100m. This is based on a 90m cable run with an additional 10m allowed for patching to and from a router, hub or switch.

Grade   Speed
CAT1    125 KB/sec
CAT2    500 KB/sec
CAT3    1.25 MB/sec
CAT4    2 MB/sec
CAT5    12.5 MB/sec
CAT5e   125 MB/sec
CAT6    1.25 GB/sec
CAT6a   1.25 GB/sec
CAT7    1.25 GB/sec

Regardless of the cable category, they all use the same RJ45 connectors.

Bringing all the distant connections to a patch panel is worthwhile because it makes reconfiguration easy.

The standards for Structured Cabling dictate the grade of cable to use and how the RJ45 connectors should be wired. Typically, all 8 conductors (4 twisted pairs) in the cable are used so the wall socket can support networking and telephony applications simultaneously.

Both ends of a cable should be labelled and everything documented in a manifest that enumerates every cable, its color, what category it is, what it carries and where the two ends are physically located.

I may have said this before but documentation is fundamental and must be accurate and kept up to date. We are only stewards of the infrastructure for now and we owe it to our successors to get it right.

Replicate The Same Environment Everywhere

In a large organization that deploys multiple systems, the technology will be continuously evolving. Equipment or services purchased later on might differ significantly from that already deployed. A gradual migration to a microservice architecture hosted on cloud based virtual containers is an example.

When faced with this wide difference in platforms, OS versions and hardware manufacturers, aim to reduce the maintenance burden. Make life easier for developers by creating a consistent platform configuration.

When the compute nodes are Linux based, develop a Common UNIX Environment. Focus on a common configuration for these attributes:

  • File system organization.
  • Maximum storage volume sizes.
  • Mount points for additional storage and volume names.
  • Consistent limits set for user accounts and system processes.
  • Minimum specifications for storage capacity.
  • Consistent support for peripheral devices.
  • User home folder locations.
  • User account naming conventions.
  • User account privileges.
  • Consistent shell command-line environment.
  • Network node names and IP addressing.
  • Services configured to IP socket port numbers.
  • Firewall configuration (perhaps with TCPWrappers and Fail2Ban).
  • Driver names in the /dev directory.
  • Web-hosting configurations.
  • Event-logging file names and formats.

Configuring Multiple Target Platforms

The configurations in a common environment can be described in include files using symbolic names. A symbolic name is a consistent name that can be used in these environments:

  • Command line shell-scripts.
  • Code that is compiled for execution (C-Language).
  • Scripts for back-end web serving (PHP).

These are all very different and use incompatible syntax. A single file will not suffice. Some simple glue is needed to integrate them so the symbolic values are consistent.

Remember that a prime directive is to specify configuration parameters in one single place only. DevOps processes can maintain multiple include files and regenerate them from a single source definition. This avoids defining the symbols multiple times.

Store the source definitions in a flat file or a database. A flat file is very easy to use. Define a simple name-value pair syntax for each symbolic name.

Here is an example stored in a file called symbolic_names.cfg that lives with a collection of similar files in the source code repository. Describe each symbolic value on a separate line. Use a colon character (:) to separate the name and value:

SYMBOLIC_NAME: VALUE
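
A few concrete entries might look like this (the names and values are purely illustrative):

MEDIA_ROOT: /srv/media
LOG_DIR: /var/log/workflow
QUEUE_HOST: queue01.example.internal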

A DevOps script periodically renders everything in this directory. If DevOps can be triggered by file system events, it can run the render process instantaneously when a file is altered.
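
A minimal renderer sketch (assuming the simple name and value syntax above, with no colons inside values; the output file names match the include files described below) could be driven by awk:

#!/bin/sh
# Render symbolic_names.cfg into shell, C and PHP include files.
# Values are rendered as quoted strings in this sketch.
CFG="symbolic_names.cfg"
awk -F': *' '{ printf "%s=\"%s\"\n", $1, $2 }' "$CFG" > symbolic_names.sh
awk -F': *' '{ printf "#define %s \"%s\"\n", $1, $2 }' "$CFG" > symbolic_names.h
{
  echo "<?php"
  awk -F': *' '{ printf "define('\''%s'\'', '\''%s'\'');\n", $1, $2 }' "$CFG"
} > symbolic_names.php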

Symbolic Names In Shell Scripts

Within an existing shell script, invoke an includable script with the source command. The source keyword can be replaced with a single dot (.) instead of spelling out the whole word. We call this 'dot-running' a script:

source symbolic_names.sh

. symbolic_names.sh

This behaves just like an include file in C-Language and PHP. The DevOps renderer will have converted the source definition into this valid shell script code:

SYMBOLIC_NAME="VALUE"

The script will execute in the context of the current shell and variable assignments will persist after completion.
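
Usage is then as simple as this (MEDIA_ROOT is one of the illustrative names defined earlier):

. symbolic_names.sh
echo "Assets live in ${MEDIA_ROOT}"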

Symbolic Names In Compiled C-Language

C-Language source files include shared code like this:

#include "symbolic_names.h"

The DevOps renderer creates C-Language definitions in the include file like this:

#define SYMBOLIC_NAME value

Symbolic Names In PHP Scripts

PHP includes shared files in a similar way but of course there are subtle differences:

include 'symbolic_names.php';

The PHP code rendered by DevOps looks like this:

define('SYMBOLIC_NAME', value);

Zero-config Implementations

Installing software often requires manual configuration based on where the software is installed, together with some knowledge of the OS and host details.

It is possible to determine the base path where an application or script has been installed from the inside, at run-time. Build self-configuring tools using that knowledge. Asset folder locations are derived from that base path with relative addressing.

Likewise, determining the operating system and which host is being used can also factor into the construction of symbolic names containing folder paths that locate machine specific resources.
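
A self-locating script sketch (none of these paths are prescriptive) can derive everything it needs at run-time:

#!/bin/sh
# Derive the installation base path from the script's own location.
BASE_DIR="$(cd "$(dirname "$0")" && pwd)"
ASSET_DIR="${BASE_DIR}/assets"       # relative addressing from the base path

# Fold the OS and host name into machine-specific resource paths.
OS_NAME="$(uname -s)"
HOST_NAME="$(hostname -s)"
MACHINE_CONFIG="${BASE_DIR}/config/${OS_NAME}/${HOST_NAME}.cfg"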

Using symbolic names to implement data-driven redirection throughout avoids hard-wiring anything and solves a lot of potential maintenance problems.

It is entirely feasible to install software by simply dragging and dropping a folder into the right place with no configuration required. This becomes very important if you have many computing nodes that need to run common suites of software that is distributed by a DevOps process.

This is one of the things that Kubernetes, containers and microservices do very well. It can be done just as effectively with traditional software tools and web content by applying some ingenuity.

Classic Passive Driven Workflow Model

The simplest workflow design operates as a pipeline using cascaded watch-folders. Resources are dropped into a folder which is monitored periodically by a queue manager. The queue manager picks up the first item in the list and passes it to a task handler for execution. This is a very simple process to construct. There is a small latency between each check. That latency can be eliminated if file system events can trigger a queue check when something changes.

The task performs the actions it needs to on the resource and drops the result into an output bin. That output bin may itself be another watch-folder for the next process in the workflow.

The watch-folders provide the basis for a simple queue manager. This is an elegant and simple way to implement queues because it leverages the alphabetic sorting of folder contents that the operating system does for free. It is also easy to inspect and manually correct.
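
In its simplest form (a sketch; the folder locations and the task handler are hypothetical), the queue manager is little more than a loop over a sorted directory listing:

#!/bin/sh
# Minimal watch-folder queue manager: take the first entry, hand it to a task handler.
WATCH="/srv/workflow/queue"          # hypothetical watch-folder
DONE="/srv/workflow/processed"       # hypothetical archive of completed entries
while true; do
  next=$(ls "$WATCH" | head -n 1)    # alphabetic order is the queue order
  if [ -n "$next" ]; then
    ./task_handler.sh "$WATCH/$next" # hypothetical handler drops its result in an output bin
    mv "$WATCH/$next" "$DONE/"
  else
    sleep 30                         # polling latency between queue checks
  fi
done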

This approach is well suited to these tasks:

  • Ingest & metadata asset record creation.
  • Format detection.
  • Processing tree splitting by format detected.
  • Deep metadata extraction (EXIF, OCR, facial recognition and AI).
  • Time based trigger extraction and chapter logging.
  • Inserting chapter marks.
  • Embedding subtitle timed texts.
  • Time-base correction.
  • Color corrections with lookup tables (LUTs) and transfer functions.
  • Aspect ratio corrections.
  • Cropping and resizing.
  • Asset library loading.

Queue Management

Because a watch-folder is just a simple directory, the alphabetic collating sequence dictates the order of the files. Construct a file name to indicate a priority value as the leading item. An ascending sequential value keeps the chronological nature intact so tasks cannot jump the queue. Adding a task type provides a means to select various processing options.

{priority}-{sequence-number}-{task-type}

Here is an example queue:

A-00185-TASK2
A-00385-TASK1
A-00580-TASK3
A-48394-TASK3
B-00004-TASK2
B-00205-TASK1
B-00405-TASK3
B-00605-TASK2
C-00085-TASK1
C-00285-TASK3
C-00485-TASK2
C-00683-TASK1
D-00104-TASK3
D-00304-TASK2
D-00504-TASK1
D-00704-TASK3

If the resources were just dropped into the watch-folder as a simple file, a file extension would need to be added. This would work but it is somewhat limited since there is no easy way to add metadata and supporting assets.

Use a ‘job-bag’ folder instead to contain the resource and any supporting assets and metadata it will need. The job-bag folder is named with the queue entry details. Within that job-bag, configuration parameters pertaining to the task are listed in a small metadata file. Processing statistics are gathered and maintained in a small log file. Progress can be updated in a status file which is extracted by a queue manager and displayed on a wall mounted screen.
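
A job-bag can be as simple as this sketch suggests (the folder name follows the queue-entry pattern above; everything else is illustrative):

#!/bin/sh
# Create a job-bag for a new queue entry and seed its metadata, log and status files.
JOB="A-00185-TASK2"                             # name built from the queue-entry pattern
BAG="/srv/workflow/queue/$JOB"                  # hypothetical watch-folder location
mkdir -p "$BAG"
cp camera_capture.mov "$BAG/"                   # the resource to be processed
printf 'codec: dnxhr_hq\noutput: mov\n' > "$BAG/parameters.cfg"   # task configuration
: > "$BAG/processing.log"                       # processing statistics gathered here
echo "QUEUED" > "$BAG/status.txt"               # picked up for the wall mounted display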

Careful queue management and design can alleviate a lot of performance bottlenecks by adding scale where it is most needed.

If the queues are designed to be general purpose and carry mixed traffic workloads, designing something like a motorway with multiple lanes will allow small and short running tasks to execute quickly and free up resources for the next task. Large and complex tasks can execute in the ‘crawler lane’. 

Processes that consume significant CPU resources and take a long time can be load-balanced across several replicated queues. Each queue could run on a different machine or microservice.

Designing a sensible queue management strategy allows tasks to be processed as quickly as possible. Small tasks need not wait behind large ones, so throughput is significantly improved. This makes efficient use of resources and increases productivity. Without this multiple queue strategy, a simple task that processes caption text could be held up for some time behind a large video conversion task on a lengthy movie.

Active Microservice Driven Model

With the introduction of microservice architectures, the workflow can be orchestrated via Kubernetes and each process step implemented in a microservice container. Microservices pass messages and task requests to the next process in the chain. Shared storage allows microservices to operate on the same content being passed from one to the next.

This is devoid of any traditional queue structures but some of the ideas from a passive workflow might still be relevant. The job-bags are still a good way to wrangle all the assets that the task requires.

Logging and analytics can be implemented via message passing to a centralized logging engine which is also implemented as a microservice. Another microservice could manage the wall mounted progress display by watching the status of all the in-flight microservices.

Building this is more complex, but only because there might be many more moving parts. Here is the passive workflow reimagined as a microservice based design:

  • The main workflow pipeline is illustrated by the black lines.
  • The green items show how shared storage is accessed as the assets are processed by each microservice.
  • The blue items show how analytics and status information is gathered in the shared storage and displayed on a wall mounted screen.

This design uses messaging and shared storage instead of watch-folders. The job-bags become persistent containers so the shared storage also becomes the content store. The shared storage must allow concurrent access to assets from several microservices at the same time. This is controlled by carefully managing the read-only disposition of an asset once it exists. Temporary working assets can be cleaned up at the end of the pipeline processing by a garbage collector.
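
Two small sketches cover the read-only discipline and the garbage collection mentioned here (the paths and the retention period are hypothetical):

# Once an asset exists in shared storage, remove write permission so that
# concurrent microservices can only read it.
chmod a-w /srv/shared/assets/job-0042/video.mov

# Garbage collector: remove temporary working assets older than seven days.
find /srv/shared/tmp -type f -mtime +7 -delete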

Software Defined Workflows

Some companies offer workflow solutions that use a zero-code approach based around user interfaces with wired diagrams that connect the flow from one process to another. This is described as a Node Graph System.

Whilst this looks like a relatively new approach, the design concept originated in the 1960s. The underpinnings are likely to be microservices but could be a classic workflow infrastructure. The visual representation in the user interface sits on top and hides all the complexity.

Example applications already using a node graph user interface are:

  • DaVinci Resolve.
  • Shake Compositing application.
  • Quartz Composer for processing and rendering graphical data.
  • Blender open-source rendering application.
  • Telestream video compression tools.
  • Machine learning tools.
  • VFX tools made by Foundry.
  • Autodesk Maya modelling and animation tool.
  • Grasshopper mathematical modelling tools.
  • Dynamo visual programming tool.

Dragging elements that represent processes onto a canvas and connecting them up to create a pipeline has been in use in many systems for some time. Here is a typical node graph connection diagram:

Each node is represented by a container. The connection points on the left are inlets and the ones on the right are outlets. Nodes are connected together by wiring outlets to the inlets of other nodes. Outlets can be wired to multiple inlets.

Very complex designs are simplified by wrapping several components in a box. The boxes might be worked on by different teams and represent sub-systems in the architecture. This also facilitates reuse of large portions of the node graph.

If a node graph system can be coupled with microservices, an inspector would open a container to configure it or add custom code and behaviors. It provides a very powerful workflow management tool. Perhaps this is something that could be added as a UI view to a Kubernetes microservice orchestration system.

Conclusion

The elegance and simplicity of using a node graph system that manages a collection of microservices is very attractive from an operational perspective.

An open-source node graph tool would be a useful starting point for building a bespoke system. It is certain that fully capable open-source solutions will emerge in due course. At the moment, they are good for modelling databases and visualizing large data sets. They currently lack the active connections needed to configure a micro-service container.

Building and deploying such a system would be a technical challenge and for now it may be better to buy in a solution if it is versatile enough for your needs.

The high-level dashboard can potentially leverage any kind of technology and hide the complexity from the operators. Software products will certainly facilitate the deployment. Some glue will always need to be applied to integrate all the moving parts.
