Audio Global Viewpoint – December 2021

Reducing Latency With GPUs

Latency continues to be a hot topic as more broadcasters transition to IP infrastructures. But advances in GPU design may be key to the successful deployment of low-latency IP systems.

GPUs have an interesting history. Originally designed to accelerate video-intensive processing applications for desktop computers, such as CAD and gaming, they’re now finding their feet in IP video processing and machine learning.

Video processing may seem an obvious application for GPUs, as sub-units such as “ray tracing” and “shading” tend to give the game away. But how on earth do they find applications in IP processing and machine learning? The simple answer is that at the core of their design is linear algebra, the field of mathematics that deals with vectors and matrices.

Machine learning uses linear algebra to compute its forward and backward propagation, which is how ML models learn and then process data. Not only does the GPU excel in this mathematical discipline, but its thousands of processing cores allow it to work on potentially thousands of data points simultaneously. Added to this is the on-card local memory, which makes the GPU an incredible processing engine for video, audio, and ML.
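To make the linear algebra concrete, here is a minimal sketch of forward and backward propagation through a single dense layer, y = Wx + b, written in plain Python. The function names, the layer shape, and the values are all illustrative assumptions rather than anything taken from a real ML framework. The point to notice is that every output element is an independent dot product, which is exactly the kind of work a GPU can spread across its cores.

```python
# Illustrative only: one dense layer, y = W x + b, in plain Python.
# The names forward/backward and all values are hypothetical.

def forward(W, b, x):
    """Forward propagation: each output is an independent dot product."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

def backward(W, x, grad_y):
    """Backward propagation for the same layer.

    Given dL/dy, return the gradients (dL/dW, dL/db, dL/dx).
    """
    # dL/dW is the outer product of grad_y and x.
    grad_W = [[g_i * x_j for x_j in x] for g_i in grad_y]
    # dL/db equals dL/dy because b is added directly to the output.
    grad_b = list(grad_y)
    # dL/dx is the transpose of W multiplied by grad_y.
    grad_x = [sum(W[i][j] * grad_y[i] for i in range(len(W)))
              for j in range(len(x))]
    return grad_W, grad_b, grad_x

W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 outputs, 2 inputs
b = [0.5, 0.5, 0.5]
x = [1.0, -1.0]

y = forward(W, b, x)
grad_W, grad_b, grad_x = backward(W, x, [1.0, 1.0, 1.0])
print("y =", y)
print("dL/dx =", grad_x)
```

On a GPU the same arithmetic would be expressed as matrix operations and executed in parallel; the loops here simply spell out what each core would be asked to do.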

The state-of-the-art Nvidia A100 GPU has nearly 7,000 CUDA processing cores, 40GB of memory, and a memory bandwidth of around 1,555 GB/s. These numbers put the processing of HD progressive video at 3Gb/s well within the grasp of the GPU, and even 4K at 12Gb/s is achievable.
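A rough back-of-envelope check shows how much headroom these figures imply. The sketch below assumes 10-bit 4:2:2 sampling (an average of 20 bits per pixel), 60 frames per second, and a deliberately conservative 1,000 GB/s for memory bandwidth; none of these parameters come from a datasheet. It counts active picture only, so the results come out a little below the 3Gb/s and 12Gb/s link rates, which also carry blanking and ancillary data.

```python
# Back-of-envelope sketch: uncompressed video data rate vs. GPU memory bandwidth.
# Assumptions (not from the article): 10-bit 4:2:2 sampling, 60 fps,
# and a conservative round figure of 1,000 GB/s for memory bandwidth.

def active_video_gbps(width, height, fps, bits_per_pixel=20):
    """Active-picture data rate in Gb/s (blanking and ancillary data ignored)."""
    return width * height * fps * bits_per_pixel / 1e9

MEMORY_BANDWIDTH_GBYTES = 1000           # GB/s, conservative assumption
mem_gbps = MEMORY_BANDWIDTH_GBYTES * 8   # convert bytes/s to bits/s

hd = active_video_gbps(1920, 1080, 60)
uhd = active_video_gbps(3840, 2160, 60)

print(f"1080p60 active video: {hd:.2f} Gb/s")
print(f"2160p60 active video: {uhd:.2f} Gb/s")
print(f"Memory bandwidth is roughly {mem_gbps / uhd:.0f}x the 2160p60 rate")
```

Real processing reads and writes each frame several times, so the usable margin is smaller than the raw ratio suggests, but it remains very large.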

Admittedly, this level of GPU card isn’t for the faint-hearted and attracts a substantial cost, along with the need to install it in a server that can stream data over the PCIe bus fast enough and provide the 300W of power each card needs. However, these cards are COTS products made for other industries, so they don’t suffer the custom design limitations of traditional broadcast hardware.

This is another example of how broadcasters are benefiting from innovation in other industries. The GPU wasn’t designed for broadcasting, but its massive programmable resource is delivering incredible benefits for us, both in ML and video/audio processing.

For me, the real winner is that streamed video using ST2110, NDI, SRT, or any other format is not only processed by the hardware accelerators on the GPU, but the code executing these processes also runs directly on the GPU cores. It’s almost as if the CPUs on the servers have been relegated to just shifting data between I/O ports and providing the OS environment for the GPU card to work in.

With a combination of parallel processing and massive memory bandwidth, the GPU can process video in real time with incredibly low latency, by some accounts as low as two frames of video. If we add to this the possibility of process acceleration through ML and the ability to run the GPU in a public cloud, then the GPU is certainly proving its worth in live broadcasting and, in my view, is worthy of much more attention in the television community.
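To put two frames into more familiar units, the short sketch below converts a frame count into milliseconds. The frame rates are common broadcast examples chosen for illustration, not figures from any particular system.

```python
# Sketch: how long is a two-frame latency at common frame rates?
# The frame rates listed are illustrative assumptions.

def frames_to_ms(frames, fps):
    """Duration of `frames` video frames at `fps`, in milliseconds."""
    return frames * 1000.0 / fps

for name, fps in [("25p", 25), ("50p", 50), ("59.94p", 60000 / 1001)]:
    print(f"{name}: 2 frames = {frames_to_ms(2, fps):.1f} ms")
```

At 50 frames per second, two frames amount to 40 ms.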