Delivering deterministic latency matters more than chasing low latency that varies, even if the variation is small. In this article, we look at how codec design, and JPEG XS in particular, can scale to make the best use of network bandwidth while keeping latency predictable.
Video Codec Resilience
Video codecs are notorious for introducing latency. It seems that the more efficient the codec, the more latency it introduces. This is particularly evident with long GOP compression. For program quality contribution feeds we often use I-Frame only compression, which helps keep the latency more predictable. Because each video frame is compressed in isolation from its neighbors, I-Frame only compression keeps motion artifacts to a minimum and maintains editing and mixing quality in the production gallery.
The new generation of visually lossless, low latency, lightweight video compression codecs is helping to deliver broadcast quality video over managed and unmanaged networks to studio facilities.
JPEG XS is one such codec that looks to improve upon MPEG and JPEG standards by using wavelet and sub band technologies. Not only does this improve compression performance, it also adds scalability and editability to the set of tools provided. One of the challenges of JPEG XS compared with J2K or H.264/HEVC is that it requires more bandwidth, which generally means JPEG XS is used on managed circuits. However, its lower latency and lower complexity make software implementation much easier.
Regions of Interest (ROIs) can be defined and encoded at a higher quality than the rest of the image. The ROI is decoded before any of the background so that when poor transmission paths are encountered, the decoder can focus on the important areas and fill in the gaps as the data becomes available. Although not ideal, the algorithm works on the principle that it’s better to provide data that can create the areas of interest than to provide no image at all, or an image with irrelevant data.
Initially, a color transform is applied to the image, typically converting RGB input into a luma and chroma style representation so that the components are less correlated. Then the images are split into sub bands using block filtering type technology. This creates sub images of varying size and detail to help the codec send the appropriate data for the available network bandwidth. The wavelet transform is then applied to the sub band images to provide image-based coefficients that can be quantized for compression to meet the needs of the HVS (Human Visual System).
Discrete Cosine Transforms (DCTs) are used extensively in JPEG and MPEG compression. Although the DCT doesn’t compress the image, it does transform the image from the spatial domain into the frequency domain. Further processes such as quantization then provide the data reduction, taking into consideration the features of the HVS. One of the challenges of DCT is that all the image coefficients must be sent regardless of the available network capacity. This leads to potentially high levels of latency and poor QoE in bandwidth compromised networks.
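The point that the DCT transforms rather than compresses can be illustrated with a minimal sketch. It assumes a one-dimensional, orthonormal DCT-II over eight samples; the function names `dct` and `idct` are hypothetical and are not taken from any codec library. The transform returns exactly as many coefficients as there were samples, and the inverse recovers the input, so any data reduction has to come from a later stage such as quantization.

```python
import math


def dct(samples):
    """Orthonormal DCT-II: N input samples produce N frequency coefficients."""
    n = len(samples)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        total = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, x in enumerate(samples))
        coeffs.append(scale * total)
    return coeffs


def idct(coeffs):
    """Inverse of dct() above (an orthonormal DCT-III)."""
    n = len(coeffs)
    samples = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            total += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        samples.append(total)
    return samples


# Illustrative sample values only, not taken from a real image.
row = [52, 55, 61, 66, 70, 61, 64, 73]
coeffs = dct(row)
restored = idct(coeffs)

# Same number of values in and out: nothing has been compressed yet.
print(len(row), len(coeffs))
```

A flat row such as `[5] * 8` puts all of its energy into the first coefficient, which is why a following quantization stage can discard so much in smooth picture areas.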
DWTs are a form of wavelet analysis that is particularly exciting for broadcasters as the transform is applied to each of the sub band images in isolation. This has the advantage that a complete image, albeit at low resolution, can be sent and decoded, with the detail being added as it becomes available. This differs from the DCT systems often used with MPEG compression, where the whole image is sent as N x N blocks, thus requiring the whole image’s worth of coefficients to be received before an image is reconstructed.
The power of DWT for two-dimensional image processing can be fully appreciated when the sub images and hence the sub bands are better understood. The algorithm not only provides multiple sub images with varying degrees of detail, but also reduces the horizontal and vertical resolution throughout the process, thus making better use of the available network bandwidth.
Figure 2a – Four bands are created by filtering the image into high (HPF) and low (LPF) bands, then reducing the sampling rate (HFSC and VFSC) by two, to provide the HH (High-high), LL (Low-low), HL (High-low) and LH (Low-high) bands.
Figure 2b – The original image of N x M pixels is reduced by subsampling the horizontal and vertical samples to provide the four sub band images from figure 2a. LL is the image that is sent first to provide the base image, then HL, LH and HH are sent assuming the network bandwidth is available.
Figure 2a and 2b show how a level-1 sub sampler decomposes the original image into four sub images, all one half the vertical and horizontal size of the original image. The LL (Low-low) band image is the result of a low pass and sub sample in both the horizontal and vertical domains. If this was the only image that was sent to the decoder (because of insufficient bandwidth), then the decoder would have to up-sample the image by a factor of two so that it matches the original size. A viewable image would be provided by the decoder but it would lack much of the detail. Assuming the three other sub band images could be transmitted, the decoder would double their height and width and then add them to the base LL image, thus providing an image similar to the original.
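The level-1 decomposition described above can be sketched in a few lines. This is an illustration under stated assumptions, not the JPEG XS filter bank: it uses the simple Haar filter pair for brevity, the function names `haar_level1` and `reconstruct` are hypothetical, and it assumes the first letter of each band name refers to horizontal filtering. The final call passes zeros in place of the detail bands to show what the decoder produces when only LL arrives.

```python
def haar_level1(image):
    """Split an image (list of rows, even width and height) into LL, HL, LH, HH.

    Each sub band is half the width and half the height of the input.
    """
    ll, hl, lh, hh = [], [], [], []
    for r in range(0, len(image), 2):
        ll_row, hl_row, lh_row, hh_row = [], [], [], []
        for c in range(0, len(image[0]), 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            ll_row.append((a + b + d + e) / 4)  # low pass in both directions
            hl_row.append((a - b + d - e) / 4)  # high pass horizontally
            lh_row.append((a + b - d - e) / 4)  # high pass vertically
            hh_row.append((a - b - d + e) / 4)  # high pass in both directions
        ll.append(ll_row)
        hl.append(hl_row)
        lh.append(lh_row)
        hh.append(hh_row)
    return ll, hl, lh, hh


def reconstruct(ll, hl, lh, hh):
    """Up-sample the four sub bands by two and combine them into one image."""
    out = []
    for r in range(len(ll)):
        top, bottom = [], []
        for c in range(len(ll[0])):
            s, h, v, x = ll[r][c], hl[r][c], lh[r][c], hh[r][c]
            top += [s + h + v + x, s - h + v - x]
            bottom += [s + h - v - x, s - h - v + x]
        out += [top, bottom]
    return out


# Illustrative 4 x 4 image; the values are made up for the example.
image = [[10, 12, 20, 20],
         [14, 16, 20, 20],
         [0, 0, 8, 4],
         [0, 0, 4, 8]]

ll, hl, lh, hh = haar_level1(image)
full = reconstruct(ll, hl, lh, hh)

# Decoder view when only the LL band was received: detail bands are zero.
zeros = [[0] * len(ll[0]) for _ in ll]
base_only = reconstruct(ll, zeros, zeros, zeros)
```

With all four bands, `full` matches the original image. With LL alone, `base_only` is a blocky, pixel-doubled version of the same picture, which is the viewable but less detailed image the text describes.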
It’s worth reiterating that compression hasn’t taken place until the DWT takes each of the sub band images, determines the coefficients of each image, and then applies the quantization. It’s the application of the quantization that provides the compression and the resultant coefficients are sent to the decoder. The decoder then reverses this process to create the original image (or a close approximation to it).
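To make the role of quantization concrete, here is a minimal sketch assuming a simple quantizer that divides each coefficient by a step size and truncates toward zero. The names `quantize` and `dequantize`, the step size, and the coefficient values are all hypothetical; the actual quantizers and rate control used by JPEG XS are more involved.

```python
def quantize(coeffs, step):
    """Map each coefficient to a small integer level, truncating toward zero."""
    return [int(c / step) for c in coeffs]


def dequantize(levels, step):
    """Decoder side: scale the levels back up to approximate coefficients."""
    return [q * step for q in levels]


# Made-up sub band coefficients: one large base value and several small details.
coeffs = [13.0, -1.0, -2.0, 0.5, 6.0, 2.0]
step = 2

levels = quantize(coeffs, step)
approx = dequantize(levels, step)

print(levels)  # [6, 0, -1, 0, 3, 1]
print(approx)  # [12, 0, -2, 0, 6, 2]
```

The small detail coefficients collapse to zero and the remaining levels need fewer bits, which is where the data reduction comes from. The decoder recovers a close approximation rather than the exact values.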
Sending each of the sub band images represents part of the true power of JPEG XS and similar wavelet compression systems. The DWT is highly optimized to use the least amount of memory and processing possible, and sending multiple compressed images of the original, each adding a layer of granularity to the image, provides a much lower compression latency. Because of the sub band derivation, wavelet compression systems can adapt to the amount of bandwidth available on the link to optimize the pictures being sent.
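The idea of adapting to the available bandwidth can be sketched as a simple priority scheme. This is an assumption-laden illustration and not the JPEG XS rate allocation algorithm: the function `select_sub_bands`, the byte sizes, and the budgets are hypothetical. Sub bands are listed in priority order, LL first, and are included until the budget for the frame runs out.

```python
def select_sub_bands(bands, budget_bytes):
    """Return the names of the sub bands that fit in the budget, in priority order.

    bands is a list of (name, compressed_size_in_bytes) with the base image first.
    Selection stops at the first band that does not fit.
    """
    sent = []
    remaining = budget_bytes
    for name, size in bands:
        if size > remaining:
            break
        sent.append(name)
        remaining -= size
    return sent


# Hypothetical compressed sizes for one frame, LL first.
bands = [("LL", 400), ("HL", 150), ("LH", 150), ("HH", 100)]

print(select_sub_bands(bands, 1000))  # ['LL', 'HL', 'LH', 'HH']
print(select_sub_bands(bands, 600))   # ['LL', 'HL']
```

Stopping at the first band that does not fit keeps the order strict, so the decoder always receives the base image before any detail.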
Providing flexible contribution circuits over managed and unmanaged IP networks encompasses many different disciplines, from the choice of low latency codec to secure automatic switching between managed and unmanaged circuits. With modern developments, this is no longer the onerous task it used to be, and fully automated solutions are available to deliver highly flexible, secure, and low latency contribution circuits.