Developments in compression technology have moved on at pace and now encompass a new codec for UHD production, JPEG XS, and a broader scheme designed to create a standards basis for immersive media, MPEG-I. Tony Jones, Principal Technologist, Ericsson Media Solutions, walks us through the differences and how the two intersect.
What are the essential attributes of JPEG XS and MPEG-I and how do they differ?
Tony Jones, Ericsson: JPEG XS is a compression standard intended for uses where low complexity and low latency are necessary but reasonably high bandwidths can be used, for example UHD at around 2 Gbit/s versus uncompressed at 12 Gbit/s. There is a wide range of potential professional applications, including studio use, remote production and other instances where latency is critical but high-bandwidth connections are still available.
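The figures quoted above can be checked with simple arithmetic. The sketch below assumes a 3840x2160 picture at 60 frames per second with 10-bit 4:2:2 sampling; none of these parameters is stated in the interview, and the function name is purely illustrative. The 12 Gbit/s figure presumably refers to a 12G-SDI style link rate, which carries blanking as well as active video, so the active-video rate comes out somewhat lower.

```python
def uncompressed_bitrate(width, height, fps, bit_depth, samples_per_pixel):
    """Active-video bit rate in bit/s (no blanking or ancillary data)."""
    return width * height * fps * bit_depth * samples_per_pixel


# Assumed format: UHD, 60 fps, 10-bit, 4:2:2 (2 samples per pixel on average).
uhd_bps = uncompressed_bitrate(3840, 2160, 60, 10, 2)

# Figures quoted in the interview.
quoted_uncompressed_bps = 12e9
quoted_jpeg_xs_bps = 2e9

# Compression ratio implied by the quoted figures, and by active video only.
ratio_from_quote = quoted_uncompressed_bps / quoted_jpeg_xs_bps
ratio_from_active = uhd_bps / quoted_jpeg_xs_bps

print(f"Active video: {uhd_bps / 1e9:.2f} Gbit/s")
print(f"Ratio from quoted figures: {ratio_from_quote:.1f}:1")
print(f"Ratio from active video:   {ratio_from_active:.1f}:1")
```

Under these assumptions the quoted figures imply a compression ratio of about 6:1, or about 5:1 if only active video is counted. That is a light ratio compared with distribution codecs.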
MPEG-I is about creating a standards basis for immersive media. In turn, it will define which encoding standards are to be used, but isn't the specification of that encoding standard itself. In the future, a new encoding standard may be used, but most likely it will either use or extend a more generically useful new encoding standard. MPEG-I considers the needs of delivering to the end user via lower bandwidth links and that of course means higher compression ratios, the result of which is also more latency. The initial phase of MPEG-I, known as Phase 1a, targets 360 video with 3 degrees of freedom. Subsequent phases are likely to target full VR.
What future media applications will these enable?
JPEG XS is likely to be suited to 4K and 8K, in particular for production and editing (both live and file based). Uncompressed 4K video results in large bandwidths and file sizes, and 8K multiplies these by four, even without a change in frame rate. Light compression, such as JPEG XS, is a realistic technique for keeping bandwidths, file sizes and file transfer times under control for high-quality assets, where the quality needs to be virtually indistinguishable from the uncompressed original. JPEG XS is also useful for keeping latency well below one video frame.
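As a rough illustration of the two numerical points above, the sketch below compares pixel counts and works out what "one video frame" means in milliseconds. It assumes 4K means 3840x2160 and 8K means 7680x4320, and it uses 50 and 60 frames per second as example rates; the interview does not state these values.

```python
def pixel_count(width, height):
    """Number of pixels in one frame."""
    return width * height


def frame_duration_ms(fps):
    """Duration of one video frame in milliseconds."""
    return 1000.0 / fps


# Assumed resolutions for "4K" and "8K" (UHD-1 and UHD-2).
uhd_4k = pixel_count(3840, 2160)
uhd_8k = pixel_count(7680, 4320)

# With the same frame rate, bit depth and sampling, the uncompressed
# data rate scales with the pixel count.
scale_8k_vs_4k = uhd_8k / uhd_4k

print(f"8K carries {scale_8k_vs_4k:.0f}x the pixels of 4K")
for fps in (50, 60):
    print(f"One frame at {fps} fps lasts {frame_duration_ms(fps):.2f} ms")
```

A latency "well below one video frame" therefore means well under roughly 17 to 20 ms at these assumed frame rates.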
MPEG-I, on the other hand, enables delivery of immersive media, for example 360 video, into the home. With a defined standard for delivery, the industry can adopt the technology with more confidence, and more companies can invest in it. This can offer a new experience to consumers who have suitable viewing devices. 360 video is very achievable; VR is somewhat more complex. In both cases, there is an extremely stringent motion-to-photon requirement (i.e. the display must respond to any change in head position with extremely low latency).
However, for 360 video, the rendering is performed locally from either the entire 360 image or a suitably sized portion of it, whereas for true VR, the scene itself must be created based on those head movements. If the scene creation can be performed locally, such as in a games console, then the requirements are not too challenging. If, on the other hand, the rendering is performed remotely and needs to be delivered without an excessive bit rate demand, then there are significant challenges to achieve that at the same time as meeting the motion-to-photon requirements.
What is the difference between JPEG XS and AV1 and HEVC?
JPEG XS is an intra-coding technique, i.e. no temporal prediction is performed. This results in much lower bit-rate efficiency than compression standards such as AVC and HEVC, but in turn offers extremely low latency.
HEVC and AV1 are effectively successors to AVC and VP9 respectively. These standards are built around achieving excellent bit-rate efficiency, but they necessarily incur significant latency to achieve it, because temporal redundancy is a large component of the information that can be removed. HEVC, like AVC, has profiles intended for professional use, which means it can also be suitable for the content contribution and production chains. There is therefore an element of overlap where either or both of JPEG XS and HEVC might be used. However, they are not direct equivalents.
Is future compatibility between JPEG XS and other codecs like AV1 or HEVC an issue?
Not really. JPEG XS, AV1 and HEVC all require their own specific encode and decode implementations, so conversion between them requires a full decode followed by a full encode. Since JPEG XS is specifically designed to operate at a near-lossless level, it has negligible impact on a subsequent compression stage. To date, there appears to be no evaluation of the effect of concatenating HEVC with AV1 or vice versa. As always with highly compressed video, if two stages are operated at similar levels of compression, then it is probable that there will be additional degradation compared to a single stage; the more the characteristics of the coding standards differ, the more likely the impact.
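The concatenation effect described above can be illustrated with a toy model. The sketch below uses uniform scalar quantization as a stand-in for a lossy compression stage; this is an assumption made purely for illustration and is not how JPEG XS, HEVC or AV1 work internally. The step sizes (2, 12 and 16) and the ramp of sample values are arbitrary.

```python
def quantize(samples, step):
    """Round each non-negative integer sample to the nearest multiple of step."""
    return [((2 * s + step) // (2 * step)) * step for s in samples]


def mse(reference, degraded):
    """Mean squared error between two equal-length sample lists."""
    return sum((r - d) ** 2 for r, d in zip(reference, degraded)) / len(reference)


# Arbitrary test signal: a ramp of integer sample values.
source = list(range(240))

# A single heavily compressed stage (coarse step).
single_coarse = quantize(source, 16)

# Near-lossless stage first (fine step), then the same coarse stage.
fine_then_coarse = quantize(quantize(source, 2), 16)

# Two stages at similar compression levels (steps 12 and 16).
similar_cascade = quantize(quantize(source, 12), 16)

print("single coarse stage:   ", mse(source, single_coarse))
print("fine then coarse:      ", mse(source, fine_then_coarse))
print("two similar stages:    ", mse(source, similar_cascade))
```

In this toy model, placing a near-lossless stage ahead of the coarse stage adds only a small amount of error, whereas cascading two stages with similar step sizes increases the error noticeably. Real codecs behave in far more complex ways, so this shows the general tendency only.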
What is the essential technology change which drives improvements in compression?
In general, it is the processing power of devices used for decoding that sets the remit of a compression standard, since every decoder must implement all relevant tools. MPEG-2, AVC and HEVC all took advantage of increasing capability in silicon to enable decoders to handle increasingly complex bitstreams with increasing resolution. At each stage, some potential compression tools were considered too complex for real-world implementations and so were omitted in the interests of creating a workable standard.
Successive encoding standards adopted more complex tools as these became more realistic to implement in consumer devices. Beyond that, encoders are most definitely not all equal. The best-performing encoder vendors invest significantly in research to develop better algorithms (i.e. lower bit rates for the same perceptual quality) within the target technology platform. Typically, international standards define the toolset and the decoder but not the encoder, which allows this kind of investment and in turn has resulted in significant efficiency improvements within a standard year on year. Ultimately, the quality perceived by a human viewer, rather than a simple metric or even a combination of metrics, is the key measurement.
In addition, TV service providers and content owners need to satisfy the expectations of consumers in a highly competitive market. This, for example, may include HDR, UHD, 360 video and VR, as well as more formats to different types of devices, all of which adds to the capacity or cost pressures for delivery to the consumer as well as offering new monetization opportunities. Compression efficiency is one of the primary tools for providing new or better services, minimizing the distribution costs, or a combination of the two.