The Metaverse may seem a long way off but the technology underpinning its rapid deployment is here today and has the potential to empower broadcasters to improve the immersive viewing experience.
Telestration And AI
Television is about telling stories and sports events are full of them. The slo-mo replay has provided viewers with analysis since engineers realized they could provide still frames using VT machines. Each event within a game can be replayed frame by frame to allow the commentators to analyze the players and provide a deeper insight into the game. And this immersive insight is further enhanced when Telestration is added to the mix.
Telestrators first appeared in the 1950s when physicist Leonard Reiffel used one to draw on a series of science shows for WTTW. Using analog storage CRTs and X-Y grid arrays, Reiffel was able to draw as he spoke to explain and educate. It didn’t take long for sports commentators to see the benefits of this technology and start using it for major sports events. As technology improved, commentators were able to draw directly onto touch-sensitive screens and even roll the video sequences backwards and forwards.
ML pose estimation techniques are now being used to predict the trajectory of the ball and the direction in which the players are moving. This gives commentators an added dimension for storytelling and gives viewers a deeper insight into how the game is progressing.
Telestrators are a rich source of information, as the data they provide can be fed back into the real-time AI engines to enrich the data available, thus providing even deeper insights into the game. Highly enthusiastic sports viewers are constantly looking for new information about their favorite teams and players, and AI can analyze data and find patterns much faster than humans. For example, “player A” may have run along the left wing five times during the build-up to the three goals scored. Although these runs may not have been seen on the main program output, the AI engine would recognize them as a pattern and report it to the commentators and viewers.
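The “player A” example can be sketched in a few lines of Python. This is only an illustration of the counting idea, not how any particular AI engine works: the event tuples, the zone names, and the `recurring_runs` helper are all hypothetical, and a real system would derive such events from tracking and telestration data.

```python
from collections import Counter

# Hypothetical tracking events: (player, pitch zone, goal the run led up to).
# The goal is None when the run was not part of a goal build-up.
events = [
    ("player A", "left wing", 1),
    ("player A", "left wing", 1),
    ("player B", "centre", 1),
    ("player A", "left wing", 2),
    ("player A", "left wing", 3),
    ("player A", "left wing", 3),
    ("player C", "right wing", 3),
    ("player B", "centre", None),
]


def recurring_runs(events, min_count=3):
    """Return (player, zone, count) for runs that recur in goal build-ups."""
    counts = Counter(
        (player, zone) for player, zone, goal in events if goal is not None
    )
    return [
        (player, zone, n)
        for (player, zone), n in counts.most_common()
        if n >= min_count
    ]


patterns = recurring_runs(events)
for player, zone, n in patterns:
    print(f"{player} ran along the {zone} {n} times during goal build-ups")
```

The threshold `min_count` is an arbitrary choice here; in practice it would be tuned so that only patterns worth a commentator’s attention are reported.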
The players themselves are also generating data, and tracking technology is used to record their movements. This is much more powerful than a video recording, as the players’ movements are decomposed into motion vectors that are easier to record and process. This opens the door to more ML processing, as the data is fed into the ML engine and used to generate statistics for the commentators and viewers at home. All these concepts are already possible and are being improved upon by the Metaverse technology.
Adding to this layer of information is speech recognition AI, where a viewer may ask a question such as “who is player 17 and what are their stats?”. The speech recognition AI will respond with the necessary information either on screen or as a synthesized voice response. It’s even possible to use avatars to present the requested stats.
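As a rough illustration of why motion vectors are compact and easy to process, the sketch below turns a short list of sampled player positions into per-interval velocity vectors and derives one statistic from them. The sample track, the one-second sampling interval, and the function names are assumptions made for the example, not part of any tracking product.

```python
from math import hypot


def motion_vectors(positions, dt):
    """Convert (x, y) positions in metres, sampled every dt seconds,
    into (vx, vy) velocity vectors in metres per second."""
    return [
        ((x1 - x0) / dt, (y1 - y0) / dt)
        for (x0, y0), (x1, y1) in zip(positions, positions[1:])
    ]


def top_speed(vectors):
    """Largest vector magnitude, i.e. the player's peak speed in m/s."""
    return max(hypot(vx, vy) for vx, vy in vectors)


# Hypothetical track: the player runs diagonally for two seconds, then stops.
track = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (6.0, 8.0)]
vectors = motion_vectors(track, dt=1.0)
print(vectors)
print(top_speed(vectors))
```

A handful of numbers per player per sample is all that needs to be stored, and statistics such as peak speed or distance covered fall out of simple arithmetic on the vectors.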
Broadcast And Avatars
Although mind-controlled avatars of the Metaverse are still in the realm of science fiction, AI driven facial and pose avatars are very much a reality. The first question must be, why would a broadcaster want to use avatars? Apart from providing new and interesting light entertainment shows where performers can adopt alternate egos with facial recognition cameras and motion detecting body suits, there are some interesting applications in news and current affairs.
24/7 news is here to stay but one of the challenges that this presents is that a team of presenters often must be in the studio all day and all night. While this is possible, it’s also very inefficient. Instead, think of a Metaverse type digital representation of a presenter that looks, behaves, and expresses themselves like the original, but they’re not the real presenter, just their avatar.
The Metaverse technology to achieve this exists now as facial recognition and motion positioning can be learned by AI engines. A script can be written that is input into the AI engine so it can then create a real-life avatar simulating the presenter, and the current Metaverse technology is so good that it’s becoming increasingly difficult to tell the difference.
For the avatar to be truly convincing it must move smoothly and the facial imaging must be in sympathy with the context and substance of the script. AI can already determine the attention of written words and it’s only a short jump to add this to the avatar generator to provide emotional content to the face and body motion. Admittedly, in news usually only the upper body is seen, but most of the facial gestures are replicated so that the presenter looks real. The intonation of their sentence constructions can be simulated to provide a further level of immersion.
This opens the possibility of using avatars for signing for the hearing impaired. Many governments throughout the world now mandate signing for a percentage of the broadcaster’s output, and this percentage is only going to increase. The synthesized facial and hand gestures of the avatar can be created to show very convincing signing avatars in real time.
Universal Scene Description
USD (Universal Scene Description) is a 3D framework for describing, composing, simulating, and collaborating within 3D worlds. It unifies workflows and file formats to provide a common scene description language that is at the heart of the Metaverse.
There are three main components of USD: the user API, the rendering engine, and the scene specification. The designer creates a scene in the USD format through the API, and the scene is then rendered, often using ray tracing technology, to provide the final sequenced images.
The scene can be compared to a theatre which includes the stage, props and lights where viewers observe the performance through the viewport of the stage. The scene is a database consisting of defined objects that can be layered to provide a hierarchical tree structure.
The USD creates a hierarchical structure of files that specify the scene so designers who are collaborating effectively share these files. For example, if one designer is creating the higher-level image of a street, another may be creating the representation of a car. When the car is finished, the USD description file is added to the hierarchical structure thus including it in the final render.
It’s important to note that the USD files are not image files but text-style representations of the objects and layers that make up the scene. This allows the ray tracing rendering engine to adopt different viewports based on the designers’ parameters.
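To make the street and car example more concrete, the following sketch assembles two small scene files in USD’s human-readable `.usda` text form, with the street layer referencing the car layer. It is a minimal illustration only: the file name `car.usda`, the prim names, and the helper functions are hypothetical, and a production pipeline would normally author these files through the USD API rather than by building strings.

```python
def car_layer():
    """Text of a hypothetical car.usda: one transform holding a simple cube."""
    return "\n".join([
        "#usda 1.0",
        "(",
        '    defaultPrim = "Car"',
        ")",
        "",
        'def Xform "Car"',
        "{",
        '    def Cube "Body"',
        "    {",
        "        double size = 2",
        "    }",
        "}",
        "",
    ])


def street_layer(car_path="./car.usda"):
    """Text of a hypothetical street.usda that pulls in the car by reference."""
    return "\n".join([
        "#usda 1.0",
        "(",
        '    defaultPrim = "Street"',
        ")",
        "",
        'def Xform "Street"',
        "{",
        '    def "Car" (',
        f"        references = @{car_path}@",
        "    )",
        "    {",
        "    }",
        "}",
        "",
    ])


car = car_layer()
street = street_layer()
print(street)
```

The street file never contains the car’s geometry, only a pointer to the other designer’s file, which is what lets the two designers work independently and lets the renderer assemble the hierarchy at load time.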
The designer can also call upon vast libraries containing billions of USD represented images, and add their own attributes to provide an impressive array of potential image renders. This makes distribution of the images incredibly efficient as it’s mainly the USD file that is being distributed and the localized rendering engine turns these relatively small files into the final images.
Figure 2 – USD provides a method of collaboration through open standards to create scene data for virtualized images. USD is used extensively in Metaverse design and is highly portable to broadcast television applications.
An interesting application for broadcasters using this technology is in LED wall virtual productions. Not only can the scene be created by designers collaborating from all over the world, but the scene has depth associated with it which can be rendered in real-time. This will allow the cameras to track into the scene to give a convincing depth of field.
The Metaverse requires a huge amount of storage, both for creating the 3D virtualized environment and for all the data generated by the user, and this is delivering a major benefit for broadcasters. Data handling falls into two stages: first the creation of the virtualized environment, and then the acquisition of user data such as head and eye movements and hand gestures.
Creation of the virtualized environment requires extensive storage. As an example, one of the very earliest virtualized environments was Microsoft Flight Simulator, which fitted onto a single floppy disk; the images of the instrumentation and the views through the aircraft window were very primitive and blocky. Fast forward almost forty years and the same product provides a highly immersive experience, with Microsoft modelling 1.5 billion houses and over 2 trillion trees. This highly complex virtual world consumes around 2.5 petabytes of storage. Flight Simulator is a highly constrained environment, which only demonstrates the massive amount of storage that is being designed for the Metaverse world.
This expansion is not only seeing a massive amount of R&D investment in storage technology but is also seeing a significant increase in the processing power needed to make the virtualized world. We must remember that the user will also be generating massive amounts of data that has to be processed in near real-time as well as stored for fast access.
The GPUs needed to process data for the Metaverse creators are orders of magnitude more powerful than standard graphics computers. The servers hosting the GPUs are intrinsically aligned to them to not only provide visual rendering but also provide the AI processing capabilities that ML demands, and this is a core requirement of any Metaverse technology.
GPUs are not only used for their rendering capabilities but form an intrinsic part of any ML system. ML relies heavily on a branch of mathematics called linear algebra, and GPUs with their ray tracing, shading, and parallel hardware accelerators use these functions extensively. GPU vendors have even started to include tensor manipulation hardware accelerators to further speed up ML training and inference. All this leads to hugely complex workstations and servers that need to be tuned to deliver the real-time processing that the Metaverse technology demands.
Combined with the storage needs, the Metaverse technology is taking high-end resources to another level, which is something broadcasters are benefiting from both now and in the future.
Broadcasters work in 24/7 environments where the loss of program output cannot be tolerated. With so much high-value content and paid advertising being broadcast every day, the infrastructure must be resilient and reliable.
System reliability is another area where broadcasters can take advantage of the continuing development of Metaverse technology. To achieve the immersive effect, systems must be reliable in terms of both uptime and processing speed. Although some latency is inevitable, research has demonstrated that humans can easily adapt to small amounts of latency if it is predictable and deterministic, and Metaverse development is focused on delivering this as part of the immersive experience.
The Metaverse, as an absolute concept, may still be a work in progress, but we are starting to see layered 3D immersive experiences provided on top of a 2D internet. By design, the technology is leading the delivery of virtualized worlds and avatars, as the infrastructure must be there to allow the creatives to generate the content. But broadcasters can benefit from the technology today and start trialing some of the enhanced immersive experiences that the Metaverse is delivering now and will continue to deliver and improve upon in the future.