Machine Learning (ML) For Broadcasters: Part 8 - AI And ML Drive TV UX Innovation

AI, primarily Machine Learning (ML), is driving the evolution of the UI (User Interface) for TVs and other consumer devices, alongside other aspects of the overall UX (User Experience).

There is overlap between the UI and the wider user experience as far as ML is concerned, for example in emerging capabilities to match audio output to the user’s acoustic environment through adaptive feedback.

Another common thread is that many applications of AI or ML in the UI feed off advances already made in other sectors, especially personal computers, smart phones and enterprise IT. Past experience in these sectors can help avoid mistakes such as overcomplicating the UI or introducing too much sophistication too soon. UI advances should be led by the longstanding principle that users should be given the shortest possible route to the content they want to watch, which is admittedly easier to state than to achieve.

There are, though, some unique UI aspects relating to traditional lean back TV viewing, which is still widespread despite the proliferation of access to video and TV services from portable wireless devices. At the same time, the overall viewing experience should be as consistent as possible, which is especially relevant for broadcasters and for providers of pay TV services, whether these are legacy, subscription VoD or AVoD (Advertising VoD). In all these cases viewing may occur on devices of widely varying format, lean forward or lean back. Users expect the idiosyncrasies of different devices to be catered for, yet they also want some common elements in navigation, search and recommendation. They expect their preferences to be taken into account, yet those preferences are themselves shaped partly by variable factors such as the device type, time of day, and even their location. A growing number of video service providers use ML to help personalize their UIs while catering for these variables.
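As a rough illustration of how device type and time of day might feed into ranking, the sketch below re-orders a tiny catalogue for a given viewing context. Everything in it is an assumption made for illustration: the `Context` and `Title` types, the catalogue entries and the hand-picked weights are hypothetical, and the fixed rules stand in for what a trained model would learn from viewing data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    device: str  # hypothetical labels: "tv" or "phone"
    hour: int    # local hour of day, 0-23


@dataclass(frozen=True)
class Title:
    name: str
    base_affinity: float  # assumed long-term preference score in [0, 1]
    minutes: int          # running time


def context_score(title: Title, ctx: Context) -> float:
    """Adjust the long-term score with hand-picked context rules.

    The weights are illustrative only; a real service would learn them.
    """
    score = title.base_affinity
    if ctx.device == "phone" and title.minutes <= 15:
        score += 0.3  # short clips suit lean forward viewing
    if ctx.device == "tv" and title.minutes >= 60:
        score += 0.2  # long form suits lean back viewing
    if ctx.hour >= 23 and title.minutes >= 90:
        score -= 0.2  # late at night, demote very long titles
    return score


def rank(titles, ctx):
    """Return titles ordered from highest to lowest context score."""
    return sorted(titles, key=lambda t: context_score(t, ctx), reverse=True)


CATALOGUE = [
    Title("Feature film", 0.7, 110),
    Title("News briefing", 0.5, 10),
    Title("Drama episode", 0.6, 45),
]

if __name__ == "__main__":
    for ctx in (Context("phone", 8), Context("tv", 20)):
        print(ctx, [t.name for t in rank(CATALOGUE, ctx)])
```

The same user profile yields a different first row on the phone in the morning than on the main TV in the evening, while the navigation and ranking logic stay common across devices.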

A key aspect of the UI where ML is intimately involved lies in the evolution of the traditional remote control and the expansion of the TV domain across the home for control of other devices. There is a paradox here: viewing is fragmenting across myriad devices and platforms beyond the living room, and yet the primary TV is finding new applications as the hub of smart homes and for non-TV applications that benefit from the big screen, such as video conferencing.

The move towards smart, internet-connected TVs has fuelled this trend by making it easier to implement applications such as casting from mobile devices, while enabling some computationally intensive, ML-based UI tasks to be performed remotely in tandem with the TV’s own processing capabilities.

The other big UI trend is the use of voice, which also straddles the different device platforms but is featuring increasingly as a bridge between the legacy remote and more advanced UI capabilities, with more individual personalization enhanced by ML. The remote had become a drag on UI innovation to some extent by preserving the traditional clunky manipulation of on-screen menus through the D-pad (Directional Pad), the four-way controller with one button on each point that has been the mainstay of such devices.

LG’s Magic Remote features AI technology for speech processing but still has the legacy D-pad. (Source: LG)

The D-pad has been preserved largely out of the conservatism inherent in traditional TV with a reluctance to alienate established users, but with the effect of preventing lean back TV from being as fast and responsive as lean forward streaming services on single devices. The route to content is often much longer on the main TV than it is on the laptop or smart phone, even though the number of actual clicks or manoeuvres may not be so different.

Recently, though, we have seen smart TV makers introduce voice in parallel with the traditional button control, bringing the TV UI more in line with streaming devices. AI and ML play various roles in voice UIs, from the underlying natural language processing to personalization and authentication of individual speakers. Voice allied to a traditional remote makes it possible to identify users from their voices, and so to offer the one-to-one personalization long exploited for online video.

Until recently, personalization has been confined to the household level, with only limited individualization through observation of the content being viewed, or sometimes as a result of individual logins. Voice adds a further dimension, making it easier for the system to identify individual users.
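One common way to approach speaker identification is to compare a numeric "voice embedding" of the incoming utterance against embeddings enrolled for each household member. The sketch below illustrates only that matching step, under stated assumptions: the three-dimensional vectors, the `HOUSEHOLD` names and the 0.8 threshold are all hypothetical, and a real system would obtain much longer embeddings from a trained speaker model that is not shown here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def identify_speaker(embedding, enrolled, threshold=0.8):
    """Return the best-matching enrolled name, or None if nobody
    reaches the (assumed) similarity threshold."""
    best_name, best_score = None, threshold
    for name, reference in enrolled.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical enrolled profiles; real embeddings have hundreds of dimensions.
HOUSEHOLD = {
    "alex": [0.9, 0.1, 0.2],
    "sam": [0.1, 0.8, 0.5],
}

if __name__ == "__main__":
    print(identify_speaker([0.85, 0.15, 0.25], HOUSEHOLD))  # close to alex's profile
    print(identify_speaker([0.0, 0.0, 1.0], HOUSEHOLD))     # matches nobody well
```

When no enrolled voice matches well enough, the function returns `None`, in which case a service could fall back to the household-level profile described above rather than guess at an individual.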

Voice UIs are complex and time-consuming to develop from scratch, so even the largest video service providers are adopting technology already developed by major players in the field. These include major IT systems and services firms like IBM, as well as the Big Five tech companies, that is Microsoft, Apple, Google, Meta (formerly Facebook), and Amazon.

A number of set top and broadband gateway software vendors have collaborated with one or other of these major players on the voice UI, in some cases positioning the TV as a home control hub. Voice then becomes the medium both for the TV UI and for controlling devices around the home, potentially including fridges, toasters, smart speakers and Wi-Fi routers.

AI and ML are deeply involved in this expansion of the role played by voice assistants, helping orchestrate a number of the functions, ranging from parental control over access by children, to automating various applications of the smart home. This can extend beyond voice to facial recognition in security monitoring for example, with scope for contacting users remotely. In this way the TV UI becomes increasingly entwined with other services and applications around the home, bringing revenue generating opportunities for video service providers, especially if they are also in control of the broadband connection.

It is important to recognize that not all consumers are enamoured of voice, or for that matter touch screen control such as Apple TV provides, and that again points back to retention of the traditional remote with its D-pad, at least for now.

When voice is included, ML can help cater for varying levels of engagement by the user, allowing some to progress to “Conversational AI” for more complex interactions, while allowing others to progress more slowly with basic single word commands. Designers of UIs should always bear in mind Hick’s Law, which states that the time users take to reach a decision increases with the number of options they are given, roughly in proportion to the logarithm of that number. Related to this is the principle of progressive disclosure, whereby users are presented with just one question or a small set of choices at a time rather than being confused by several at once. ML can help here by making intelligent deductions and speeding up the process, reducing that “time to content”.
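Hick’s Law is usually written in the Hick-Hyman form T = a + b · log2(n + 1), where n is the number of equally likely options and a and b are constants fitted from user testing. The sketch below puts numbers on the idea. The function name and the value b = 0.2 seconds are assumptions chosen purely for illustration, not measured figures.

```python
import math


def hick_decision_time(n_options: int, b: float = 0.2, a: float = 0.0) -> float:
    """Hick-Hyman estimate of decision time in seconds.

    T = a + b * log2(n + 1); a and b are illustrative, not measured.
    """
    if n_options < 0:
        raise ValueError("n_options must be non-negative")
    return a + b * math.log2(n_options + 1)


if __name__ == "__main__":
    full_grid = hick_decision_time(63)  # a home screen showing 63 tiles
    shortlist = hick_decision_time(7)   # a recommended row of 7 tiles
    print(f"63 options: {full_grid:.2f} s, 7 options: {shortlist:.2f} s")
```

Under these assumed constants, cutting a home screen from 63 tiles to a row of 7 halves the estimated decision time, which is the kind of saving a well-targeted ML shortlist aims for. The estimate only holds if the wanted title actually appears in the shortlist.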

While traditional remotes have defied predictions of their imminent death for years, rather as set top boxes have, they have been under threat from smart phones positioned as universal TV controls, empowered by downloadable apps enhanced by ML in various ways. The idea of a universal TV controller was first posited almost as soon as remotes entered the consumer TV realm just over 40 years ago, but for years such controllers failed to gain much traction because they only offered a subset of the full range of UI functions.

That constraint has been removed with the help of ML, which can enable the traditional remote format to be replicated on a smart phone screen while allowing advanced capabilities based on voice or gesture to be incorporated. The use of ML to enable control of basic functions by sweeping gestures picked up by the smartphone camera is under development by a number of app vendors and may soon become part of the TV UI armoury.

So, although AI and ML have been entering the TV UI realm for at least a decade now, it is only recently that they have started enabling more advanced capabilities alongside traditional TV remotes. Such devices will increasingly be augmented by AI and ML-related capabilities as they enter their final lap.

