Generative AI & Content Recommendation

Many major broadcasters are at least investigating how Generative AI can enhance content recommendation, partly by exploiting the ability to combine attributes across video, audio, text and graphics. These are early days though, with research still being conducted by universities and other academic institutions, as well as broadcasters themselves.
Generative AI is being embraced by broadcasters across their workflow in various domains, often to enhance or improve established processes. That applies very much to media content recommendation, which has fundamentally progressed little since the first deployments of collaborative filtering in the early days of eCommerce.
Collaborative filtering was popularized by Amazon before broadcasters and pay TV operators deployed it, but despite increasing sophistication it has run into scalability problems as users and especially content have proliferated. Data sets have become too sparse for accurate predictions in many situations, while concentration of viewing around small subsets of the content portfolio has tended to skew recommendations towards the most popular content, rather than programming that might engage individual users more.
Neural network-based machine learning has been employed alongside collaborative filtering, but the marriage has not been that successful. The potential nuance and granularity of machine learning is hard to obtain when yoked to collaborative filtering, and results have been mostly underwhelming.
Collaborative filtering tends to pivot around users and their viewing preferences. It takes the rather simplistic view that if two people like one type of content they will also share preferences for other types, even in unrelated genres. That is sometimes called user-based filtering.
The filtering can also focus on the content itself, on the basis that viewers tend to coalesce around related content types. People who enjoy watching tennis might also be interested in squash or handball, for example. Also called content filtering, this has often worked best when there is little data on the user’s preferences, personality or behavior, but plenty on the item itself. When there is data on both, user and content-based filtering can be combined, but even then, it had reached the point of diminishing returns on further development until AI and then Gen AI came along.
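The item-based approach described above can be sketched in a few lines: items are represented as vectors of user interactions, and items whose vectors point in similar directions are recommended together. This is a minimal illustration with an invented toy data set, not any broadcaster's actual implementation; the titles, users and scores are all placeholders.

```python
# Minimal sketch of item-based collaborative filtering over a toy
# user-item matrix of viewing scores. All names and numbers are invented.
import math

# Rows = users, columns = items (viewing scores, 0 = never watched)
ratings = {
    "alice": {"tennis": 5, "squash": 4, "cooking": 0},
    "bob":   {"tennis": 4, "squash": 5, "cooking": 1},
    "carol": {"tennis": 0, "squash": 0, "cooking": 5},
}
items = ["tennis", "squash", "cooking"]

def item_vector(item):
    # One column of the user-item matrix: each user's score for this item.
    return [ratings[u].get(item, 0) for u in ratings]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def similar_items(item):
    # Rank the other items by similarity of their user-interaction vectors.
    scores = {other: cosine(item_vector(item), item_vector(other))
              for other in items if other != item}
    return sorted(scores, key=scores.get, reverse=True)

# Viewers of tennis get squash recommended ahead of the cooking show.
print(similar_items("tennis"))  # → ['squash', 'cooking']
```

The sparsity problem mentioned earlier is visible even here: as the catalogue grows, most entries in each item vector are zero, and the cosine similarities become unreliable.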
There is no radical technical difference between Gen AI and what is sometimes referred to as traditional AI. Both apply machine learning to neural networks of varying depths to converge on patterns within large data sets for recognition, diagnosis or prediction.
The main difference is that Gen AI incorporates methods that open out the process to create new data points, which may lie in related but distinct domains. This brings multimodal, or cross-modal, capability, where input in one mode can be converted into output in another so long as there is some logical correspondence between them.
This greatly extends the potential of recommendation, which has traditionally been confined to one data type at a time, such as a user's click history or text metadata about content, imposing a limit on the level of granularity or nuance that can be reached.
Multimodal AI enhances content recommendation systems by enabling multiple data types to be fused together so that their combined impact or weighting can be assessed, with scope for tuning to improve results. This includes all the modalities of the content, that is video, image, audio, and graphics, but is not limited to that. It can also incorporate behavior and known preferences.
We are now seeing, at least in trials, application of multimodal AI to streaming platforms, where, say, thumbnails of video are analyzed alongside viewing patterns and textual dialogue transcripts. The theory is that, by integrating all the relevant measures, recommendations are more likely to be of interest to the viewer, which would be impossible for a legacy unimodal system.
This integration is accomplished technically by converting the various modality measures into numbers, which are then embedded into a unified model. Each modality can be allocated weightings, which can be adjusted to improve results just as in other applications of machine learning.
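A common way to realize this numerically is late fusion: each modality is reduced to an embedding vector, each vector is scaled by a tunable weight, and the weighted vectors are combined into one unified representation that can be compared across items. The sketch below uses invented two-dimensional "embeddings" and arbitrary weights purely for illustration; real systems use learned embeddings with hundreds of dimensions.

```python
# Hedged sketch of weighted late fusion of multimodal embeddings.
# Vectors and weights are made-up placeholders, not real model output.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def fuse(embeddings, weights):
    """Concatenate per-modality vectors, each scaled by its modality weight."""
    fused = []
    for modality, vec in embeddings.items():
        fused.extend(weights[modality] * x for x in vec)
    return fused

# Tiny invented embeddings for two items across three modalities.
item_a = {"video": [0.9, 0.1], "text": [0.2, 0.8], "audio": [0.5, 0.5]}
item_b = {"video": [0.8, 0.2], "text": [0.3, 0.7], "audio": [0.4, 0.6]}

# Modality weights are hyperparameters, adjusted to improve results
# just as in other applications of machine learning.
weights = {"video": 0.5, "text": 0.3, "audio": 0.2}

similarity = cosine(fuse(item_a, weights), fuse(item_b, weights))
```

Raising the weight of one modality makes the fused similarity track that modality more closely, which is the tuning lever the text describes.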
Although relatively new to broadcasters and streaming video providers, the tools and methods of multimodal recommendation were first deployed a bit earlier in some other fields. Collaborative filtering percolated into broadcasting from eCommerce and the same is happening with Gen AI based multimodal recommendation.
The deployments in other domains have also provided earlier evidence of success. Various e-commerce sites have already shown that by combining images and descriptions of products with customer interaction data, such as time spent hovering over items, preference prediction is more accurate than when it is based on any one of those data sources on its own.
Some of the underlying AI techniques have also been proven in other fields, such as medical diagnostic imaging. This is the case for one key method, Residual Neural Networks (ResNets), which has enabled deeper, more granular analysis of images and, by extension, video sequences.
Still only about 10 years old, ResNet was developed specifically to avoid problems associated with training deep neural network models comprising large numbers of layers to analyze images and identify features or objects within them. The essence is a skip, or shortcut, connection: a layer's input is added unchanged to the output of a layer several levels further on, rather than passing only through the intervening layers. Those intervening layers then only have to learn a residual adjustment to their input, and groups of them form residual blocks, which become the building units of the training architecture.
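The skip connection can be shown in a few lines. This is a toy, pure-Python illustration of the idea output = F(x) + x, with tiny fixed weights chosen only to make the behavior visible; it is not a trainable network.

```python
# Toy residual block: two small layers compute a residual F(x),
# then the skip connection adds the untouched input x back in.
# Weights are arbitrary illustrative values.

def relu(v):
    return [max(0.0, x) for x in v]

def dense(w, v):
    # Plain matrix-vector product: one fully connected layer.
    return [sum(wij * xj for wij, xj in zip(row, v)) for row in w]

def residual_block(x, w1, w2):
    f = relu(dense(w2, relu(dense(w1, x))))        # the residual F(x)
    return relu([fi + xi for fi, xi in zip(f, x)])  # skip: F(x) + x

x = [1.0, -2.0]
w1 = [[0.01, 0.0], [0.0, 0.01]]  # near-zero weights
w2 = [[0.01, 0.0], [0.0, 0.01]]

y = residual_block(x, w1, w2)
# With near-zero weights the block approximates the identity,
# which is why very deep stacks of such blocks remain trainable:
# the signal (and its gradient) can always flow through the skip.
```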
While a detailed technical description is beyond our scope here, there are plenty of online sources for that. The main point is that it allows fine grained nuanced analysis of images and identification of small features which may be nested within larger objects. The latter can be vital in medical diagnostics. It is one reason why AI based medical imaging can now often exceed the accuracy of human experts.
There is potential for exploiting these capabilities in video content recommendation, although at this stage it is too early to assess much beyond the potential in many cases. Yet many leading broadcasters are now engaged in research and trials of advanced AI based recommendation.
Perhaps not surprisingly given its R&D pedigree, the BBC is among the evangelists. Only this month, April 2025, the BBC’s R&D Director Jatin Aythora was blogging about key principles guiding technical direction over the next few years, one being that all user interaction will be driven by AI. “As AI technologies continue to develop, they will drive personalized and contextualized user interactions across news and media platforms,” wrote Aythora. “From content recommendations to real-time language translation and automated story generation, AI will become central to shaping the user experience.”
One deficit for free-to-air public service broadcasters such as the BBC can be a lack of detailed knowledge of user history, given that, until recently at least, they have lacked direct interactive links with users. So, one aspect of Gen AI based recommendation of interest for their portals is the ability to handle what are sometimes referred to as zero-shot cases.
These are cases where there is plenty of knowledge about the content but little about the users. Handling them involves fluid transformations of information between domains that may appear unrelated, so that content can be matched with users on minimal information, with the process tweaked rapidly as more data comes in.
Essentially the machine learning model is initially prompted to generate a response without being given any example of the desired output for the application, in this case a recommendation for a given user. Nonetheless, recommendations can still be expected to gain accuracy as more information about users is accumulated. Indeed, there is great scope with Gen AI to derive valuable insights from browsing histories, which is a focus for Italy’s national broadcaster Rai.
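The cold-start side of this can be illustrated with a deliberately simple sketch: with no viewing history, content is ranked purely from its own metadata against the one signal we do have. The titles, descriptions and the word-overlap "embedding" below are all invented for illustration; a production system would use learned cross-modal embeddings rather than bag-of-words matching.

```python
# Illustrative zero-shot / cold-start matching: rank catalogue items
# against a single known signal using content metadata alone.
# Catalogue entries are fictional placeholders.
from collections import Counter
import math

catalogue = {
    "Clay Court Classics": "tennis documentary archive sport",
    "Kitchen Stories":     "cooking series food family",
    "Net Gains":           "tennis drama series rivalry",
}

def embed(text):
    # Crude stand-in for a learned embedding: word counts.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def zero_shot_recommend(signal):
    # No per-user training examples: match on content metadata only.
    q = embed(signal)
    return sorted(catalogue,
                  key=lambda t: cosine(q, embed(catalogue[t])),
                  reverse=True)

print(zero_shot_recommend("tennis series"))
```

As real interaction data accumulates, such a system can blend this content-only ranking with the collaborative signals described earlier, which is where the accuracy gains come from.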
Working with a commercial vendor of personalized customer experience software, Rai has developed a content recommendation system that exploits the browsing history of subscribers to its RaiPlay streaming platform. The aim is to offer even more highly personalized content with increased engagement. This entailed integrating the resulting Gen AI-based recommendation engine into Rai’s ecosystem, which the broadcaster said has opened up its archive of some 6,200 titles of historical content, including movies, Italian and international series, documentaries, sports events, programs, and children’s content.
This has for the first time allowed Rai to offer content tailored to each individual user, recommended in real time. It also generates new metadata automatically, which is where the Gen AI module comes in, allowing users to browse and discover more relevant content on their own. Then there is the ability to combine recommendation with content curated as part of the editorial process.
Most importantly, more advanced analytics has been engaged to measure the impact of this personalization on viewing time and user engagement, in order to prove that the investment was worth it. This has already yielded proven improvements in engagement, according to Guido Porro, Rai’s Executive Vice President, Engineering.
While perhaps not as advanced as Rai, France Televisions is another public service broadcaster seeking to exploit Gen AI across its workflow, including content recommendation. Also this month, Christophe de Vallambras, head of the broadcaster’s MediaLab de l'information, explained that it had co-authored a review of the challenges and applications of Generative AI in TV journalism.
“The aim is to establish, for the first time, a common basis for reflection among public broadcasting newsrooms,” said de Vallambras. The document was designed with contributions and review by the editorial and digital teams of France Televisions, Radio France, France Médias Monde, TV5Monde and INA (Institut national de l’audiovisuel), which has archived output of French TV channels almost since World War 2.
To some extent Gen AI based recommendation boils down to hyper personalization, which has been described as one of the seven great use case categories of AI. The essence is to drill deeper into both users and content with far more variables across multiple dimensions, giving scope for machine learning to converge on optimal strategies for personalization, search, discovery and recommendation.
This transforms relatively crude sets of profiles into individual assessments that can enable better decision making across the board, in this case primarily generation of tailored recommendations. Again, the principle has been proven to greater degrees in other fields, sometimes seemingly unrelated ones.
The example of financial credit checking springs to mind. Traditionally this has relied on credit scoring, with individuals assigned to different levels for decisions such as whether to approve a bank loan, leasing contract or mortgage. It has now been found that a more nuanced approach, taking account of factors beyond basic past credit history, has led to better decisions on both sides.
Sometimes people who would have been granted loans under the old system are now refused. Equally, reflecting changing circumstances that legacy credit scoring could not detect, people who would previously have been denied such financial services are now being granted them, boosting revenues for the provider without increasing risk.
There are signs of comparable bottom-line benefits for broadcasters and video service providers from Gen AI based recommendation, but it is too early yet in most cases to determine just how much this will improve on traditional methods. That will take more analytics.