Deep learning is a technique for machine learning, which in turn is one way to achieve artificial intelligence.
Many new software applications now claim Artificial Intelligence (AI) or machine learning as their underlying power. Is this just marketing hype, or is AI going to be really useful technology for the media and entertainment sector?
AI is not new, it’s been around for half a century or so, but it is only in the last couple of years that the term has been cropping up in product descriptions. So, what is it and what potential does it have for this sector?
To the public, AI used to be the power behind man-versus-machine board games, with IBM’s Deep Blue beating world chess champion Garry Kasparov in 1997 and, more recently, Google’s AlphaGo defeating professional Go players in 2015 and 2016. The development of genuinely useful applications only emerged when advances in technology made the necessary computing power affordable. The autonomous vehicle is one big driver for this technology, but many other industry verticals are beginning to see opportunities and potential benefits. AI is already here: it can be found throughout the online advertising platforms. So, what can AI offer the media and entertainment (M&E) sector? For a sector that prides itself on creativity, it may seem anathema to automate processes with smart machines, yet there are many areas of the media business that can benefit from AI.
One problem can be hacking through the hype. Marketing departments adopt buzzwords such as ‘cloud’, ‘workflow’, and now ‘AI’, and apply them to anything and everything until they cease to have real meaning.
There is no single definition for AI, which is an advantage to the marketeers, but for the technology buyer, a reality check is essential when faced with AI solutions. The term ‘vaporware’ has long been associated with software, and it doesn’t go away with AI.
The computer scientist Alan Turing started philosophising over intelligent machines in 1950. He devised a test, now known as the Turing Test, to determine whether a computer could respond to natural-language questions in a manner indistinguishable from a human. The first definition of AI came from John McCarthy, then a professor at Dartmouth College, NH, in his proposal for the 1956 Dartmouth workshop:
“every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it”.
The world view was very different back then in the infancy of computers. After a promising start, AI languished through the 1970s, but looking back now, we can see that the computers were lacking the necessary cost/performance ratio to make AI a viable proposition. Now we have multi-core CPUs and graphics processors (GPU) with a power not dreamt of in the 1970s. This processing power is at a low enough cost to embed in devices as small as phones—the landscape has changed.
A more recent definition of AI from the Association for the Advancement of Artificial Intelligence reads
"the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."
AI, Machine Learning and Deep Learning
A quick dive into AI and you will come across two more terms: Machine Learning and Deep Learning. The three terms are related, but each has a specific meaning.
AI is the overall concept. Machine learning is one way, currently the most successful, of applying AI to a task. Other techniques include the classic if … then… code of rules engines. This approach can be found in many software applications, like ad scheduling and business process management (BPM).
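In code, such a rules engine is nothing more than hand-written conditional logic. The sketch below uses hypothetical ad-scheduling rules purely for illustration:

```python
# Minimal illustration of a classic rules engine: explicit if/then logic,
# no learning involved. The scheduling rules here are hypothetical.

def schedule_ad(slot):
    """Return an ad category for a broadcast slot using hand-written rules."""
    if slot["audience"] == "children":
        return "toys"
    if slot["daypart"] == "late_night" and slot["genre"] == "sport":
        return "beer"
    return "general"

print(schedule_ad({"audience": "children", "daypart": "daytime", "genre": "cartoon"}))
# A rules engine only knows what its authors wrote down; it never improves with data.
```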
Machine learning is a newer approach to AI that does not use explicit programming, but instead learns by experience as it performs a task. In the example of optical character recognition, the algorithm improves its recognition performance each time a character is analysed: learning by experience. In complex tasks this can mean analysing very large data sets to achieve the desired performance, which takes us into the world of big data.
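Learning by experience can be sketched with a minimal perceptron: instead of hand-coded recognition rules, the weights are nudged by each labelled example it sees. The 3x3 "glyphs" below are a toy stand-in for character images, not real OCR data:

```python
# A minimal sketch of learning by experience: a perceptron whose behavior
# is shaped by labelled examples rather than explicit recognition rules.

def train(samples, epochs=20, lr=0.1):
    """Train perceptron weights on (pixels, label) pairs."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, label in samples:  # each example nudges the weights
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = label - pred
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Toy 3x3 "glyphs": a vertical bar (class 1) vs a horizontal bar (class 0).
bar_v = [0, 1, 0, 0, 1, 0, 0, 1, 0]
bar_h = [0, 0, 0, 1, 1, 1, 0, 0, 0]
w, b = train([(bar_v, 1), (bar_h, 0)])
print(predict(w, b, bar_v), predict(w, b, bar_h))  # → 1 0
```

Real OCR systems face far messier inputs, hence the very large training sets mentioned above.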
Machine learning finds application in many tasks, including:
- Speech recognition
- Natural language processing
- Computer vision
- Expert Systems
- Heuristic classification
Many of these find immediate application in the M&E sector for tasks from automated production through to content localization into different languages. Much of the time-consuming, mundane work in production and distribution involves reviewing hundreds of hours of content. This is where services like speech recognition, natural language processing, and computer vision have immediate application.
Deep learning is a way to implement machine learning, often based on multiple layers of artificial neural networks operating in a manner not unlike the human brain. To quote from a 2015 paper in ‘Nature’ by Yann LeCun, Yoshua Bengio and Geoffrey Hinton:
Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction.
The ‘Deep’ refers to the multiple layers of processing. Deep learning has led to a great improvement in the performance of many tasks, including speech recognition, speech-to-text conversion and object recognition. Machines can now even describe what were once considered emotional interpretations, like the mood of an image.
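The idea of stacked layers can be illustrated with a tiny two-layer network. The weights below are hand-set to compute XOR, a function no single layer can represent; in a real deep learning system they would be learned from data:

```python
import math

# Sketch of "deep" = stacked processing layers: each layer transforms the
# previous layer's output into a more abstract representation.

def layer(inputs, weights, biases):
    # one fully connected layer with a sigmoid non-linearity
    return [1 / (1 + math.exp(-(sum(w * x for w, x in zip(ws, inputs)) + b)))
            for ws, b in zip(weights, biases)]

def forward(x):
    h = layer(x, [[10, 10], [-10, -10]], [-5, 15])  # hidden layer: OR, NAND
    return layer(h, [[10, 10]], [-15])[0]           # output layer: AND

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, round(forward([a, b])))  # XOR truth table: 0, 1, 1, 0
```

Modern networks stack dozens or hundreds of such layers, which is what makes the heavy GPU arithmetic described below necessary.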
Ongoing research is creating platforms that are lower cost and faster, both prerequisites for autonomous vehicles. Graphics Processing Units (GPUs) are the current favorite platform for deep learning machines, but they are not the only solution. FPGAs are proving a popular alternative, as they can be configured specifically for the arithmetic operations needed by neural networks. Google has even designed an ASIC, the Tensor Processing Unit (TPU), for machine learning.
You don’t have to use on-premises solutions: AWS, Google Cloud and Microsoft Azure all offer AI and machine learning services in the cloud.
Applications for M&E
AI can be applied throughout the acquisition-to-delivery pipeline. Possible applications range from automated production, through editing, to the preparation of content for different markets.
Interest in automated production started in the educational market for lecture capture over ten years ago and is now expanding to broadcast applications like news and sport. Several companies are applying AI to camera robotics and video switching to frame and select shots. Such systems could find applications in local news and minority sports, where budgets are restricted and automation could allow production with fewer people. One example seen at NAB was from Mobile Viewpoint. Their NewsPilot controls three PTZ cameras, a switcher and a graphics engine for intelligent, unmanned newscasts. IQ Sports Producer, also from Mobile Viewpoint, uses specially developed algorithms from the Netherlands Organisation for Applied Scientific Research (TNO) to detect objects, persons and behavior, and uses that information to track the game and control the pan and zoom of a 180-degree camera.
IBM Watson Media is offering a sports highlights service that helps sports broadcasters by automatically identifying and curating video highlights. Using dynamic rules, the application finds highlights based on video analysis.
Video editing can be onerous, especially for observational documentaries, where shooting ratios of 100:1 or more call for much reviewing and sorting to arrive at a rough cut. AI shows promise as an aid to editing. As an example, a team at Stanford University, along with Adobe Research, published a paper in 2017 entitled “Computational Video Editing for Dialogue-Driven Scenes”. The researchers took a standard film script and multiple video takes, each capturing a different camera framing or performance of the complete scene. The system automatically selects the most appropriate clip from one of the input takes for each line of dialogue, based on a user-specified set of film-editing idioms, such as keeping the speaker visible, intensifying emotion or emphasizing a character. The sequence can then be tweaked by a human editor, who has been relieved of the laborious task of shot assembly.
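The flavor of idiom-driven clip selection can be sketched as a simple scoring loop. This is a toy illustration, not the paper’s actual algorithm; the takes, metadata and idiom weights are all hypothetical:

```python
# Toy sketch: for each line of dialogue, pick the take whose framing best
# satisfies user-weighted editing idioms. All data here is hypothetical.

IDIOM_WEIGHTS = {"speaker_visible": 2.0, "emphasize_emotion": 1.0}

def score(take, line):
    """Score one candidate take against the idioms for one dialogue line."""
    s = 0.0
    if line["speaker"] in take["visible"]:
        s += IDIOM_WEIGHTS["speaker_visible"]
    if line["emotional"] and take["framing"] == "close_up":
        s += IDIOM_WEIGHTS["emphasize_emotion"]
    return s

def assemble(script, takes):
    # greedy assembly: best-scoring take per dialogue line
    return [max(takes, key=lambda t: score(t, line))["name"] for line in script]

takes = [
    {"name": "wide",    "framing": "wide",     "visible": {"ANNA", "BEN"}},
    {"name": "cu_anna", "framing": "close_up", "visible": {"ANNA"}},
]
script = [
    {"speaker": "BEN",  "emotional": False},
    {"speaker": "ANNA", "emotional": True},
]
print(assemble(script, takes))  # → ['wide', 'cu_anna']
```

The published system optimizes over the whole scene rather than greedily per line, but the principle of scoring takes against weighted idioms is the same.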
Content-Aware Encoding (CAE) is being used to boost the efficiency of AVC/H.264 for businesses that are wary of the licensing issues around HEVC. CAE can be implemented through machine learning. Rather than using a fixed set of coding parameters, the encoder optimizes its settings for each piece of content to minimize the compressed data rate. CAE has been adopted by standalone encoder vendors, as well as the online video platforms.
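The core logic can be sketched as a per-title bitrate search: find the lowest bitrate that still meets a quality target for this specific content. The quality predictor below is a made-up stand-in for the learned model a real CAE system would train:

```python
# Illustrative per-title encoding search. The quality model is a toy
# stand-in for a learned predictor; real systems train one from test encodes.

def predicted_quality(bitrate_kbps, complexity):
    # toy model: quality rises with bitrate, falls with content complexity
    return 100 * bitrate_kbps / (bitrate_kbps + 1500 * complexity)

def choose_bitrate(complexity, target=80, step=100, max_kbps=8000):
    """Lowest bitrate (in steps of `step` kbps) meeting the quality target."""
    for rate in range(step, max_kbps + 1, step):
        if predicted_quality(rate, complexity) >= target:
            return rate
    return max_kbps

print(choose_bitrate(0.2))  # simple content, e.g. a talking head → 1200
print(choose_bitrate(1.0))  # complex content, e.g. fast sport    → 6000
```

The payoff is that simple content is no longer encoded at the same rate as complex content, which is where the bitrate savings come from.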
IBM has moved on from Deep Blue, with IBM Watson Media offering a suite of AI applications for the M&E sector. Two examples for the content distribution business are closed captioning and compliance checks.
Watson Captioning leverages the company’s Speech to Text API to create closed captions faster than real time, with features like automatic segmentation to improve readability. The technology was used in 2016 to automatically caption the US Open tennis tournament.
AI can now be pressed into service to aid compliance checks, ranging from flagging logos and trademarks that may cause intellectual property issues through to identifying adult content and profanities.
For more examples from NAB of AI in the media business, see this article from Michael Grotticelli: AI Touted as Super-charged Video Assistant.
AI is finding application from acquisition all the way through to delivery. Deep learning has radically improved key technologies like video analysis and speech-to-text conversion. General-purpose GPUs, along with bespoke FPGAs and ASICs, have made deep learning algorithms fast and low cost, in the cloud or on-premises. There is the potential to automate many of the repetitive tasks in video production and publishing, helping media companies keep pace with the never-ending demand for video entertainment across all manner of platforms and devices. Mining program archives and publishing to the long tail are only viable propositions when costs are minimized. AI can augment human resources to change the cost base.
Current AI capabilities cannot completely replace human oversight. The algorithms are not perfect; after all, they are still learning. Speech-to-text will struggle with regional accents, overlapping voices, or high ambient noise, and there have been notorious examples of image recognition failing. Nevertheless, these tools provide a very useful adjunct to existing processes and practices.
AI was definitely a hot topic at the 2018 NAB Show, and we can expect to see AI and machine learning in more products and solutions as the advantages become evident and the business opportunities expand.