The unique Sky News “Who’s Who” app was used live and on-demand by 800,000 users in 200 countries to help identify wedding guests on screen.
The technical advantages of software-centric video processing/production architectures have been touted for some time now, but for the risk-takers, it turns out the biggest benefit for content distributors could be financial in nature.
As broadcasters struggle to stay competitive in a multi-platform world, they are being challenged to come up with new and innovative content, both for linear and online viewing, which attracts and grows audiences. Higher ratings translate to revenue and at the end of the day that’s what the television industry is all about.
The creative minds at Sky News in the UK understand this as well as anyone and continually come up with new programming elements—3D virtual sets and augmented reality graphics are two of the latest program enhancements—to satisfy their loyal subscribers. Innovation has become the network’s mantra and serving new technology up in different ways to support its programs has led to success.
However, innovation comes at a price, of which management is all too aware. That’s where an IP-based video processing solution comes into play. By hosting much of the signal processing and storage in the cloud, programmers can take risks without the traditional concern of a hefty bill at the end of the day.
The Royal Experiment
When Sky News decided to augment its live coverage of the Royal Wedding on May 19th of this year, the addition needed to be something special. The new feature was an online application, complementary to the TV coverage, that would help viewers follow along with the proceedings and identify those in attendance, in real time. Many said it was a good idea but would be prohibitively expensive. And real-time identification of people in a live video feed with the click of a mouse had never been proven on such a public stage.
Hugh Westbrook, Senior Product Owner for Sky News and Sky Sports, knew better. As a veteran online journalist with a keen knowledge of technology, he had worked with Amazon Web Services (AWS) on a few projects and saw its potential both for scalability (increasing service use when traffic demand was high and scaling back when it wasn’t) and for drawing a viewing crowd. They could do it, he reckoned, if the planning was carefully thought out and the technology worked as it was supposed to. Yet with work getting started in February, the team had only 78 days from inception of the idea to supporting the actual event live, so it wouldn't be easy.
During the Royal Wedding coverage, a Sky News team had 40 seconds to cross check the accuracy of the identifications against the results from the AWS Rekognition AI search engine.
The goal, he said, was to give the online audience a new way of consuming the Royal Wedding content, something Sky News strives for with most of its shows, both live and on demand.
Using artificial intelligence to handle facial recognition and real-time data insertion, the Sky News "Who’s Who" Royal Wedding app was born. As guests walked into the Windsor Castle church in southern England, little name tags popped up on the Sky News screen online whenever a celebrity appeared. The app allowed viewers to watch live video of the guests arriving at the church, click on a face in the video and, within a few seconds, see biographical information about that person and their connection to the royal couple.
The Risk Pays Off Big
On the day of the wedding the app was a “smashing success,” as the English like to say, with over 800,000 people from over 200 countries using it both live throughout the two-hour event and later on demand. Users were also able to look back through the video once the arrivals were over to find out about the guest list in detail, or to watch key arrivals again. Several hundred thousand more have used it in an on-demand environment.
“We felt there was an opportunity there to not only create something really bold and innovative in terms of technology,” Westbrook said. “At Sky News we like to make big, bold statements on large events. We also knew there was a lot of interest in the wedding guests and who was going to be there, so this app made a lot of sense.”
To make it work financially, Westbrook sought out AWS to leverage Amazon’s software-based microservice architecture. The menu of options enables a user to pick and choose from among 120 different cloud-based services. For this project the selection included Amazon CloudFront (a content delivery network), Amazon Rekognition (a video and image analysis service), EC2 (cloud-based computing) and Amazon's S3 (cloud storage) service. He paired these remote services with two AWS partner companies: Gray Meta (data analysis) and UI Centric (which built the app’s consumer-facing front end).
The AWS workflow stretched from the live video pool feed to the end user, with multiple cloud-based services and data analysis technology on the ground employed in synchronization with each other.
The Gray Meta data analysis platform ingested the results from Amazon Rekognition and then, using metadata, married a name to interesting information about that person. Related biographical text was tagged to an image only if the match showed the highest degree of accuracy. The platform then delivered the result downstream into the UI Centric interface, where it was brought together with the live video stream itself and passed on to consumers.
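The "highest degree of accuracy" rule can be pictured as a simple confidence gate ahead of the name-to-biography join. The following is only a minimal Python sketch: the 95 percent cut-off, the `Match` structure, the `tag_with_bio` helper and the placeholder guest entries are all assumptions made for illustration, not details of the Gray Meta platform.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical cut-off; the article only says "the highest degree of accuracy".
MIN_CONFIDENCE = 95.0


@dataclass
class Match:
    name: str
    confidence: float  # 0-100, as a recognition service might report it


# Illustrative biography store keyed by name (placeholder entries only).
BIOS: Dict[str, str] = {
    "Guest A": "School friend of the groom.",
    "Guest B": "Former colleague of the bride.",
}


def tag_with_bio(
    match: Match,
    bios: Dict[str, str] = BIOS,
    min_confidence: float = MIN_CONFIDENCE,
) -> Optional[dict]:
    """Return a name-plus-biography tag, or None if the match is
    below the cut-off or no biography is on file for that name."""
    if match.confidence < min_confidence:
        return None
    bio = bios.get(match.name)
    if bio is None:
        return None
    return {"name": match.name, "bio": bio, "confidence": match.confidence}
```

With these placeholder values, a 99 percent match on "Guest A" would produce a tag, while an 80 percent match on "Guest B" would be held back for a human to look at.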
Cloud-Based Signal Processing
Live video from the broadcasters’ pool camera, along with accompanying data, came into Sky News headquarters in London and was sent directly up to the EC2 cloud service. There the feed was segmented into eight-second clips that could be quickly and easily processed by the Amazon Rekognition system. The Gray Meta team then added their data.
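As a rough illustration of the eight-second segmentation step, the Python sketch below computes clip boundaries for a feed of a given length. Only the eight-second figure comes from the article; the `segment_bounds` function and its behavior for a shorter final clip are assumptions, and the real pipeline would cut actual video rather than a list of timestamps.

```python
from typing import List, Tuple

SEGMENT_SECONDS = 8.0  # clip length reported in the article


def segment_bounds(
    duration_s: float, segment_s: float = SEGMENT_SECONDS
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds that cover the feed.

    The last clip is shortened if the duration is not an exact
    multiple of the segment length (an assumption of this sketch).
    """
    if segment_s <= 0:
        raise ValueError("segment_s must be positive")
    duration_s = float(duration_s)
    bounds = []
    index = 0
    # Multiply by the index instead of accumulating, to avoid float drift.
    while index * segment_s < duration_s:
        start = index * segment_s
        end = min(start + segment_s, duration_s)
        bounds.append((start, end))
        index += 1
    return bounds
```

By this arithmetic, a two-hour feed (7,200 seconds) would break down into 900 eight-second clips.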
Within seconds the Rekognition system processed consumer requests for a particular celebrity ID and sent back results on who it thought the person in question was. A team of operators was then tasked with either confirming the ID of that person and sending it along for availability via the UI Centric app, or replacing the incorrect image and text with the correct ones.
Once confirmed, the correct image was sent back up into the cloud and, using metadata, the AWS services architecture stitched together the video with the hyperlinked image and related text using timecode and X/Y coordinates, and then sent the result out to the viewers who clicked on that specific person in the live video. This happened many hundreds of times simultaneously for each of the many celebrities in attendance.
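Matching a viewer's click to a tagged guest by timecode and X/Y coordinates might look something like the minimal Python sketch below. The `FaceTag` structure, the normalized 0-to-1 coordinates and the placeholder guest names are assumptions for illustration; the article does not describe the actual data format used.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FaceTag:
    name: str
    start_s: float  # timecode at which the tag becomes valid
    end_s: float    # timecode at which the tag stops being valid
    left: float     # bounding box, as fractions of the frame size
    top: float
    width: float
    height: float

    def contains(self, t: float, x: float, y: float) -> bool:
        """True if a click at time t and position (x, y) lands on this tag."""
        return (
            self.start_s <= t < self.end_s
            and self.left <= x <= self.left + self.width
            and self.top <= y <= self.top + self.height
        )


def tag_at(tags: List[FaceTag], t: float, x: float, y: float) -> Optional[FaceTag]:
    """Return the first tag under the click, or None if nothing was hit."""
    for tag in tags:
        if tag.contains(t, x, y):
            return tag
    return None


# Placeholder tags for one eight-second clip.
tags = [
    FaceTag("Guest A", 0.0, 8.0, left=0.25, top=0.25, width=0.25, height=0.5),
    FaceTag("Guest B", 0.0, 8.0, left=0.75, top=0.25, width=0.125, height=0.25),
]
hit = tag_at(tags, t=3.0, x=0.3, y=0.5)
print(hit.name if hit else "no guest at that position")
```

Normalized coordinates are used here so that the same tag works at any player size, which is one plausible reason to carry X/Y data alongside the timecode.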
In total there were five operators handling the Rekognition oversight. There was also one supervisor, plus about ten people on the technical side to make sure things were working as planned. These Sky News team members worked in advance to develop and train custom machine learning models for Amazon Rekognition, and to prevent false positives. This was critical because during the wedding coverage they had 40 seconds to check the results from the cloud and then push them live. To streamline this process, the team created special tags to confirm that an image could go live online, which they applied to multiple stored reference images of each person to ensure a successful match against the computer results.
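The operator check inside that 40-second window could be modeled roughly as follows. This Python sketch is hypothetical: the `Candidate` structure and `review` function are invented for illustration, and the rule that a late decision is withheld rather than published is an assumption, since the article does not say what happened when the window was missed.

```python
from dataclasses import dataclass
from typing import Optional

REVIEW_WINDOW_S = 40.0  # review time reported in the article


@dataclass
class Candidate:
    suggested_name: str  # who the recognition service thinks this is
    received_at: float   # seconds on any monotonic clock


def review(
    candidate: Candidate,
    decided_at: float,
    confirmed: bool,
    corrected_name: Optional[str] = None,
    window_s: float = REVIEW_WINDOW_S,
) -> Optional[str]:
    """Return the name to publish, or None to withhold the tag."""
    if decided_at - candidate.received_at > window_s:
        # Assumption: a decision made after the window is not published.
        return None
    if confirmed:
        return candidate.suggested_name
    # Operator rejected the suggestion; publish the correction if one was given.
    return corrected_name
```

In this model the machine's suggestion never goes out on its own: every published name has passed through an operator decision, which reflects the balance between computer and human knowledge that Westbrook describes.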
“One of the very early discussions we had was, how are we going to ensure we don’t put out incorrect information?” Westbrook said. “But we couldn’t do it without machines. So there was a balance we had to find between trusting the computer and trusting our own knowledge of the people involved.”
Humans Teaching Machines
This point about teaching the machines to learn the images was key to the project's success. Some people might believe that AI is a panacea for all types of video processing, but that’s not the case—especially with live events like this one.
The challenge for everyone involved was that all of this computing had to happen in real time. Consumers don’t have the patience to wait for the on-screen results they have requested. So, the Sky News team ran several test images through the system before the actual event and carefully taught the computer what images to look for when a specific celebrity image was requested.
AI-driven, cloud-based machines helped computer and mobile device users find out more about a celebrity just by clicking on the on-screen name tag.
“The key is you have to learn how to use technology in combination with humans,” Westbrook said. “If you get that right, then you will be successful in doing real-time projects like this. AI/Machine learning enables you to do things you could never do, but if you rely on the AI 100 percent, you are going to have issues. So you need to see AI as your friend and work in harmony with it. I don’t know how far away we are from eliminating humans from the process, but we are clearly not there yet.”
Low CapEx, Big Return
In the end the system wasn’t perfect, but it worked exceedingly well and was positively received internally and universally acclaimed by the public. With more than 200 identifications to sift through, Sky News achieved 70 percent accuracy identifying people in the wedding video during the live event and 80 percent accuracy during later on-demand viewing.
And the price tag for the big event: the cost of the AWS services alone was under $20,000. This for a system that would otherwise have required $150,000 worth of traditional equipment and a massive team of human operators to make it work. For Sky News, it was a pretty good return on investment.
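The comparison can be worked through as simple arithmetic. The short Python sketch below uses only the two figures quoted above; because the cloud bill is given as an upper bound and the traditional figure leaves out the cost of the human operators, the results should be read as bounds rather than exact values.

```python
CLOUD_COST_MAX = 20_000      # "under $20,000" for the AWS services
TRADITIONAL_COST = 150_000   # traditional equipment only, excluding staff

# Smallest possible saving, reached only if the cloud bill hit its ceiling.
min_saving = TRADITIONAL_COST - CLOUD_COST_MAX

# Largest possible share of the traditional equipment cost.
max_ratio = CLOUD_COST_MAX / TRADITIONAL_COST

print(f"Saving of more than ${min_saving:,}")
print(f"Cloud cost under {max_ratio:.1%} of the traditional equipment cost")
```

On these numbers the cloud services cost less than about 13 percent of the traditional equipment estimate, a saving of more than $130,000 before staffing is considered.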
“The benefit of AWS services is that they can be integrated into a single workflow very easily,” said Keith Wymbs, Chief Marketing Officer at AWS Elemental, one of the AWS services used for signal processing and image packaging. “So, the customer can pay as they go and use only the services they need at that particular time. The cost metric is based on the type of service employed. Compression format, storage requirements and data rates all figure into the pricing. Sky News was smart about how they used our platform and it was a huge success for everyone involved.”
The Royal Wedding “Who’s Who” application became a really new and unusual way to watch the guests arrive. It would have been impossible for a team of humans to sit there with sheets of paper trying to figure out who was who. The AI-driven, cloud-based machines were the key. There are now ongoing internal discussions at Sky about how to use the technology again for another major news, governmental or sporting event.
“It takes a big commitment of time and resources to do something like this correctly, but it’s also interesting to think about what can we do and get more ongoing value,” Westbrook said. “So, [the technology] could be useful for events that happen more regularly, like sporting events or political events. As broadcasters it’s important that we try new things. This was a perfect blending of technology and storytelling. That’s what’s most important to Sky and our viewers. We’ll surely see more interesting and complementary programming created this way going forward.”