How “Deep Learning” Technology is Revolutionizing Sports Production

Deep learning technology is more common than one might think. It is used to identify objects in images, text, or audio, achieving results that were not possible before. This article examines how deep learning is revolutionizing sports production by enabling low-cost, fully automated production for semi-professional and amateur sports broadcasts.

To understand how deep learning works, let's examine how our brains work. A human brain is made up of nerve cells, called "neurons," which are connected to each other in adjacent layers, forming an elaborate "neural network." In an artificial neural network, signals also travel between "neurons." Instead of firing an electrical signal, however, an artificial network assigns numerical "weights" to the connections between neurons, and these weights determine how strongly each signal is passed on.
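As a rough illustration of weighted signals moving between layers, the following Python sketch passes three input values through two small layers. All of the numbers, and the function names `neuron` and `layer`, are invented for this example; a real network has far more neurons and learns its weights rather than having them typed in.

```python
import math


def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of its inputs, squashed to 0..1."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))  # sigmoid activation


def layer(inputs, weight_rows, biases):
    """A layer is a list of neurons that all read the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Hypothetical input signal, e.g. three pixel intensities scaled to 0..1.
pixels = [0.2, 0.8, 0.5]

# Hypothetical weights: one row of weights per neuron in the layer.
hidden = layer(pixels, [[0.4, -0.6, 0.1], [0.3, 0.9, -0.2]], [0.0, 0.1])
output = layer(hidden, [[1.2, -0.7]], [0.05])

print(hidden, output)
```

Stacking more calls to `layer` is what makes such a network "deeper."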

Deep learning neural networks can comprise as many as 150 connected layers. The more layers, the “deeper” the network. Deep learning models are trained using large sets of labeled or annotated data. The neural network architectures learn features directly from the data, so you do not need to specify in advance which features are used to classify images. The relevant features are not pretrained either; they are learned while the network trains on a collection of images. This automated feature extraction makes deep learning models highly accurate for computer vision tasks such as object classification.
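To make "trained on labeled data" concrete, here is a deliberately tiny Python sketch. It is not a deep network: it trains a single artificial neuron (a perceptron) on four made-up, hand-labeled examples, and the two numbers per example are invented stand-ins for image data. A real deep learning model learns from raw pixels through many layers, but the basic idea of nudging weights whenever a prediction disagrees with its label carries over.

```python
def predict(sample, weights, bias):
    """Return 1 ("ball") if the weighted sum is positive, else 0 ("not ball")."""
    total = sum(w * x for w, x in zip(weights, sample)) + bias
    return 1 if total > 0 else 0


def train_perceptron(samples, labels, epochs=20, learning_rate=0.1):
    """Adjust the weights a little each time a prediction is wrong."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for sample, label in zip(samples, labels):
            error = label - predict(sample, weights, bias)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, sample)]
            bias += learning_rate * error
    return weights, bias


# Hypothetical toy data: two invented measurements per image patch.
samples = [[0.9, 0.8], [0.8, 0.9], [0.2, 0.1], [0.1, 0.3]]
labels = [1, 1, 0, 0]  # human-provided "ground truth": 1 = ball, 0 = not ball

weights, bias = train_perceptron(samples, labels)
predictions = [predict(s, weights, bias) for s in samples]
print(predictions)
```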

Although there is no need to manually extract each feature, there is a need to create a large enough training data set with annotations. For example, to identify a ball, you will need a data set of hundreds of thousands of unique images, which are annotated by humans and represent the "ground truth" for the deep learning model. Since you would usually annotate other elements as well, such as players, this can add up to millions of annotations. The result is a "trained model" that can identify the objects it was trained on.
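The sketch below shows one plausible shape for such annotations and how quickly they add up. The `Annotation` record, its field names, and the figures of 300,000 images and 12 objects per image are assumptions chosen for illustration, not numbers from a real data set.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Annotation:
    """One human-drawn bounding box (hypothetical format)."""
    label: str
    x: int       # left edge, in pixels
    y: int       # top edge, in pixels
    width: int
    height: int


# Annotations for a single made-up frame.
frame_annotations = [
    Annotation("ball", 640, 360, 12, 12),
    Annotation("player", 600, 300, 40, 90),
    Annotation("player", 700, 310, 38, 88),
]


def count_by_label(annotations):
    """How many boxes of each kind were drawn in a frame."""
    return dict(Counter(a.label for a in annotations))


def estimate_total(num_images, objects_per_image):
    """Back-of-the-envelope size of the whole annotation effort."""
    return num_images * objects_per_image


print(count_by_label(frame_annotations))
print(estimate_total(300_000, 12))  # assumed: 300,000 images, 12 objects each
```

Under these assumed figures the total comes to 3.6 million boxes, which is how "hundreds of thousands of images" turns into "millions of annotations."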

Deep Learning in Sports Production

Deep learning is used to generate fully automated sports production that looks very similar to a professional sports broadcast, including camera zoom-ins on the action, panning, etc. The basis for any decent automated sports production is the ability to identify at least the ball and the players. Identifying the ball is not an easy task, considering that the ball can be on the ground and is sometimes held by a goalkeeper or a player (e.g., before taking a free kick).

Deep learning technologies enable software to identify all of the required elements of a sports broadcast to automate its live production.

If you think about it, in all these different situations the ball "looks" different, yet we, as humans, have no problem identifying it as a ball from a single frame. Identifying the players is not simple either, as the system has to distinguish between "real" players and referees, bench players, etc.

Identifying the Field/Court

In sports production, one way to help identify the ball and the players is to define for the system the area that constitutes the field or court. This process, called "calibration," limits the scope of options for the deep learning algorithm by establishing, within each frame, which pixels are part of the field or court and which are not. It then translates these pixels to physical dimensions based on real-world coordinates.
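One common way to translate pixels on a flat surface into real-world coordinates is a 3x3 projective transform (a homography). The article does not say which method is used in practice, so the Python sketch below is only an assumption about how such a mapping could look. The matrix values are invented; in a real calibration they would be estimated from known landmarks such as field corners and line intersections.

```python
def pixel_to_field(h, px, py):
    """Map an image pixel (px, py) to field coordinates using a 3x3 matrix h."""
    x = h[0][0] * px + h[0][1] * py + h[0][2]
    y = h[1][0] * px + h[1][1] * py + h[1][2]
    w = h[2][0] * px + h[2][1] * py + h[2][2]
    if w == 0:
        raise ValueError("pixel maps to infinity (on the horizon line)")
    return x / w, y / w  # divide by w to undo the perspective scaling


# Hypothetical calibration matrix, made up for this example.
H = [
    [0.1, 0.0,   0.0],
    [0.0, 0.2,   0.0],
    [0.0, 0.001, 1.0],
]

field_x, field_y = pixel_to_field(H, 500, 250)
print(field_x, field_y)  # position in meters, under the assumed matrix
```

The bottom row of the matrix is what models perspective: pixels farther up or down the image are scaled differently, as they are when a camera looks across a field at an angle.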

By establishing the area of the field/court, it is possible to distinguish between players who are inside it and people who are outside it, such as bench players and spectators.
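Once positions are expressed in field coordinates, this filtering step can be as simple as a bounds check. The Python sketch below assumes a rectangular soccer pitch of 105 by 68 meters with the origin at one corner; the detection records and their names are hypothetical.

```python
# Assumed pitch size in meters; actual fields vary.
FIELD_LENGTH_M = 105.0
FIELD_WIDTH_M = 68.0


def is_on_field(x, y, margin=0.0):
    """True if the point lies inside the pitch, optionally with a tolerance."""
    return (-margin <= x <= FIELD_LENGTH_M + margin
            and -margin <= y <= FIELD_WIDTH_M + margin)


# Hypothetical detections, already converted to field coordinates.
detections = [
    {"id": "player_a", "pos": (52.0, 30.0)},
    {"id": "bench_player", "pos": (50.0, -3.0)},
    {"id": "spectator", "pos": (110.0, 70.0)},
]

on_field = [d["id"] for d in detections if is_on_field(*d["pos"])]
print(on_field)
```

A small positive `margin` would keep a player who briefly steps over the touchline, for example when taking a throw-in.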

Data Annotation for Sports

As mentioned above, training a deep learning model requires a large data set that establishes the "ground truth" for the algorithm. This is a major undertaking that should be carried out on an ongoing basis as more data is gathered and the algorithm evolves.

There are several options to achieve this. A minimal number of frames must be annotated by humans. In addition, several methods require less effort, including:

  • Google/YouTube images – it is possible to augment the data set by searching for "soccer players" on Google or YouTube. This will yield frames or images that include soccer players or, in other words, have been "pre-annotated" as soccer players.
  • Unsupervised learning – this technique uses unlabeled data by applying an additional, non-deep-learning algorithm to first segment the areas where players might be. For example, we can use well-known background subtractors such as MOG (mixture of Gaussians) to roughly identify players.
  • Augmentations – another commonly used technique is to modify or augment the images, for example by stretching them or changing their angles. These augmentations produce an additional data set that is already labeled.
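
As an example of the last option, the Python sketch below mirrors an image left to right and moves its bounding box to match, so the new image arrives already labeled. The tiny 2x4 "image," the box format `(label, x, y, width, height)`, and the function name `flip_horizontal` are assumptions made for illustration.

```python
def flip_horizontal(image, boxes):
    """Mirror an image (list of pixel rows) and its labeled boxes left-right."""
    image_width = len(image[0])
    flipped_image = [row[::-1] for row in image]
    flipped_boxes = [
        # A box starting at x with width w ends up starting at
        # image_width - x - w after mirroring; y and the size are unchanged.
        (label, image_width - x - w, y, w, h)
        for (label, x, y, w, h) in boxes
    ]
    return flipped_image, flipped_boxes


# Hypothetical 2x4 grayscale image; the value 9 marks the "ball" pixels.
image = [
    [0, 0, 9, 0],
    [0, 0, 9, 0],
]
boxes = [("ball", 2, 0, 1, 2)]  # human annotation on the original image

new_image, new_boxes = flip_horizontal(image, boxes)
print(new_image, new_boxes)
```

No human had to look at the mirrored image: its annotation was derived from the original one.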

One key to proper camera tracking is for the system to recognize the area of the field or court. The software must distinguish between players who are inside the field/court versus others outside of it.

As we've seen, deep learning technologies allow computers to understand the sports action, opening opportunities in sports production that were not possible before. At its highest level, this technology can mimic the decision-making process of a human camera operator and video editor, providing almost the same experience as a professional live sports broadcast at a fraction of the cost. This technological revolution will allow semi-professional and amateur sports clubs to broadcast their games to their fans and even monetize their content.

Yoav Liberman is Director of Computer Vision & Deep Learning Algorithms at Pixellot.
