An overhead automated camera tracks the field action based on deep-learning algorithms.
Deep learning technology is more common than one might think. This technology is used to identify objects in images, texts or audio, achieving results that were not possible before. This article will examine how deep learning is revolutionizing sports production to enable low-cost, fully automated production for semi-professional and amateur sports broadcasts.
To understand how deep learning works, let's examine how our brains work. A human brain is made up of nerve cells, called "neurons," which are connected in adjacent layers to each other, forming an elaborate "neural network." In an artificial neural network, signals also travel between "neurons.” Instead of firing an electrical signal, a neural network assigns "weights" to various neurons.
Deep learning neural networks comprise as many as 150 connected layers. The more layers developed, the “deeper” the network. Deep learning models are trained by using large sets of labeled or annotated data. The neural network architectures learn features directly from the data, so you do not need to identify the features used to classify images. The relevant features are not pretrained either; they are learned while the network trains on a collection of images. This automated feature extraction makes deep learning models highly accurate for computer vision tasks such as object classification.
Although there is no need to manually extract each feature, there is a need to create a large enough training data set with annotations. So, for example, to identify a ball, you will need a data set of hundreds of thousands of unique images, which are annotated by humans and present the "ground truth" for the deep learning model. If you consider the fact that you would usually annotate other elements, such as players, this can add up to millions of annotations. The result is a "trained model" that can identify the objects it was trained on.
Deep Learning in Sports Production
Deep learning is used to generate fully automated sports production that looks very similar to professional sports broadcast, including camera zoom ins on the action, panning, etc. The basis for any decent-level automated sports production is the ability to at least identify the ball and the players. Identifying the ball is not an easy task, if you consider the fact that the ball can be on the ground and sometimes held by a goal keeper or a player (e.g. before kicking a foul).
Deep learning technologies enable software to identify all of the required elements of a sports broadcast to automate its live production.
If you think about it, in all these different situations the ball "looks" different, yet, we, as humans, have no problem identifying it as ball from a single frame. Identifying the players is not simple either, as the system will have to distinguish between "real" players and referees, bench players, etc.
Identifying the Field/Court
In sports production, one of the ways used to help identify the ball and the players is to define to the system the area that constitutes the field/court. This process -- "calibration" -- limits the scope of options for the DL algorithm by establishing within each frame which pixels are part of the field and court and which ones are not. It then translates these pixels to physical dimensions based on real-world coordinates.
By establishing the area of the field/court, it is possible to distinguish between players who are inside the field/court versus others outside of it, such as bench players, and between players on the field and spectators, who are outside the field.
Data Annotation for Sports
As mentioned above, as part of the deep learning model training is a need for a large data set to establish the "ground truth" for the deep learning algorithm. This is a major undertaking that should be done on an ongoing basis as more data is gathered and the algorithm evolves.
There are several options to achieve this. A minimal number of frames must be annotated by humans. In addition, several methods that require less effort, including:
- Google/YouTube images - it is possible to augment the data set by searching "soccer players" on Google or YouTube. This will yield frames or images that include soccer players, or, in other words, have been "pre-annotated" as soccer players.
- Unsupervised learning – this technique uses un-labeled data by applying an additional non-deep-learning algorithm to first segment the area of the potential players. For example, we can use known background subtractors such as MOG to roughly identify players.
- Augmentations – another commonly used technique is to modify or augment the images, for example to stretch them, modify angles, etc. These augmentations produce an additional data set that has been already labeled.
One key to proper camera tracking is for the system to recognize the area of the field or court. The software must distinguish between players who are inside the field/court versus others outside of it.
As we've seen with deep learning technologies, computers can understand the sports action, opening new opportunities in sports production that were never possible before. In its highest level, this technology can mimic the decision-making process of a human camera operator and video editor, providing almost the same experience of a professional live sports broadcast, at a fraction of the cost. This technological revolution will allow semi-professional and amateur sport clubs to broadcast the games to their fans and even monetize their content.
Yoav Liberman is Director of Computer Vision & Deep Learning Algorithms at Pixellot.
You might also like...
Moving to IP opens a whole plethora of options for broadcasters. Engineers often speak of the advantages of scalability and flexibility in IP systems. But IP systems take on many flavors, from on-prem to off-prem, private and public cloud. And…
Part one of this four-part series introduces immersive audio, the terminology used, the standards adopted, and the key principles that make it work.
Philo T. Farnsworth was the original TV pioneer. When he transmitted the first picture from a camera to a receiver in another room in 1927, he exclaimed to technicians helping him, “There you are – electronic television!” What’s never been quoted but lik…
Ground breaking advances in storage technology are paving the way to empower broadcasters to fully utilize IT storage systems. Taking advantage of state-of-the-art machine learning techniques, IT innovators now deliver storage systems that are more resilient, flexible, and reliable than…
This FREE to download eBook is likely to become the reference document you keep close at hand, because, if, like many, you are tasked with Preparing for Broadcast IP Infrastructures. Supported by Riedel, this near 100 pages of in-depth guides, illustrations,…