Software Infrastructure Global Viewpoint – October 2021
Who Owns The Data?
Machine Learning (ML) is finding many new opportunities in broadcasting, from compression and library metadata tagging all the way to image recognition for remote camera operation. One of the challenges we face in maintaining the ML momentum is sourcing new, diverse, and relevant datasets. The question I have is, who owns the data?
This may seem like a philosophical question, but it’s not. It’s really a question about intellectual property. When I take a photograph, in most countries, I own the IP of the photograph. If I take a photograph of someone who can be recognized, then I may well need a model release form.
One of the interesting aspects of ML is its use in robot control. I see a time when a camera, or group of cameras, can be programmed to follow a particular player around a pitch using facial recognition. Also, imagine a time when the director calls the shots and ML voice recognition responds by moving the cameras accordingly.
Detecting a face is now relatively straightforward, but detecting a specific face is still a bit more challenging.
Improving the cameras’ ability to track a specific player will need many images of each individual player, taken from different angles and under a wide variety of lighting conditions. All well and good, but wouldn’t the player have something to say about that?
This is where a firm dividing line between the worlds of research and commerce appears. From an academic perspective, many datasets are available from all over the world for research only. That is, they cannot be used in a commercial product. I’m not saying that they can never be used, but the academic licenses under which researchers obtain them often include clauses that specify non-commercial use only.
So where does this leave the world of commerce? To be honest, I’m not sure. I do know that an ML engine can never have too much training data, and I do know that sportspeople will want to protect their brands. But how do we release a massive amount of recognizable data to vendors who want to build cutting-edge products using ML? The potential contractual issues look daunting.
It seems to me that this is all straightforward when data can be made anonymous. An example of this is the acquisition of the read-write metrics for a disk drive along with the ambient temperature and power levels in the server chassis. A vendor could easily collect lots of data for monitoring purposes and then make it untraceable. The resulting dataset could then be used to predict disk drive failures for the good of all their clients.
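To make the disk drive example a little more concrete, here is a minimal Python sketch of how a vendor might strip identifying fields from monitoring records before pooling them. Everything in it is an assumption for illustration: the record layout, the field names (customer_id, drive_serial and so on), and the choice of which fields count as identifying are all hypothetical. Simply dropping identifiers may also not be enough to make real-world data truly untraceable.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class DiskSample:
    """One monitoring record. The layout is hypothetical."""
    customer_id: str        # identifies the client (assumed field)
    drive_serial: str       # identifies the physical drive (assumed field)
    read_ops: int           # read metric for the drive
    write_ops: int          # write metric for the drive
    ambient_temp_c: float   # ambient temperature in the server chassis
    chassis_power_w: float  # power level in the server chassis


# Fields assumed to trace a record back to a client or a specific drive.
IDENTIFYING_FIELDS = ("customer_id", "drive_serial")


def anonymize(sample: DiskSample) -> dict:
    """Return a copy of the record with the identifying fields removed."""
    record = asdict(sample)
    for field_name in IDENTIFYING_FIELDS:
        record.pop(field_name)
    return record


def build_training_set(samples: list[DiskSample]) -> list[dict]:
    """Anonymize every record so the pooled data holds metrics only."""
    return [anonymize(s) for s in samples]
```

The point of the sketch is that the metrics a failure-prediction model would learn from survive intact, while the fields that tie a record to a client are discarded. No such simple step exists for a photograph of a face.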
However, when we’re relying on data that cannot be made anonymous, as in the case of images of a sportsperson’s face, then there are some interesting challenges ahead. It seems strange to me that solving a technical problem may well rely on a model release form.