Test, QC & Monitoring Global Viewpoint – November 2020
ML Confirmation Bias

Key to effective ML (Machine Learning) prediction is the accuracy of the data used to train the systems. But what is this training data and what happens when it’s not as accurate as we would like to believe?
Library search systems are an excellent example of applications that have benefited greatly from ML. Generally speaking, the better the classification metadata used for search, the more likely editors and producers are to retrieve the shots they want.
With the introduction of ML, the speed and accuracy with which footage can be classified are breathtaking. With the right data sets there’s no need for a human to get involved with the classification at all, but the quality of the classification is a direct consequence of the accuracy of the training data used.
Somebody somewhere is responsible for training the ML system. Training fundamentally involves sitting through hours of material to find sample images of the items that are to be detected in the video. This provides the source-of-truth training data, which in turn provides the metadata for the final library classification product.
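To make that workflow concrete, here is a minimal sketch of how human-labelled material might feed a supervised classifier. The feature vectors, label names and split sizes are hypothetical placeholders rather than any vendor's actual implementation; the point is simply that the model only ever learns to reproduce whatever the annotators said.

```python
# A minimal sketch, assuming each human-labelled frame has already been
# reduced to a feature vector. Feature values and label names below are
# placeholders, not real training data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Each row stands in for one labelled frame; each label is the classification
# the human annotator assigned (the source of truth).
features = np.random.rand(200, 128)
labels = np.random.choice(["anchor_desk", "crowd_scene"], size=200)

# Hold back some labelled frames to measure how well the model reproduces
# the human annotations it was trained on.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=42
)

classifier = LogisticRegression(max_iter=1000)
classifier.fit(X_train, y_train)

# The score is only measured against the human-supplied labels: if those
# labels carry a bias, the accuracy figure will never reveal it.
print("Accuracy against human labels:", classifier.score(X_test, y_test))
```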
Confirmation bias is a well-known human phenomenon and has been the subject of much research over the past sixty years. As humans we have a tendency to search for, recall and interpret information in a way that seems to confirm our existing beliefs. In other words, people tend to look for, and believe, whatever already seems true to them. Intuitively this makes sense; after all, who likes being wrong?
Going back to our library search system, during the training process humans are providing a source of truth that is in part influenced by their own confirmation bias. I’ve no idea how much the results are influenced, but I believe that, left unchecked, it could be a serious issue.
For example, if somebody were tasked with classifying all scenes featuring political parties, and they supported a particular group, then we might find that some political parties were underrepresented in the training dataset, which in turn would lead to a biased search. This isn’t a criticism of the person providing the classification function, but merely an observation of human nature.
One solution is to have many people from different walks of life classifying the same data, so that, hopefully, the law of large numbers comes into effect and averages out any individual bias.
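As an illustration, pooling several annotators' verdicts can be as simple as taking a majority vote per shot. The sketch below assumes hypothetical shot IDs, labels and annotator names; it isn't drawn from any real library system.

```python
# A minimal sketch of averaging out individual bias by pooling labels from
# several annotators and keeping the majority verdict per shot. All names
# and labels here are invented for illustration.
from collections import Counter

annotations = {
    "annotator_a": {"shot_001": "party_x_rally", "shot_002": "party_y_rally"},
    "annotator_b": {"shot_001": "party_x_rally", "shot_002": "crowd_scene"},
    "annotator_c": {"shot_001": "party_x_rally", "shot_002": "party_y_rally"},
}

def majority_label(shot_id: str) -> str:
    """Return the label most annotators agreed on for a given shot."""
    votes = Counter(person[shot_id] for person in annotations.values())
    label, _count = votes.most_common(1)[0]
    return label

for shot in ("shot_001", "shot_002"):
    print(shot, "->", majority_label(shot))
```

A majority vote only helps, of course, if the pool of annotators is genuinely diverse; agreement among like-minded people simply entrenches the same bias.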
ML systems that use supervised learning, such as image classifiers, are only as accurate as the data used to train them. This concerns me greatly, as I believe that systems based on subjective analysis, such as image classification, are more susceptible to confirmation bias. Consequently, we need to scrutinize the training data as much as the finished product.
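One practical way to scrutinize training data is simply to count how many labelled examples each class received and flag anything suspiciously thin. The class names, counts and threshold in this sketch are illustrative assumptions, not figures from any real dataset.

```python
# A minimal sketch of auditing the label distribution in a training set and
# flagging classes that look under-represented. Labels and counts are invented.
from collections import Counter

training_labels = (
    ["party_x_rally"] * 480 + ["party_y_rally"] * 35 + ["party_z_rally"] * 410
)

counts = Counter(training_labels)
total = sum(counts.values())
expected_share = 1 / len(counts)   # naive assumption: roughly uniform classes
threshold = 0.5 * expected_share   # flag anything at under half that share

for label, count in counts.most_common():
    share = count / total
    flag = "  <-- under-represented?" if share < threshold else ""
    print(f"{label}: {count} examples ({share:.1%}){flag}")
```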
I would like to think that as products mature, vendors will re-train their ML solutions and provide new parameters for their models to improve their accuracy: a form of constant improvement. But how often will this happen, and who is responsible for maintaining the quality of these products within a broadcast facility? How do we check them, and who guards the guards? These questions only become more pressing as ML moves further into the program chain.
This is a fantastic opportunity for vendors to demonstrate to broadcasters the quality of their ML products. But I think broadcast engineers need to dig deep into the training datasets and start asking some probing questions about them.