Artificial Inspiration – Debating The Implications Of Training AI To Create Images

There is growing debate over the ethical and legal implications of using millions of images drawn from the internet to train AI-powered software to create ‘new’ images. It feels like the beginning of a journey which could have profound implications for the creative industries – so it is perhaps predictable that the legal battles have begun.

One of the more alarming technological predictions about the 21st century is the potential automation of more and more fields of employment. Some qualified opinions – perhaps most famously at the Oxford Martin School – have suggested that almost half of all work in the United States might eventually be automated, a figure that sounds like a deliberate exaggeration, but isn’t.

That idea is worrying people well beyond the blue-collar fields traditionally most threatened by automation. It’s already visibly happening even in the creative industries, where people might have felt less exposed. An early skirmish in that fight is a lawsuit targeting AI developers Midjourney and Stability AI, as well as the hugely popular consumer-facing site DeviantArt, launched by three artists who claim that the use of their work as training data for a machine learning system infringes copyright. The group’s lawyer has suggested that the Stable Diffusion text-to-image processor “contains unauthorized copies of millions – and possibly billions – of copyrighted images.” In parallel, Getty Images have also initiated action against Stability AI.

There are certainly a lot of ways to implement machine learning – or artificial intelligence, depending on the details – but claims that they literally rely on keeping copies of the training data seem likely to be questioned. Exactly what a system like Stable Diffusion absorbs from its training data is difficult to express. A full technical description of what’s going on in systems like Stable Diffusion is beyond the scope of this article, but a simple example of modern machine learning is reading handwritten characters, a favorite early application. The input data (the brightness of dozens of pixels) is transformed into output data (one of twenty-six characters).

That transformation is done by setting up connections between those dozens of input and output nodes via (usually) several layers of intermediate nodes, with interconnections between the layers. Some of the connections have more influence over the result than others; they’re weighted. Those weights are set such that pixels from input images of (say) a handwritten letter A will tend to activate the output node representing the letter A. The process of setting those weights is how the system learns, using many different images of known characters and adjusting the weights for the desired result. Experience shows that the system is then likely to be able to interpret previously unknown images showing handwritten characters with good accuracy.
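As a rough illustration of the weight-setting described above, here’s a minimal Python sketch. It’s a toy built on assumptions: the three 5x5 glyphs, the function names, the learning rate and the number of training passes are all invented for this example, and the input pixels are wired straight to the output nodes with no intermediate layers, which a real character reader would normally have. It shows only the basic idea of nudging weights until each known image activates the right output node.

```python
import math

# Hypothetical 5x5 training glyphs ('#' = bright pixel).
# Three letters stand in for the full alphabet of twenty-six.
GLYPHS = {
    "T": ["#####", "..#..", "..#..", "..#..", "..#.."],
    "L": ["#....", "#....", "#....", "#....", "#####"],
    "X": ["#...#", ".#.#.", "..#..", ".#.#.", "#...#"],
}
LABELS = sorted(GLYPHS)  # one output node per letter


def to_pixels(rows):
    """Flatten a glyph into 25 brightness values (0.0 or 1.0)."""
    return [1.0 if ch == "#" else 0.0 for row in rows for ch in row]


def softmax(scores):
    """Turn raw output-node scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(weights, biases, pixels):
    """Weighted sum of the input pixels for each output node, then softmax."""
    scores = [sum(w * p for w, p in zip(row, pixels)) + b
              for row, b in zip(weights, biases)]
    return softmax(scores)


def train(epochs=200, rate=0.1):
    """Repeatedly nudge the weights so each glyph activates its own node."""
    n_in = 25
    weights = [[0.0] * n_in for _ in LABELS]
    biases = [0.0 for _ in LABELS]
    for _ in range(epochs):
        for target, label in enumerate(LABELS):
            pixels = to_pixels(GLYPHS[label])
            probs = forward(weights, biases, pixels)
            for k in range(len(LABELS)):
                # Gap between what the node said and what it should have said.
                error = probs[k] - (1.0 if k == target else 0.0)
                biases[k] -= rate * error
                for i in range(n_in):
                    weights[k][i] -= rate * error * pixels[i]
    return weights, biases


def classify(weights, biases, rows):
    """Return the letter whose output node is most strongly activated."""
    probs = forward(weights, biases, to_pixels(rows))
    return LABELS[probs.index(max(probs))]


weights, biases = train()
for label in LABELS:
    print(label, "->", classify(weights, biases, GLYPHS[label]))
```

What the trained system holds afterwards is three rows of twenty-five numbers plus three offsets. None of those rows is a stored copy of a glyph, although each has clearly been shaped by the glyphs it was trained on.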

This is a neural network, and while there are a few ways to implement software that can reasonably be described as machine learning or AI, a neural network is the prototypical example. Relating that to the court case, we now know that the things learned by the system are represented in the configuration of weighted interconnections. Analysis of that internal state is an advanced research topic, and in practical situations a trained network often behaves as an impenetrable black box. It’s notoriously difficult to interpret the connection configuration of a well-trained AI. That’s why, for instance, it’s very difficult to find out why certain types of machine learning might have made a particular decision.

Given all that, it’s tricky to claim that any machine learning device can realistically be said to contain copies of an image, or to describe what it does as “collage,” as has been said. It’s just as difficult, though, to discuss what they do contain without resorting to vague allusions. Inevitably, they contain something of the distilled essence of the training data. Represented somewhere in that hard-to-interpret miasma of information is, ideally, some kind of understanding of the subject the system is intended to handle.

It’s certainly enough for an AI to duplicate the style of a particular artist, as one particular artist has complained. The AI might not contain image data, but if it contains enough information to create works that look like they were created by a specific person, it’s difficult to claim that there is nothing of that person’s work embedded in that AI. The way that happens may be complicated and poorly understood, but we can be confident it does happen. As such, the fundamental objection to this application of AI might simply be that it’s being used as a way to skirt copyright law by a process of diffusion.

The output of the AI is based on the training data – it can’t be based on anything else – but there’s a huge number of people represented in that training data, and because of the problems with determining why any AI made any particular decision, it’s very hard to associate any particular feature of a generated image with any particular part of the training data.

Crucially, this is exactly how human beings work. That’s why we use the word “intelligence” in the initialism “AI”. We are all a sum of our life experiences. The only way anyone learns how to draw pictures, compose music or form any opinion on anything at all is by experiencing what other people have done. We use words like “inspiration” to describe situations where the work of one person has influenced the work of another. Even that is fraught with court cases intended to decide whether one piece of music is too directly derivative of another. It’s hard to imagine we can make that decision for AI if we can barely do it with humans.

And even if we could, most of us probably don’t extend the same moral latitude to AIs as we would to human artists. What we’re talking about here is potentially the work of an individual, possibly a part-time or professional artist, posting work on a user-generated content website. Having a large corporation extract the essence of that work as part of an automatic process which might become highly profitable while circumventing the artist entirely is something that instinctively seems wrong to a lot of people in a way it wouldn’t if a human being were doing the learning.

One problem is the sheer scale of what’s possible. The AI can work day and night at very low cost to flood the world with material much faster than any real competing artist could ever hope to achieve. Conversely, a human capable of competently creating something someone wants to buy has at least put time into gaining that competence. We can argue about the exact cash value of that kind of effort, but it is at least a limiting step on how much material can be published. An AI, meanwhile, can be retooled to create anything, or to duplicate any artist, or many artists, at any time, in huge volume and with few resources.

The other way to look at this is to consider not whether it’s desirable, but what might be done to prevent it. The process of exposing an AI to training data inevitably involves duplicating that data, but then again so does sending it across a network and keeping it in a browser’s cache, so whether that’s something we can reasonably control with existing copyright law is dubious.

Even if we could control it, though, the issue is whether we’d want to. Current AI research often relies on exposing new systems to databases of training information so unimaginably vast that they could barely exist without the modern internet. Attempts could be made to restrict that, but some of those systems might plausibly be capable of genuinely world-changing things and any serious restriction on doing that might damage something which has enormous potential to help solve society-level problems.

Some sort of compromise is probably needed here. Back in the more prosaic world of film and TV production, there are certainly stakes both for established content creators, and for people who want to use AI to create content. Those are already groups which cross over quite significantly, so many people will have a foot in both camps of this issue, and the morality is at least as influential as the technology.

On one hand, the DeviantArt lawsuit will inevitably be the first of many as society figures out how it will interact with ascendant AI. On the other, new technologies have frequently been heralded as much more influential than they turn out to be. As we spend the early twenty-first century struggling to clean up the various deadly messes left by the dawn of the nuclear age, there’s searing irony in a 1954 quote from the then chairman of the Atomic Energy Commission, who claimed that nuclear energy would be too cheap to meter. Reality, it seems, has a way of making things mundane.

The website CNET Money recently found this out when it began publishing articles described cautiously as “AI assisted,” and discovered that some of those articles contained clearly counterfactual statements. Self-driving cars are already establishing a chequered history, although possibly only because people are deliberately defeating the safety interlocks. Science fiction has often shown us what the worst possible case of AI could be. In an ideal world, we’ll spend the next few decades discovering what the best case might be. Perhaps the most realistic expectation is that we’ll get the usual complicated mix of both.
