Generative AI is having a moment. The last few months have seen rising use of image-generating apps like Lensa and Vana. Not only machine learning experts but also everyday users are creating AI-generated avatars and profile pictures with them. VCs are pouring money into image-generating startups like Stability AI, which recently launched a popular tool called Stable Diffusion. AI content generator Jasper announced a $125 million funding round.
What is Generative AI?
The novel image-generating systems are based on diffusion models, which learn to create images from text prompts (e.g., “a sketch of a dolphin surfing a wave on the moon”) as they work their way through massive training data sets. The models — trained to “re-create” images as opposed to drawing them from scratch — start with pure noise and refine an image over time to make it incrementally closer to the text prompt.
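The refinement loop described above can be sketched in a few lines. This is a toy illustration, not a real diffusion model: the `model_predict` callable stands in for a trained neural network that would normally predict the denoised image conditioned on the text prompt, and the linear blending schedule is a deliberate simplification of the actual sampling math.

```python
import numpy as np

def sample(model_predict, steps=50, shape=(64, 64), seed=0):
    """Toy reverse-diffusion loop: start from pure noise and
    repeatedly blend in the model's denoised estimate."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)           # start from pure Gaussian noise
    for t in range(steps, 0, -1):
        alpha = t / steps                    # fraction of noise remaining
        x_hat = model_predict(x, t)          # model's guess at the clean image
        x = alpha * x + (1 - alpha) * x_hat  # step incrementally toward the guess
    return x

# Stand-in "model": always predicts the same flat gray image.
# A real model's prediction would depend on x, t, and the text prompt.
target = np.full((64, 64), 0.5)
img = sample(lambda x, t: target)
```

With the stand-in model, the loop converges toward the fixed target; with a trained network, the same structure converges toward an image matching the prompt.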
These tools are part of the new wave of generative AI, which refers to using unsupervised learning algorithms to learn from existing text, audio or images and create new content.
DeviantArt recently released a Stable Diffusion–powered app for creating custom artwork. Microsoft is using DALL-E 2 to power a generative art feature coming to Microsoft Edge.
Are there legal ramifications?
What are the legal ramifications of using diffusion models to create AI-generated images? A central legal question that may come up as these tools enter wide commercial use is whether use of these tools constitutes illegal copyright infringement or is protected as fair use. The training data that goes into generative AI tools is made up of billions of images created by artists globally. Is this data protected by copyright law?
A research study by scientists at the University of Maryland and New York University identified cases where image-generating models, including Stable Diffusion, “copy” from the public internet data on which they were trained, including copyrighted images.
Images generated by Stable Diffusion may be copied from its training data, either wholesale or in part. It’s nearly impossible to verify that any particular image generated by Stable Diffusion is novel and not stolen from the training set.
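To see why verification is hard, consider a naive duplicate check: fingerprint every image with a perceptual hash and flag generated images whose hash is close to a training image's. The sketch below uses a simple average hash over synthetic arrays (all names and data here are illustrative). Even this crude approach only catches near-pixel copies, and running it against billions of training images would require approximate nearest-neighbor search at scale; partial or stylistic copying evades it entirely.

```python
import numpy as np

def average_hash(img, hash_size=8):
    """Downsample the image into hash_size x hash_size blocks, then
    threshold each block mean against the overall mean to get a
    compact binary fingerprint."""
    h, w = img.shape
    bh, bw = h // hash_size, w // hash_size
    small = (img[:bh * hash_size, :bw * hash_size]
             .reshape(hash_size, bh, hash_size, bw)
             .mean(axis=(1, 3)))
    return (small > small.mean()).flatten()

def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return int(np.count_nonzero(a != b))

rng = np.random.default_rng(0)
training_image = rng.random((64, 64))
near_copy = training_image + rng.normal(0, 0.01, (64, 64))  # lightly perturbed
unrelated = rng.random((64, 64))

# A near-copy yields a much smaller Hamming distance than an unrelated image.
d_near = hamming(average_hash(training_image), average_hash(near_copy))
d_far = hamming(average_hash(training_image), average_hash(unrelated))
```

Production-grade audits (and the research cited above) use far stronger similarity measures, such as learned image embeddings, but the core difficulty is the same: proving novelty requires comparing against the entire training set.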
The companies that run systems like Stable Diffusion claim that fair use — the doctrine in US law that permits the use of copyrighted material without first having to obtain permission from the rightsholder — protects them even though their models were trained on copyrighted content. However, it’s not clear whether courts will agree.
Bradford Newman, who leads the machine learning and AI practice at global law firm Baker McKenzie, said: “Legally, right now, there is little guidance. There are the inevitable class actions, but the net-net of it all is when you’re using the massive data sets that these AI applications are and you sprinkle on top of that open-source licenses, the arguments are going to be fair use versus infringement.”
Generative AI experts are closely following the GitHub Copilot case. Matthew Butterick, a GitHub user, has filed a class-action lawsuit claiming that GitHub Copilot, a generative AI tool that suggests computer code to developers, used his source code as training data.
This may be the first case dealing specifically with machine learning and fair use in the US. Its outcome could have far-reaching implications for the use of diffusion models and generative AI tools broadly.
Fair use considerations
Legal experts say fair use may depend on the specific use case. For example, AI-generated images of “Paris” may be allowed as fair use, while images that are based on input from living artists may be prohibited. The idea is that the former is in the public domain, while the latter is potentially depriving an artist of income from their copyrighted work.
Other countries have already legislated on whether training machine learning models on copyrighted works is lawful. The UK has had a text and data mining exception to copyright for non-commercial research purposes since 2014. The EU passed the Digital Single Market Directive in 2019, which contains an exception for text and data mining “for all purposes as long as the author has not reserved their right.”
Ultimately, the US may need new IP laws to cover AI, as novel use cases extend beyond those covered under existing copyright law. Sidespin Group’s machine learning experts can help sort through the technology aspects of generative AI and other machine learning technologies.