A new lawsuit against GitHub highlights the challenges faced by developer platforms as AI-generated code becomes more accessible. What happens when machine learning algorithms are challenged by copyright law? What do machine learning experts think of such actions?
The class-action lawsuit, filed in a US federal court, challenges the legality of GitHub Copilot and the OpenAI Codex model that powers it. GitHub Copilot is an OpenAI-powered coding assistant that was rolled out to the public in July. The suit against GitHub, Microsoft, and OpenAI alleges violations of open-source licenses and could have a wide-ranging impact on the world of artificial intelligence.
What is Copilot?
The service is a cloud-based tool that helps developers write new code by analyzing existing code and comments on GitHub. It is similar to an autocomplete tool, like the one that finishes sentences in Gmail, but for source code. Copilot suggests code directly in the editor, helping developers accomplish programming tasks significantly faster.
The product has gained widespread adoption because it is easy and convenient. Say you need a sorting algorithm and don't want to write it yourself. You can start a function called Sort, and Copilot will suggest a complete sorting algorithm so you barely have to type anything. That suggested algorithm comes from combing through everyone's code submitted on GitHub to figure out what source code fits your context, as the sketch below illustrates.
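To make this concrete, here is a hypothetical sketch in Python of the kind of interaction involved. The function name, docstring, and suggested body are illustrative, not actual Copilot output:

```python
# The developer types only the signature and docstring:
def sort(items):
    """Return a new list with the items in ascending order."""
    # Copilot might then suggest a body resembling code it has seen
    # during training, for example a recursive quicksort:
    if len(items) <= 1:
        return list(items)
    pivot, rest = items[0], items[1:]
    smaller = [x for x in rest if x <= pivot]
    larger = [x for x in rest if x > pivot]
    return sort(smaller) + [pivot] + sort(larger)
```

The legal question is where such a suggestion comes from: statistically, it is synthesized from the licensed code the model was trained on.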
Is Copilot Stealing?
The question raised by the lawsuit is whether GitHub Copilot is stealing other people's work in violation of federal copyright law, or whether its use of that work qualifies as "fair use."
Fair use is a legal doctrine that promotes freedom of expression by permitting the unlicensed use of copyright-protected works in certain circumstances, including criticism, comment, news reporting, teaching, scholarship, and research.
The lawsuit filed against GitHub claims OpenAI Codex, the engine powering GitHub’s Copilot, is an AI product that relies on unprecedented open-source software piracy.
Copilot was trained, using machine learning, on billions of lines of code written by human programmers. The plaintiffs argue that it violates the rights of the developers who posted that code under open-source licenses that require attribution, including the MIT, GPL, and Apache licenses.
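For context, attribution under these licenses is a concrete requirement. The MIT license, for instance, permits reuse only if the copyright and permission notice travels with the code. A minimal illustration, with a hypothetical author and function:

```python
# Copyright (c) 2022 Jane Developer
# Licensed under the MIT License, which requires that "the above
# copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software."

def binary_search(items, target):
    """Return the index of target in sorted items, or -1 if absent."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1
```

The plaintiffs' core complaint is that when Copilot emits code like this, the required notice does not travel with it.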
Lead plaintiff Matthew Butterick, a programmer and lawyer, argues that “it does not make sense to hold AI systems to a different standard than we would hold human users. Widespread open-source license violations should not be shrugged off as an unavoidable cost.”
The Bigger Picture
What’s at stake in this lawsuit? The outcome could shape the future of copyright for AI-generated code and affect how developers use open-source tools to write code. It could have far-reaching impacts on new ML-powered coding assistants like Amazon CodeWhisperer.
Some experts question GitHub's claim of fair use, saying that even if it were applicable here, it would not circumvent (a) the breach-of-contract claims, (b) the privacy issues, and (c) the DMCA claims.
Others argue that the lawsuit is misplaced: Copilot is just a tool, and the focus should be on the developers who use copyrighted code improperly, regardless of why and how they do it.
More broadly, the case may render judgment on the legality of treating the entire internet as fair use, training on other people's work while privatizing the profits.
Other developer platforms are facing similar questions about AI-generated code. Stack Overflow recently announced a ban on ChatGPT-generated text on its platform, noting that ChatGPT itself was trained on large amounts of high-quality, human-curated Stack Overflow data.
Machine Learning Experts Can Help
Our machine learning experts can help sort through the technical aspects of legal challenges involving technology, whether they involve copyright infringement, trade secret misappropriation, patent infringement, or other matters. At Sidespin Group, we provide a wide range of technical expertise and are happy to schedule a free consultation.