
Why copyright law is failing to protect content rights holders against generative AI

In America, content published online is protected by the Digital Millennium Copyright Act (DMCA). This federal law was passed in 1998 in response to rising concern over how the internet was enabling content theft through the unauthorised copying and distribution of published work on the web.

Until recently, the DMCA has done a fairly effective job of protecting the exclusive right of copyright holders to distribute their content online, while still allowing sufficient ease of access to published content to encourage innovation and free expression.

However, the last year has seen rising dissatisfaction with current copyright protection as creators have become aware of their work being used to train generative AI models such as ChatGPT and DALL-E. This has led to a stream of copyright infringement lawsuits against AI developers, which is revealing the limits of the legal protection the DMCA can provide for content creators in the emerging landscape of generative AI.

Limitations of the DMCA

There are two key limitations of the DMCA that have come to light in the lawsuits against AI companies:

  1. The DMCA only provides protection in cases where it can be proven that content has been explicitly copied, which is often impossible to demonstrate with AI.
  2. The fair use clause of the DMCA allows copyright-protected content to be repurposed for certain purposes that benefit society. This provides AI companies with a loophole to bypass current copyright protection, because the applications of AI are so broad that its benefits to society can be easily demonstrated.

The first limitation is problematic for content creators because it means that unless they can prove that their work has been regurgitated word for word by an AI application, or reproduced with a high degree of visual similarity (in the case of artworks), the DMCA is unable to protect them.

Some plaintiffs in copyright lawsuits, such as the New York Times, have been able to claim that their work has been reproduced verbatim by an AI model. However, many content creators will be unable to prove to a sufficient legal standard that their work has been copied, given that generative AI models work by ingesting huge amounts of content and then generating responses by abstracting information from it.

Nevertheless, the content generated by an AI model is often similar enough to the original source, or contains specific enough information, that the creator can identify a model’s use of their work.

The DMCA's shortcomings in regulating the use of content by machines are not surprising given that it was originally passed to discourage humans from copying each other's work. When it was only humans accessing each other's work, strict proof of explicit copying was needed to ensure that there was not too much restriction on the freedom of human thought and the sharing of information.

Now, given that the human mind is no longer the only form of intellect that can generate unique responses by ‘learning’ from external sources, updated legislation may be needed to account for intellectual property not just as a unique product of human creativity, but as an increasingly commercial product that can be bought and sold, replicated, and produced at mass scale.

The fair use clause

The second limitation of the DMCA is that its fair use clause has given AI developers a legal loophole to justify the unlimited and unauthorised use of copyright-protected content for generative AI training. This clause provides exemption from the content rights holder’s exclusive right to distribute their work according to how well the repurposing of the content meets the following criteria:

  1. Whether the repurposing of the content is for non-profit and educational use.
  2. Whether the copyrighted content is more factual or imaginative in nature.
  3. How much of the content is used, and how significant that portion is within the original work.
  4. The impact of the content’s unauthorised use on its potential market value.

In a webinar from the Digital Media Licensing Association, Ian Crosby, a US attorney at Susman Godfrey LLP, described the fair use clause as a way of ‘balancing the value of the new technology to society in general with the cost to the people who are being used to create it’.

Now acting as lead counsel for the New York Times in its lawsuit against OpenAI and Microsoft, Crosby has asserted, in a comment sent to The Verge, that the tech companies’ use of the news outlet’s content does not constitute ‘fair use by any measure’.

However, this opinion may be in the minority, at least within the tech community: a poll by the AI Journal found that 32% of respondents supported OpenAI versus only 19% who supported the New York Times in the lawsuit. This is likely due to widespread recognition of AI models’ reliance on vast amounts of data, as well as of AI’s potential to transform industries, help address global crises, and assist humans in their day-to-day lives.

In copyright lawsuits, the plaintiffs claim that there is significant cost to the content creators whose work is being used to fuel generative AI, with the New York Times seeking ‘billions of dollars in statutory and actual damages’.

However, concrete proof of the financial loss caused by AI’s use of content is hard to come by, and estimates of that loss are typically speculative. For this reason, the fair use clause of the DMCA is likely to work in favour of AI developers, who can further strengthen their defence by pointing to the many not-for-profit and educational uses of generative AI.

This has recently played out in a California court’s partial dismissal of the lawsuit filed by Sarah Silverman and other authors against OpenAI. Of the six allegations they made, including unfair competition and unjust enrichment, the judge threw out all claims except direct copyright infringement, as OpenAI had requested. The case will therefore rest on whether Silverman et al. can prove that any of ChatGPT’s outputs have reproduced their writing verbatim.

Conclusion

The DMCA does not account for the wider implications of AI’s use of human-produced content, or for how this could threaten the stability of creative industries in the future. A review of the DMCA by the Senate Subcommittee on Intellectual Property back in 2020 highlighted the need to ensure that the DMCA serves individual users such as artists and writers at least as much as it serves companies. With the fair use clause providing tech companies with a legal loophole for their use of writers’ and artists’ content, this priority seems to have been lost amidst the hype around generative AI and the perceived importance of the technology to the future of humanity.

Overall, the DMCA’s failure to address the bigger picture at stake in these copyright lawsuits is putting pressure on policymakers to consider revised legislation that clarifies the future relationship between generative AI and its sources, and ensures that the needs of content creators do not go ignored.

Author

I write about developments in technology and AI, with a focus on their impact on society and on our perception of ourselves and the world around us. I am particularly interested in how AI is transforming the healthcare, environmental, and education sectors. My background is in Linguistics and Classical literature, which has equipped me with skills in critical analysis, research, and writing, as well as in-depth knowledge of language development and linguistic structures. Alongside writing about AI, my passions include history, philosophy, modern art, music, and creative writing.
