Generative AI and Intellectual Property: Where Do We Draw the Line?

By Adam Philipp, founder of AEON Law

Plato’s allegory of the cave has taken on renewed relevance in the age of generative AI.

We are surrounded by shadows—artworks, stories, songs, and voices that resemble the real but may not have originated with any person. These outputs are often uncannily convincing. But the deeper question isn’t how realistic they seem. It’s what lies behind them—and what rights, if any, remain with the creators whose works helped train the machines.

As courts and policymakers begin to engage with the complex questions surrounding AI-generated content, the core issue is becoming clearer: creators are being systematically written out of the systems built on their labor. The legal framework, while evolving, is still far behind the technology.

It’s time to draw the line. The challenge is figuring out where.

Legal Flashpoints: Authors, Actors, and the Shadow Economy of Training Data

One of the clearest signs of legal momentum came in the recent Anthropic lawsuit. A class of authors alleged that their copyrighted books had been used—without permission—to train Anthropic’s Claude AI. A federal district court allowed the case to move forward, finding that the company had downloaded and retained pirated copies of as many as seven million works. Though the court granted partial summary judgment in favor of the company on certain fair use claims, it declined to dismiss the core allegations and ordered a trial on damages.

That trial was scheduled for December 2025—until Anthropic abruptly settled, signing a binding term sheet with plaintiffs. The company, notably, had just closed a funding round valuing it at $183 billion.

That juxtaposition is not lost on observers: a company accused of large-scale copyright infringement becoming one of the most highly valued AI firms in the world, while authors learned of their eligibility for the class action via email.

Other disputes are unfolding along similar lines. In Lehrman v. LOVO, voice actors sued over alleged misappropriation of their voices in AI-generated content. There, a federal judge allowed state law claims—including right of publicity and unfair competition—to proceed even after dismissing federal copyright claims. The ruling underscores how generative AI raises harms that don’t map neatly onto copyright law.

And the lawsuits keep coming. Music publishers have accused Anthropic of training on copyrighted lyrics. Reddit has alleged unauthorized scraping of user content. OpenAI and Meta face lawsuits from authors and media companies. Each case addresses a different facet of the same fundamental problem: the systematic use of protected works—whether books, code, music, or voices—to fuel AI systems that increasingly shape the digital landscape.

The Law Is Fragmented and Unclear

These cases are making their way through courts, but they are not resolving the underlying uncertainty.

The U.S. Copyright Office, in a recent report, acknowledged the ambiguity surrounding how copyright law applies to AI training and output. The Office confirmed that using copyrighted works to train AI systems “may implicate the reproduction right” but declined to offer firm guidance on where the line is drawn. It did, however, reiterate that outputs “that are substantially similar to existing works” could infringe.

In Europe, the legal landscape is somewhat more structured. The European Union’s AI Act includes transparency obligations and recognizes a right to opt out of data mining for training purposes. But even there, enforcement mechanisms remain limited, and rights holders are left with few tools to meaningfully monitor or control how their works are used.

Complicating matters is the patchwork nature of state laws in the U.S. While copyright law is federal, state laws govern misappropriation of voice, likeness, and identity. This creates a fragmented framework where the legal outcome may depend on whether the harm sounds more like “copying” or more like “identity theft.”

Philosophical and Economic Tensions

At the heart of these disputes is a philosophical divide: is the use of copyrighted content to train AI systems transformative and innovative, or is it exploitative and parasitic?

Proponents of broad fair use argue that training an AI model is akin to a researcher reading books or listening to music in order to produce new ideas. The model doesn’t memorize or reproduce exact works (they claim), but instead learns patterns and structures in the aggregate. From this perspective, requiring licenses for all training data would stifle development and entrench the incumbents who can afford to pay for massive datasets.

Critics—particularly creators—view this as a dangerous oversimplification. AI systems are not merely “reading” in the human sense. They ingest and encode millions of works, sometimes replicating style, structure, or specific language. In some cases, outputs have been shown to reproduce copyrighted content nearly verbatim. Even when outputs are technically “original,” they often compete with the works that trained them.

The economic incentives are also clear. Training on copyrighted material is faster, cheaper, and more effective than building models from licensed or public domain content. The companies doing the training, however, frequently disclaim responsibility for how the content was obtained or used.

Toward a Coherent Framework

What is needed now is not more litigation alone, but a coherent framework that acknowledges both technological realities and the rights of creators, like the clients we advise at AEON Law.

Such a framework would likely include:

  • Disclosure of Training Data: AI developers should be required to identify, at least in general terms, the nature and source of training datasets. Transparency is a prerequisite for accountability.
  • Collective Licensing Options: Just as musicians and authors benefit from performance rights organizations and licensing collectives, similar structures could facilitate licensing for training datasets.
  • Clear Rules for Output Liability: Courts and lawmakers will need to define when AI-generated outputs cross the line into infringement. This may involve new standards that account for imitation, not just duplication.
  • Protection Beyond Copyright: Voice, likeness, and persona-based harms must be addressed by updating or enforcing existing state laws and creating consistent federal standards.
  • Differentiation of Use Cases: The law should distinguish between training for research, for commercial deployment, and for derivative exploitation. Not all uses are created equal.

Drawing the Line

The question is no longer whether generative AI will disrupt intellectual property law. That disruption is well underway. The real question is how society will adapt—and whose interests the new rules will serve.

Without clear boundaries, creators will lose control of their work, audiences will be flooded with synthetic content of uncertain origin, and innovation itself may become a game of who can train faster and ask forgiveness later.

Drawing the line is not about protecting legacy industries or stifling innovation. It’s about recognizing that creativity has value—even when it’s inconvenient to the bottom line of those building machines to replace it.

As Plato taught, shadows are not the truth. But they still come from something real.
______

About the Author

Adam Philipp is the founder of AEON Law, an intellectual property law firm in Seattle. Recognized by Chambers USA, IAM Patent 1000, and IP Stars, Adam helps clients in high-tech industries protect their creations and profit from them.