AI Decipher File · 19 September 2023 (filing) through ongoing consolidated litigation in 2025-2026
Authors Guild v. OpenAI September 2023: When the Major Book-Author Class Action Joined the Generative-AI Training-Data Copyright Docket
On 19 September 2023 the Authors Guild and seventeen named author plaintiffs (including Jonathan Franzen, John Grisham, George R.R. Martin, Jodi Picoult, and Scott Turow) filed a class-action lawsuit against OpenAI in the United States District Court for the Southern District of New York. The complaint alleged that OpenAI's training of GPT models on the plaintiffs' copyrighted books, sourced from datasets including Books3, constituted direct and contributory copyright infringement. The case was consolidated with related actions and joined the broader docket of training-data copyright litigation that includes the parallel NYT v. OpenAI case (filed December 2023) and the Sarah Silverman v. OpenAI case (filed July 2023).
Failure pattern
Foundation-model training on copyrighted book corpora without a license or a documented fair-use position
Organizations involved
Authors Guild (lead plaintiff organization), Seventeen named author plaintiffs (Franzen, Grisham, Martin, Picoult, Turow, and others), OpenAI, United States District Court for the Southern District of New York
Incident summary
On 19 September 2023 the Authors Guild and seventeen named author plaintiffs filed a class-action lawsuit against OpenAI in the United States District Court for the Southern District of New York (Case No. 1:23-cv-08292). The named plaintiffs included Jonathan Franzen, John Grisham, George R.R. Martin, Jodi Picoult, Scott Turow, and David Baldacci. The complaint alleged that OpenAI trained GPT models on the plaintiffs' copyrighted books, with the training data including book corpora like Books3 that contained substantial swaths of contemporary fiction without license.
The Authors Guild filing joined a growing docket of generative-AI training-data copyright litigation. The Sarah Silverman, Christopher Golden, and Richard Kadrey class action against OpenAI had been filed 7 July 2023 in the Northern District of California (Case No. 3:23-cv-03416). The New York Times v. OpenAI suit followed on 27 December 2023.
Through 2024 and 2025 the cases moved through motions to dismiss, narrowing of claims, and discovery. The Authors Guild case has been consolidated with related author actions in the Southern District of New York. The legal questions (whether training constitutes fair use, whether memorization and reproduction of training material are infringing, and whether RAG-style retrieval is legally distinct from training) are foundational and remain unresolved at the time of writing.
Failure technique
The legal-technical pattern at issue is the alleged unauthorized reuse of copyrighted text in foundation-model training, where the trained model can reproduce substantial passages from the training material. The Books3 dataset (compiled from a controversial pirated-books collection) was a primary evidence target: plaintiffs alleged Books3 was used in OpenAI training and that GPT models could be prompted to reproduce passages from books in Books3.
Per the US Copyright Office's Part 3 report on Generative AI Training (pre-publication version released 9 May 2025), the central open questions are whether training is a transformative fair use, whether memorization changes the analysis, and whether a commercial or non-commercial training context matters. The Authors Guild and parallel cases are expected to shape US copyright doctrine for AI training.
OpenAI's defense has consistently asserted fair use and transformative purpose, and has argued that the trained model does not reproduce specific copyrighted material at scale. Discovery and motion practice in the consolidated cases have tested both the factual and the legal sides of these defenses. The litigation is being watched as the foundational text-training-data copyright case, alongside Getty Images v. Stability AI for image-training data.
Impact and consequences
Direct commercial impact on OpenAI during 2023-2025 has come mainly as legal cost and operational distraction; the company continued to grow throughout the litigation. Over the same period OpenAI signed direct content licensing deals with major publishers (Associated Press, Axel Springer, News Corp, Time, Vox Media, and others); these deals appear to be, at least in part, a market response to the litigation risk.
Industry impact: the Authors Guild case is one of the foundational text-training-data copyright cases. Subsequent foundation-model launches (Anthropic Claude 3, Google Gemini, Meta Llama 3, others) have all addressed training-data sourcing more explicitly than 2022-era products did. Direct content licensing of major news and publisher corpora has become a competitive surface among foundation labs.
Legal-precedent impact: rulings in the Authors Guild and parallel cases are expected to be foundational for US AI copyright doctrine. The cases sit alongside NYT v. OpenAI (specific facts about article reproduction) and the Sarah Silverman case (book reproduction), together testing multiple aspects of the training-data IP question.
Lessons for builders
Treat training-data IP review as a launch-readiness gate for commercial foundation-model products serving production workloads. The post-2023 commercial landscape has shifted toward direct content licensing (OpenAI publisher deals), explicit indemnification (OpenAI Copyright Shield, Microsoft AI Customer Copyright Commitment), or open-licensing-only training data. AI Strategy Lead and Senior AI Product Manager own this gate.
Watch for memorization on the model-output side. The cases consistently use evidence of model reproduction of training material as the bridge from corpus claim to model claim. Pre-deployment evaluation should test for memorization of identifiable training-corpus signatures (specific sentences, character names, distinctive passages).
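One way to make the signature test concrete is an n-gram overlap check: index word n-grams from the passages you consider sensitive, then score each model output by the share of its own n-grams that also appear in that index. This is a minimal sketch under stated assumptions, not a litigation-grade detector: the function names, the 8-word n-gram size, and the lowercase word tokenizer are hypothetical choices, and a real evaluation would need a much larger corpus, a tokenizer matched to the model, and a threshold agreed with counsel.

```python
import re
from collections.abc import Iterable

NGram = tuple[str, ...]


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; punctuation is dropped so formatting changes don't hide a match."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens: list[str], n: int) -> set[NGram]:
    """All contiguous n-token windows in the token list (empty if fewer than n tokens)."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_signature_index(passages: Iterable[str], n: int = 8) -> set[NGram]:
    """Union of word n-grams across the protected passages (hypothetical sensitive corpus)."""
    index: set[NGram] = set()
    for passage in passages:
        index |= ngrams(tokenize(passage), n)
    return index


def overlap_ratio(output: str, index: set[NGram], n: int = 8) -> float:
    """Share of the output's distinct word n-grams that also occur in the protected index."""
    out_grams = ngrams(tokenize(output), n)
    if not out_grams:
        return 0.0
    return len(out_grams & index) / len(out_grams)


if __name__ == "__main__":
    # Invented placeholder text standing in for a protected passage.
    protected = ["The lighthouse keeper counted seventeen ships before the fog rolled over the harbor at dawn."]
    index = build_signature_index(protected)
    print(overlap_ratio(protected[0], index))  # 1.0
    partial = "He wrote that the lighthouse keeper counted seventeen ships before the fog rolled in."
    print(round(overlap_ratio(partial, index), 2))  # 0.43
```

A ratio near 1.0 means the output is made almost entirely of indexed text; near 0.0 means little verbatim overlap. Exact n-gram matching is defeated by paraphrase, so this check only catches verbatim or near-verbatim reproduction.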
Track US Copyright Office and EU AI Act developments. The US Copyright Office's Part 3 report on Generative AI Training (pre-publication version, May 2025) is the most-cited US-policy artifact; EU AI Act Article 53 requires providers of general-purpose AI models to maintain technical documentation and publish a summary of the content used for training. Both shape the legal-defense surface.
Document corpus-assembly decisions and the licensing posture for each major corpus component. The Books3 evidence in the Authors Guild case is concrete; equally specific documentation on the licensing side is the operational defense.
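Corpus documentation is easier to keep honest when it is machine-checkable. A minimal sketch, assuming a hypothetical in-house manifest format: each corpus component records its source, a license status, and pointers to supporting evidence, and a launch gate refuses to pass while any component has unknown provenance or a status with nothing on file. The status categories, field names, and `launch_gate` function are illustrative assumptions, not an industry standard or any lab's actual process.

```python
from dataclasses import dataclass, field
from enum import Enum


class LicenseStatus(Enum):
    LICENSED = "licensed"            # direct license agreement on file
    OPEN_LICENSE = "open_license"    # public domain or permissive license, verified
    FAIR_USE_MEMO = "fair_use_memo"  # counsel's fair-use analysis on file
    UNKNOWN = "unknown"              # provenance not established


@dataclass
class CorpusComponent:
    name: str
    source: str
    status: LicenseStatus
    evidence: list[str] = field(default_factory=list)  # contract IDs, memo refs, license URLs


def launch_gate(manifest: list[CorpusComponent]) -> list[str]:
    """Return blocking reasons; an empty list means the training-data IP gate passes."""
    blockers: list[str] = []
    for c in manifest:
        if c.status is LicenseStatus.UNKNOWN:
            blockers.append(f"{c.name}: provenance unknown")
        elif not c.evidence:
            blockers.append(f"{c.name}: {c.status.value} status has no supporting evidence")
    return blockers


if __name__ == "__main__":
    # Hypothetical manifest entries for illustration only.
    manifest = [
        CorpusComponent("publisher-feed-a", "direct license", LicenseStatus.LICENSED, ["contract-0001"]),
        CorpusComponent("web-books-mirror", "third-party mirror", LicenseStatus.UNKNOWN),
    ]
    for reason in launch_gate(manifest):
        print("BLOCKED:", reason)
```

Treating an empty evidence list as a blocker, even for a component marked licensed or open, is a deliberate choice: the point of the manifest is that each status can be backed by a specific document when it is challenged.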
Mitigations
What builders should put in place to address the failure pattern. Each mitigation maps to operational practice the relevant Applied AI roles own.
- Treat training-data IP review as a launch-readiness gate for commercial foundation-model products.
- Execute direct content licensing deals for major news and publisher corpora; OpenAI's 2023-2024 publisher deals are the market template.
- Run pre-deployment memorization evaluation specifically: prompt the model with passages from sensitive corpora and measure reproduction.
- Document corpus-assembly decisions and licensing posture for each major corpus component as a maintained public artifact.
- Track US Copyright Office reports and EU AI Act Article 53 training-data documentation obligations.
- Offer customer-facing indemnification (OpenAI Copyright Shield, Microsoft AI Customer Copyright Commitment) as a commercial response to the training-data IP risk.
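The memorization evaluation in the list above can be sketched as a prefix-completion probe: give the model the opening words of each sensitive passage, then measure the longest run of words in its continuation that matches the passage's true continuation verbatim. The `generate` callable is a hypothetical stand-in for whatever inference API you use (ideally with greedy decoding so results are repeatable), whitespace tokenization is a simplification, and the 50-word prefix and 25-word threshold are illustrative defaults, not an accepted legal or industry standard.

```python
from collections.abc import Callable, Sequence


def longest_common_run(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest contiguous token run shared by a and b (O(len(a) * len(b)) DP)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def extraction_rate(
    passages: Sequence[str],
    generate: Callable[[str], str],  # hypothetical wrapper around the model under test
    prefix_words: int = 50,
    min_run: int = 25,
) -> float:
    """Fraction of tested passages whose held-out suffix the model reproduces for >= min_run words."""
    tested = flagged = 0
    for passage in passages:
        words = passage.split()
        if len(words) <= prefix_words:
            continue  # too short to split into a prompt and a held-out suffix
        prefix, suffix = " ".join(words[:prefix_words]), words[prefix_words:]
        continuation = generate(prefix).split()
        tested += 1
        if longest_common_run(continuation, suffix) >= min_run:
            flagged += 1
    return flagged / tested if tested else 0.0


if __name__ == "__main__":
    # Toy stand-in model: "memorizes" one passage and answers everything else generically.
    memorized = {"one two three": "four five six seven eight"}

    def toy_generate(prompt: str) -> str:
        return memorized.get(prompt, "nothing like the original text here")

    passages = [
        "one two three four five six seven eight",
        "alpha beta gamma delta epsilon zeta eta theta",
    ]
    print(extraction_rate(passages, toy_generate, prefix_words=3, min_run=4))  # 0.5
```

The returned value is the share of tested passages the model extended verbatim past the threshold. Tracking it per corpus component across model versions gives a reproducible number to set beside the licensing documentation.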
Related Applied AI roles
The Applied AI roles whose day-to-day work would have prevented, detected, or contained this incident.
- AI Strategy Lead: An AI Strategy Lead owns organizational AI strategy and prioritization at the company level.
- Senior AI Product Manager: A Senior AI Product Manager owns AI product strategy across multiple feature areas.
- AI Product Manager: An AI Product Manager owns AI-powered product features and the roadmap that ships them.
- AI Engineer: An AI Engineer builds production cybersecurity-relevant AI systems integrating LLMs, embeddings, and retrieval pipelines.
Companies central to this incident
Read the DecipherU Applied AI company profiles for the organizations whose decisions, products, or research shaped this incident.
- OpenAI: Frontier large language models and consumer + API AI products
Related AI Decipher Files
- New York Times v. OpenAI (Dec 2023): The Copyright Case That Defines AI Training Liability
- Stability AI v. Getty Images February 2023: When Image Generators Faced Their First Major Training-Data Copyright Lawsuit
- Adobe Firefly Training Data Controversy April 2024: When 'Ethically Trained' Claims Met Disclosed Use of AI-Generated Images
Frequently asked questions
What did the Authors Guild sue OpenAI for?
Per the complaint (Authors Guild et al. v. OpenAI Inc., Case No. 1:23-cv-08292, S.D.N.Y., 19 September 2023), the Authors Guild and seventeen named author plaintiffs alleged direct and contributory copyright infringement based on OpenAI's training of GPT models on the plaintiffs' copyrighted books, with the training data including corpora like Books3 that contained substantial contemporary fiction without license.
Who are the named plaintiffs?
The seventeen named author plaintiffs include Jonathan Franzen, John Grisham, George R.R. Martin, Jodi Picoult, Scott Turow, David Baldacci, and others. The Authors Guild served as lead plaintiff organization. The case is a class action seeking to represent all authors whose copyrighted books were used in OpenAI training.
How does this case relate to the Sarah Silverman and NYT cases?
Silverman, Golden, and Kadrey filed a parallel class action in the Northern District of California on 7 July 2023 (Case No. 3:23-cv-03416). The New York Times v. OpenAI case was filed 27 December 2023 in the Southern District of New York. All three cases address training-data copyright questions from different angles (book reproduction, article reproduction, fair-use scope) and together form the foundational US text-training-data litigation docket.
What does the Authors Guild case teach Applied AI strategy leads?
Treat training-data IP review as a launch-readiness gate. Watch for memorization on the model-output side; pre-deployment evaluation should test for reproduction of identifiable training-corpus signatures. Track US Copyright Office and EU AI Act developments. Document corpus-assembly decisions and licensing posture for each major corpus component.
Which Applied AI roles work on training-data IP litigation defense?
AI Strategy Lead owns the regulatory and legal-posture work and external-counsel coordination. Senior AI Product Manager and AI Product Manager own the product-level decisions about provenance positioning and licensing relationships. AI Engineer owns the corpus-assembly pipeline and memorization-detection evaluation infrastructure.
Sources
- Authors Guild et al. v. OpenAI Inc. et al., Case No. 1:23-cv-08292 (S.D.N.Y., filed 19 September 2023) — full docket via CourtListener
- Authors Guild, "Authors Guild, John Grisham, Jodi Picoult, David Baldacci, George R.R. Martin, and 13 Other Authors File Class-Action Suit Against OpenAI" (Authors Guild, 19 September 2023)
- Silverman v. OpenAI Inc., Case No. 3:23-cv-03416 (N.D. Cal., filed 7 July 2023) — parallel class action filed by Sarah Silverman, Christopher Golden, Richard Kadrey
- United States Copyright Office, "Copyright and Artificial Intelligence, Part 3: Generative AI Training" (Pre-Publication Report, 9 May 2025)
- Open Source Initiative + Creative Commons, joint statements on AI training data and copyright (2023-2024)
DecipherU is not affiliated with, endorsed by, or sponsored by any company listed in this directory. Information compiled from publicly available sources for educational purposes.