- Nimitz Tech Hearing 7-16-25 Too Big to Prosecute?: Examining the AI Industry’s Mass Ingestion of Copyrighted Works for AI Training
⚡NIMITZ TECH NEWS FLASH⚡
“Too Big to Prosecute?: Examining the AI Industry’s Mass Ingestion of Copyrighted Works for AI Training”
Senate Judiciary, Subcommittee on Crime and Counterterrorism
July 17, 2025 (recording linked here)
HEARING INFORMATION
Witnesses and Written Testimony (Linked):
Mr. Maxwell Pritt: Partner, Boies Schiller Flexner LLP
Prof. Michael Smith: Professor of Information Technology and Marketing, Carnegie Mellon University
Mr. David Baldacci: Author
Dr. Bhamati Viswanathan: Professor of Law, New England Law School
Prof. Edward Lee: Professor of Law, Santa Clara University School of Law
HEARING HIGHLIGHTS

Mass Piracy of Copyrighted Works by AI Companies
The hearing centered on evidence that leading AI companies, including Meta, OpenAI, and Anthropic, trained their models using copyrighted materials obtained from pirate websites and shadow libraries. Witnesses described the scale of the piracy as unprecedented, involving hundreds of terabytes of books, articles, and creative works, including those authored by U.S. presidents and members of the subcommittee. This conduct was reportedly not incidental but deliberate, with internal communications showing employees and executives aware of the illegality and ethical concerns. The use of torrenting technologies to both acquire and distribute the content added further legal complexity to the issue.
Economic Impact on Authors and Creative Industries
Multiple witnesses described how unlicensed AI training harms authors financially and undermines the broader creative economy. Best-selling author David Baldacci shared how his books were used without permission, equating it to intellectual theft that damages both established and emerging writers. Professor Smith emphasized that piracy depresses revenue for creators, reduces incentives to produce new works, and weakens the licensing market. The testimony suggested that widespread infringement could erode the infrastructure that supports artistic careers, ultimately diminishing the cultural and economic contributions of creative industries.
Technology Companies’ Attempts to Conceal Infringement
Internal communications and testimony revealed that certain AI companies took active steps to conceal their use of pirated material. Meta, in particular, was cited as having intentionally avoided using its own infrastructure for torrenting in order to evade detection, opting instead to use third-party servers like AWS. Engineers expressed concern in internal chats about the legality and ethics of their actions, with some explicitly acknowledging that torrenting from corporate devices felt wrong. These actions raised concerns not only about the infringement itself but about the companies’ awareness and efforts to obscure their conduct.
IN THEIR WORDS
“Let me just start by saying that today's hearing is about the largest intellectual property theft in American history... AI companies are training their models on stolen material. Period.”
“I truly felt like someone had backed up a truck to my imagination and stolen everything I'd ever created.”
SUMMARY OF OPENING STATEMENTS FROM THE SUBCOMMITTEE
Chair Hawley opened the hearing by stating that it would examine what he described as the largest intellectual property theft in American history, committed by AI companies training their models on pirated content. He asserted that these companies, including Meta and Anthropic, deliberately stole copyrighted materials—such as books and academic articles—from illegal online repositories. He claimed internal warnings were ignored and that Meta leadership, including Mark Zuckerberg, approved the use of pirated material while attempting to hide their actions. Hawley emphasized that such conduct was criminal, not innovative, and called for the enforcement of existing laws to hold big tech accountable.
Ranking Member Durbin stressed the importance of balancing AI innovation with the protection of intellectual property, noting that America’s creative industries contribute over a trillion dollars to the economy. He shared a personal anecdote about his first encounter with copyright law as a restaurant owner and used it to illustrate the seriousness of unauthorized use of protected material. Durbin questioned whether AI companies should be allowed to use copyrighted content under “fair use” or be required to compensate creators. He expressed concern about reports of companies like Meta and Anthropic using pirated materials for AI training and highlighted the need to incentivize creativity while regulating tech industry practices.
SUMMARY OF WITNESS STATEMENTS
Mr. Pritt testified that companies like Meta, OpenAI, and Anthropic engaged in mass piracy of copyrighted books and publications from illicit online marketplaces to train their AI models. He stated that Meta alone pirated over 200 terabytes of content, including works by members of the subcommittee and every U.S. president and vice president of the 21st century. Pritt asserted that this piracy was willful and facilitated by senior executives, despite internal warnings about the illegality and ethical implications. He criticized AI companies for invoking fair use as a justification and argued that their refusal to compensate creators violated the Copyright Act and undermined the creative economy.
Prof. Smith explained that his research over 25 years has shown that digital piracy harms creators and society by reducing economic incentives and legal sales. He compared current arguments in favor of AI’s use of pirated content to those used during the early internet era, which he said have been empirically disproven. Smith emphasized that using pirated content to train generative AI models will damage licensing markets, harm emerging creators, and flood the market with machine-generated content that displaces human artists. He concluded that enforcing copyright law against AI-related piracy could create an ecosystem where both technology and creative industries can thrive.
Dr. Viswanathan described AI companies’ use of pirated content as “a crime compounding a crime,” arguing that these companies source stolen material from already illegal repositories. She warned that such practices not only harm individual creators but also erode the foundational incentive structure of copyright law, which is enshrined in the U.S. Constitution. Viswanathan argued that licensing solutions already exist and could offer a fair and legal path forward for training AI without destroying livelihoods. She urged Congress to enforce existing copyright protections and ensure technological progress does not come at the cost of creative labor and cultural heritage.
Mr. Baldacci recounted how he discovered that AI systems had used his novels without permission, likening it to a theft of his imagination and life’s work. He rejected the notion that AI ingestion is comparable to aspiring writers learning from authors, emphasizing that AI replicates and regurgitates his actual content. Baldacci criticized trillion-dollar companies for claiming they could not afford to license works, calling their actions efficient theft that harms both established and emerging authors. He warned that AI-generated books would saturate the market with low-cost content, undermine the publishing ecosystem, and render copyright protections meaningless unless laws are enforced equally for creators and corporations.
Prof. Lee argued that using copyrighted works to train AI models can qualify as fair use when serving a transformative purpose, as recent court decisions have affirmed. He noted that the practice of training AI on large datasets originated in academic research and has significantly advanced technological innovation. Lee acknowledged that fair use must be assessed case-by-case and that infringing outputs would change the analysis, but current rulings found no such outputs in lawsuits against Meta and Anthropic. He urged caution from Congress and the courts, citing geopolitical and economic stakes in U.S. AI leadership, and recommended allowing legal processes to further develop before taking legislative action.
SUMMARY OF KEY Q&A
Chair Hawley asked why AI companies use copyrighted books instead of dictionaries. Dr. Viswanathan explained that full works are needed to teach language structure and syntax. Chair Hawley asked where the books come from, and Dr. Viswanathan said companies take them from pirate sites without compensation. When asked how, she explained they use torrenting to both download and distribute the content. Chair Hawley asked if torrenting was legal, and Dr. Viswanathan said it isn’t in this context. She added that sites like Anna’s Archive benefit from this behavior and that criminal enforcement is limited. When asked about criminal standards, she said Meta’s actions met both willfulness and commercial intent thresholds.
Chair Hawley asked whether Meta used torrents to obtain data and how much was taken. Mr. Pritt said over 200 terabytes were pirated. Chair Hawley asked how much Meta paid; Mr. Pritt said nothing. When asked if Meta explored licensing, Mr. Pritt said it was briefly considered and deemed too costly. Chair Hawley asked if employees knew it was wrong, and Mr. Pritt confirmed, citing internal messages showing legal and ethical concerns.
Sen. Durbin asked Mr. Baldacci to describe his writing process. Mr. Baldacci said it was a long, personal journey shaped by experience and discipline, which AI cannot replicate. Sen. Durbin asked if he worries about plagiarism, and Mr. Baldacci said no, because his stories come from his unique perspective, unlike AI, which reuses existing content.
Sen. Durbin asked Mr. Lee if his fair use stance meant AI firms should avoid liability. Prof. Lee said no, and emphasized fair use should be judged case by case under current Supreme Court precedents. When Sen. Durbin said this shifts the burden to authors, Prof. Lee responded that the initial burden is on defendants, and courts have seen no infringing outputs yet. Sen. Durbin asked if AI’s economic benefit comes at creators’ expense; Prof. Lee said it benefits both companies and national interests. When Sen. Durbin asked if Mr. Baldacci is paying that price, Prof. Lee said if infringement is proven, fair use fails and authors have legal recourse.
Chair Hawley asked if Mr. Baldacci, as a citizen, is part of the national interest; Prof. Lee agreed he is. Chair Hawley challenged Prof. Lee's claim that AI development benefits the United States by arguing that mass copyright theft impoverishes American citizens. Prof. Lee responded that national AI leadership is a stated U.S. policy goal and cited President Trump's executive order and AI advisor David Sacks to support the fair use pathway. Chair Hawley pushed back, criticizing reliance on an unelected "AI czar" and questioned whether enriching corporations at citizens' expense was truly a national benefit. Prof. Lee clarified that courts, not policymakers or executives, should decide these questions and emphasized that 44 lawsuits were currently pending.
Chair Hawley questioned the fairness of allowing companies to invoke fair use after knowingly pirating works. Prof. Lee noted that courts had reached different conclusions and that the Supreme Court had not required good faith as a prerequisite for fair use.
Sen. Welch expressed support for protections for emerging artists and asked Mr. Baldacci whether he suspected AI companies had used his books. Mr. Baldacci confirmed that he had seen evidence that at least 44 of his novels were ingested by AI systems and was part of a class-action lawsuit.
Sen. Welch asked Prof. Smith to explain how unrestricted AI training on copyrighted music could harm creators. Prof. Smith warned that it would reduce artist income, discourage creation, and give AI firms leverage to pressure creators into exploitative licenses.
Sen. Durbin asked whether Meta paid the plaintiffs for the use of their copyrighted works. Mr. Pritt said Meta paid nothing to authors but did spend on infrastructure to acquire and host pirated data. Sen. Durbin asked how this behavior related to establishing willfulness for criminal copyright infringement. Mr. Pritt said the evidence showed Meta's actions were knowing and intentional. Sen. Durbin invited comment from the other panelists on whether the conduct met the willfulness standard. Prof. Lee stated that Judge Chhabria reviewed internal communications and concluded that, due to the unsettled nature of the legal question, criminal liability did not apply. Dr. Viswanathan argued that the scale, intent, and nature of the infringement were not consistent with traditional fair use, and that invoking fair use in this context was disingenuous.
Chair Hawley asked whether Meta tried to hide its piracy. Mr. Pritt said Meta used Amazon Web Services rather than internal servers to avoid detection. Chair Hawley presented internal Meta chats showing engineers joking about piracy and avoiding company infrastructure. Mr. Pritt agreed that the documents reflected knowledge of wrongdoing and a lack of good faith.
Chair Hawley displayed more evidence of Meta employees expressing concerns about tracing pirated material back to Facebook servers. Prof. Lee acknowledged that certain distribution claims were still pending and that some conduct might not qualify as fair use. Chair Hawley concluded that if this conduct was not illegal, Congress must act to change the law.
Chair Hawley closed by arguing that the unauthorized use of creative works by tech giants was not just a legal issue but a moral one that violated the rights of citizens. He warned that allowing corporations to exploit content without compensation undermined American values and called on Congress to take action.
ADD TO THE NIMITZ NETWORK
Know someone else who would enjoy our updates? Feel free to forward them this email and have them subscribe here.
Update your email preferences or unsubscribe here
© 2024 Nimitz Tech | 415 New Jersey Ave SE, Unit 3