
Judge Rules AI Training Fair Use

Key Points

  • Judge William Alsup’s ruling in *Bartz v. Anthropic* affirms that using copyrighted books for AI training can qualify as fair use, but explicitly condemns training on material obtained from pirated sources.
  • The decision frames AI training as a “transformative” activity—machines read texts and generate new, original outputs—providing a legal foothold for future AI developers.
  • Alsup’s nuanced language creates a “Solomon’s choice” scenario: while the act of training on millions of books may be permissible, the method of acquiring those books determines liability.
  • In response, Anthropic overhauled its data‑gathering strategy in 2024, hiring former Google book‑scanning chief Tom Turvey to legally purchase and scan physical books, even destroying the originals after digitization.
  • Because Anthropic’s new digital copies stem from legitimately purchased books, the court deemed those scans fair‑use training data, setting a precedent that lawful acquisition can shield AI companies from copyright infringement claims.

Full Transcript

**Source:** [https://www.youtube.com/watch?v=8beAhtbnM4Y](https://www.youtube.com/watch?v=8beAhtbnM4Y)
**Duration:** 00:06:04

## Sections

- [00:00:00](https://www.youtube.com/watch?v=8beAhtbnM4Y&t=0s) **AI Training Fair Use Ruling** - Judge William Alsup ruled that using copyrighted books to train AI can be considered fair use but condemned acquiring those works from pirated sources, establishing a key precedent for how AI companies must obtain training data.
- [00:03:18](https://www.youtube.com/watch?v=8beAhtbnM4Y&t=198s) **Court Ruling Signals Pay for AI Scraping** - The judge’s finding that Anthropic could have purchased the books earlier underscores that AI companies must financially compensate authors for using their works, offering a tentative victory for creators amid broader, unsettled litigation.

## Full Transcript
We got a bit of a road map today for the future of copyright cases in AI, which is something that I've been following really closely. I want to give you an outline of the ruling and then a look at where we stand on the legal challenges to AI right now. So, first, this ruling: it was by Judge William Alsup, and it was handed down in the case of Bartz versus Anthropic. It validates AI training as fair use, but it condemns the piracy that enables it. And I want to spend a little bit of time here, because Judge Alsup was very precise and careful in the ruling. It's not as simple as saying this is a win for Anthropic and AI companies because it enables fair use in AI. If you want to think about how to frame this, it's sort of a Solomon's choice. It splits the baby. Yes, training Claude on millions of books does constitute fair use. But critically, Anthropic's choice to download those same books from pirate sites, which it did for earlier versions of Claude, does not get a free pass. That distinction matters because it fundamentally shapes how AI companies must think about data acquisition going forward.

So, the judge's reasoning, and this is really key to me: he describes AI training as quintessentially transformative. Everyone reads texts and then writes new texts. And I'm reading from his judgment here: to make anyone pay specifically for the use of a book each time they read it, each time they recall it from memory, each time they later draw upon it when writing new things in new ways, would be unthinkable. I think Judge Alsup gets that part right. That is exactly what I've been worried that judges will not see, and I find it incredibly encouraging that Judge Alsup understands that AI is a transformative technology: it transforms the text it's trained on. And this forms a conceptual foundation that I think other AI companies will be able to use when talking about how they do their work.

Now, this is where the Anthropic story gets interesting, because after building their initial models on pirated content from Library Genesis and other dubious sources, the company made a deliberate shift in 2024. They hired Tom Turvey, the former head of Google's book-scanning project, and Tom's mandate was to legally obtain all the books in the world. Can you imagine? I've got to say, as someone with a library, that is my dream job. Tom, if you ever get tired of your job at Anthropic, please let me know. I would love to have the job of getting all the books in the world. Anyway, Anthropic spent millions of dollars, a significant share of its total training costs for the new Sonnet and Opus models, purchasing physical books, many of them secondhand, which it then proceeded to slice from their bindings and scan into digital format. Yes, the physical books were destroyed, but the digital copies were ruled legitimate fair use because Anthropic acquired the books legitimately.

So the pivot for Anthropic from piracy to purchasing reveals a critical principle that I think other judges are likely to follow: AI companies can afford to do this right. Not all of them choose to do so. And the court does note the financial capability, because if you can afford to purchase later, then you could have purchased earlier. The judge also writes that using purchased books later, quote, "will not absolve it of liability for the theft but may affect the extent of statutory damages." The judge saw that Anthropic had the money all along.

So what does this mean for authors? I know authors in my life. In fact, arguably I am an author on Substack, right? And I know that AI reads my stuff. Anyway, I think this ruling offers a glimmer of hope. Fundamentally, part of what authors have needed is some sense that companies cannot just scrape and steal work. There needs to be a sense of being willing to pay the going rate for the work in order to use it. So even if AI training constitutes fair use (and authors may or may not agree that that's legitimate; that's fine, everyone can have different opinions, and it's certainly not settled yet with just one ruling), it's still a step forward for authors that the court expects AI companies to pay for the work. It establishes something closer to a sustainable equilibrium: companies must pay for access and support the creative economy, and authors can benefit from AI tools if they choose to do so.

Now, there are other open lawsuits out there, right? Multiple lawsuits against OpenAI. There's Kadrey versus Meta, the lawsuit over training Llama on Books3, a pirated data set that Anthropic also used, which now has an interesting precedent. And then there are visual AI companies that may use Alsup's transformative-use reasoning to argue that image-generation models are really the same thing as the text side, and that it's fair use there.

So the question for me is: where do we go from here? Will we see courts adopt Alsup's framework going forward? Will we see other precedents and standards of judgment emerge? One of the things I'm aware of is that Alsup writes in the Northern District of California; it's not the whole country. We're seeing circuit splits on related AI issues. For example, the Ninth Circuit has used an actual-knowledge requirement for contributory infringement, and the Second Circuit has used a reason-to-know standard. That's a big difference when we're talking about platforms that host AI tools, and it affects the extent of liability that AI platforms will have in situations like this.

The long and the short of it is that this is a step forward in terms of providing legal clarity. I really appreciate Judge Alsup's willingness to talk about the transformative value of AI and not just call it copying. I think that's a correct interpretation of what AI does. I think expecting AI companies to pay for what they train on is completely reasonable, and we'll have to see where the story goes from here. Still, a little bit of clarity from the judiciary is a step forward. We'll take the win for today, won't we? All right. Cheers, guys.