How you source data can make or break you
In Bartz v. Anthropic PBC, 787 F. Supp. 3d 1007, 1025–26 (N.D. Cal. June 23, 2025), Judge William Alsup took a hard stance against downloading millions of pirated books to build a “central library.” He rejected the idea that piracy can be excused just because some copies might later be used for training. The judge doubted that taking books from pirate sites when lawful copies were available could ever be “reasonably necessary” to any fair use, and he called such piracy “inherently, irredeemably infringing,” even if the copies were immediately used for a transformative purpose and then discarded.
The court drew a sharp distinction. Using purchased books to train an LLM and converting purchased print copies to digital format were both fair uses. Building a library of pirated works, however, was not excused as fair use, whatever the reason for doing so. That issue will proceed to trial on liability and damages.
In Kadrey v. Meta Platforms, 788 F. Supp. 3d 1026 (N.D. Cal. June 25, 2025), Judge Vince Chhabria rejected the plaintiffs’ theory that they should win automatically because the books were sourced from online “shadow” libraries rather than lawfully purchased. Rather, the court held that downloading could not be completely separated from training: although they are different acts, the downloading must be considered in light of its ultimate, highly transformative purpose, which was training. Because Meta’s use of the books to train its Llama model had a “further purpose” and “different character” from the books themselves, and was “highly transformative,” the downloading was transformative too, regardless of where the books came from.
Now, Meta is claiming that any uploading of data that occurred while it torrented books from shadow libraries is also fair use because it was “part and parcel” of its training process. Whether that argument is a bridge too far remains to be seen. In response, the plaintiffs were permitted to add a claim for contributory infringement, though Judge Chhabria chastised them for not raising it sooner.
Ultimately, courts are looking past subjective intent and focusing on what the user actually does with the works. The same copy can be put to one use and then another, and each use may come out differently in a fair use analysis. Still, calling something “research” does not excuse building a central repository of pirated books as a substitute for paid copies. Whether Meta can also have its uploading excused as fair use is an open question.