Is AI Fair Use?
Two recent decisions suggest it can be...

This summer, the Northern District of California issued two decisions regarding the use of copyrighted works to train AI large language models (LLMs), after authors sued AI developers for copyright infringement. While both judges ruled in favor of the AI developers under the fair use doctrine, their decisions raise further questions about how fair use should be applied to such rapidly evolving technology.
In the first case, Bartz v. Anthropic, Judge William Alsup ruled that the use of copyrighted books to train Anthropic’s Claude AI LLM was supported by the first factor of fair use, given that its purpose and character were “quintessentially transformative.” Like an aspiring writer who has read an array of books, the LLM used the books as a basis for generating new content that neither replicated nor served the same purpose as the originals. The court also ruled that Anthropic’s digitization of purchased works for a centralized training library was fair use, though pirating unauthorized copies for the library was not.
In the related case Kadrey v. Meta, Judge Vince Chhabria relied more heavily on the fourth factor of fair use. He reasoned that Meta’s use of copyrighted works for LLM training did not cause market harm to the originals, either by reproducing them or by harming the market for licensing them for AI training. However, he noted the importance of considering how LLMs could dilute the market for originals by generating substantially similar works, even though the plaintiffs had not presented strong enough evidence of such dilution.
Judge Chhabria’s warning points to a potential issue in both cases: too much emphasis on the fair use of LLM inputs (training material) and too little on their outputs. For example, with AI capable of creating detailed summaries of works within seconds, there is a very real possibility that such summaries will replace the need for readers to purchase copies of the originals. While online summaries have existed for decades, they have never been as accessible or personalized as those generated by AI.
More generally, the question arises: how similar must AI outputs be to the original works to cause market harm? While the markets for fiction and classic books may not be severely affected, what about the markets for nonfiction books, whose appeal depends less on creativity, style, and celebrity?

