---
title: Hard to Feel Sorry for Anthropic Over Alibaba's Distillation
description: Alibaba paid Anthropic for the Claude access it calls illicit. Anthropic built Claude by scraping the world's work for free, then settled a $1.5bn piracy case.
author: Darie Nani (Editor-in-Chief)
date: 2026-06-27T08:01:40.899Z
updated: 2026-06-27T08:01:40.910Z
canonical: https://www.sovereignmagazine.com/article/anthropic-distillation-built-on-copying
image: https://cdn.nanimediahouse.com/anthropic-built-on-copying.webp
categories: Artificial Intelligence, Legal
region: United States
publication: Sovereign Magazine
about:
  - type: Organization
    name: Anthropic
---

Anthropic, which agreed last year to pay $1.5 billion to authors for training Claude on books it took from pirate libraries, has accused Alibaba of a 'brazen' and 'illicit' campaign to copy the model that data helped build. The only difference is that Alibaba paid Anthropic for the access.

In a [letter to Washington](https://www.sovereignmagazine.com/article/anthropic-alibaba-distillation-export-ban), Anthropic says operators tied to Alibaba's AI lab ran 28.8 million exchanges with Claude through about 25,000 fraudulent accounts, then used the outputs to train Alibaba's rival Qwen models. The accounts were fake in identity, set up to get past Anthropic's ban on access from China. They were not free. Claude is metered and billed by use, so 28.8 million queries is a paying customer's bill, not a break-in. Anthropic has not said how much Alibaba spent. At standard rates the bill would run into the millions of dollars.

Anthropic wants this treated as a national-security matter. It has not sued or named a criminal statute, and its word is 'illicit', not theft. But it built Claude by copying most of the writing, images and code on the internet without asking, and paid for none of it until a court forced it to. The one time money actually changed hands, it changed hands in Anthropic's favour.

## Is Model Distillation Illegal?

Distillation is standard. A smaller model trains on the outputs of a larger one and inherits much of its behaviour for a fraction of the cost. On your own model it is routine engineering. On a competitor's it breaches the terms of use: Anthropic, OpenAI, Mistral and xAI all ban using outputs to build a rival. That is a contract dispute, not a crime, and the [intellectual-property claims beyond it are largely untested](https://www.winston.com/en/insights-news/is-ai-distillation-by-deepseek-ip-theft). OpenAI made the same accusation against [China's DeepSeek](https://www.sovereignmagazine.com/article/china-s-deepseek-takes-on-us-tech-giants-what-this-means-for-project-stargate) in early 2025 and did not sue either. What these cases do is move the fight out of civil court, where a breached subscription is worth little, and into export controls, where China and national security are worth a great deal.
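
For readers who want to see the mechanics, here is a toy sketch of distillation in Python. It is not how any lab actually trains a model: the `teacher` function, its parameters and the `distill` helper are all hypothetical, and the student is given the same simple shape as the teacher so the result is easy to check, whereas a real student would usually be smaller. What it does show is the point that matters here: the student learns only from the teacher's answers to queries, never from its internals.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def teacher(x: float) -> float:
    """Hypothetical stand-in for the large model: returns P(label = 1 | x)."""
    return sigmoid(3.0 * x - 1.0)


def distill(queries, steps=2000, lr=0.5):
    """Fit a tiny student (w, b) using only the teacher's answers."""
    # The student sees outputs alone, as an API customer would.
    answers = [teacher(x) for x in queries]
    w, b = 0.0, 0.0
    n = len(queries)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, p_teacher in zip(queries, answers):
            p_student = sigmoid(w * x + b)
            # Gradient of cross-entropy against the teacher's soft label.
            err = p_student - p_teacher
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


random.seed(0)
queries = [random.uniform(-2.0, 2.0) for _ in range(200)]
w, b = distill(queries)

# Largest disagreement between student and teacher on a few fresh inputs.
gap = max(abs(sigmoid(w * x + b) - teacher(x)) for x in [-1.5, -0.5, 0.5, 1.5])
```

In this sketch 200 queries are enough for the student to track the teacher closely. The real case differs in scale, not in kind: the queries are prompts, the answers are text, and the bill for asking them goes to whoever runs the teacher.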

## The Industry Was Built on Copying

The model whose outputs Anthropic says were illicitly extracted was itself built on a vast amount of work taken without permission. Claude, like OpenAI's GPT and Google's Gemini, was trained on the open web as scraped by [Common Crawl](https://en.wikipedia.org/wiki/Common_Crawl), which The Atlantic found had misled publishers about respecting paywalls. Reddit says Anthropic scraped its site more than 100,000 times and carried on after being told to stop. The scraping came first, before most of the people who made the material knew about it and before any market existed to license it. The blocks went up [after the models were already trained](https://www.dataprovenance.org/Consent_in_Crisis.pdf).

That taking is now the subject of [more than fifty copyright lawsuits](https://chatgptiseatingtheworld.com/2025/10/08/status-of-all-51-copyright-lawsuits-v-ai-oct-8-2025-no-more-decisions-on-fair-use-in-2025/). The [New York Times is suing OpenAI](https://www.sovereignmagazine.com/article/openai-microsoft-sued-by-new-york-times-for-copyright-infringement) over millions of articles. Music publishers are suing Anthropic for $3.1 billion over song lyrics. Getty Images is suing Stability AI over 12 million photographs. Anthropic's own [$1.5 billion settlement](https://www.sovereignmagazine.com/article/anthropic-s-historic-copyright-settlement-could-reshape-ai-industry-s-data-practices), about $3,000 each for 482,460 pirated books, is not the whole of its exposure but the one slice so plainly unlawful that no defence could cover it. On the wider question the courts have mostly sided with the labs: Judge William Alsup [ruled in 2025](https://www.npr.org/2025/06/25/nx-s1-5445242/federal-rules-in-ai-companys-favor-in-landmark-copyright-infringement-lawsuit-authors-bartz-graeber-wallace-johnson-anthropic) that training on books was fair use, calling it 'exceedingly transformative'. That is the industry's legal shield, not the world's consent, and it is not settled: the Times case and most others are still to be decided.

## The Reaction to Anthropic's Complaint

The complaint landed badly. Much of the response read it as [rules for thee but not for me](https://www.digitalmusicnews.com/2026/06/25/anthropic-alibaba-ai-row/), given that the company doing the complaining had just paid $1.5 billion for taking work that was not its own. Rivals piled in. At Elon Musk's xAI, which faces author copyright suits of its own, the line was that Anthropic 'is guilty of stealing training data at massive scale'. China's state media [dismissed the accusation](https://www.globaltimes.cn/page/202606/1364418.shtml) as 'technological hegemony anxiety', an attempt to build barriers around a monopoly and to draw attention from Anthropic's own history of using data it did not own. The people with the strongest claim to be wronged, the writers and photographers whose work trained Claude, were not asked either way.

## Anthropic Was Paid for the Distillation It Calls Illicit

So this is the company that took the work of millions of writers, photographers and programmers for nothing, now indignant that someone copied its model after paying for the access. Terms of service or not, that is a hard case to put to the public. The writers whose books trained Claude were paid nothing, or paid only after years in court. Alibaba at least put money on the table. The principle Anthropic is invoking, that you should not build on someone else's work without paying for it, is the one its own product was built on breaking. The difference is that Alibaba paid.

## FAQ

**Q: Is AI model distillation legal?**
Distilling your own model is legal and standard. Distilling a competitor's model by harvesting its outputs without permission generally breaks that company's terms of use, which is a contract breach rather than a crime. Trade-secret and copyright claims against distillation are difficult and largely untested, which is why Anthropic has gone to the government rather than to court over Alibaba.

**Q: Did Alibaba pay Anthropic to access Claude?**
In effect, yes. The 25,000 accounts were fraudulent in identity, created to evade Anthropic's ban on access from China, but the usage was billed. Claude's models are metered by use, so 28.8 million queries generated revenue for Anthropic. The company has not disclosed how much.

**Q: Was Claude trained on copyrighted material?**
Yes. Anthropic trained Claude on large amounts of copyrighted text, including books and material scraped from the open web. A federal judge ruled in 2025 that training on copyrighted books was fair use, while separately finding that downloading pirated copies to build a library was not.

**Q: Did Anthropic use pirated books to train Claude?**
Yes. Anthropic downloaded millions of books from pirate libraries such as Library Genesis. It settled the resulting class action for $1.5 billion, about $3,000 for each of 482,460 works, the largest copyright settlement in US history.

**Q: Is web scraping for AI training legal?**
It is contested. Scraping publicly available data is not automatically illegal, but using copyrighted content to train a model is the subject of more than fifty lawsuits, including the New York Times case against OpenAI. Courts have so far leaned toward treating training as fair use, but no higher court has settled the question.
