Inside the Model Distillation Attack on Claude
By Aaron Rose · Tech Reader Magazine · June 26, 2026
Inside the Attacker's Lab
The accounts look real enough. Each one has a usage history, a plausible profile, a pattern of queries that doesn't immediately trigger the rate-limiting logic. There are hundreds of them active at any given hour, thousands across the operation. Each one is asking questions — careful questions, structured questions, questions designed not to get answers but to capture the shape of how a very capable mind responds. The outputs are logged. Fed into a pipeline. Used to train something smaller, cheaper, and almost as good. Nobody in the room calls it model theft. They call it model research.
Largest Attack on Anthropic
In a letter dated June 10, 2026, addressed to Senate Banking Committee Chair Tim Scott and Ranking Member Elizabeth Warren, Anthropic accused operators affiliated with Alibaba and its AI research division, Alibaba Qwen, of conducting a coordinated campaign to illicitly harvest capabilities from its Claude AI model. According to the letter, the campaign ran from April 22 to June 5 and generated more than 28.8 million exchanges through nearly 25,000 fraudulent accounts. Anthropic described it as the largest known distillation attack in the company's history. Alibaba did not respond to requests for comment from multiple outlets.
The letter was first reported by Bloomberg and subsequently confirmed by Reuters, CNBC, and the Wall Street Journal. According to those reports, the campaign specifically targeted Claude's agentic reasoning, software engineering proficiency, and long-horizon task completion: the capabilities that sit closest to what Anthropic considers frontier performance. These are not general-purpose outputs. They are the specific behaviors that take the most compute and the most research to produce.
This is not the first time Anthropic has made this accusation against a Chinese AI laboratory. In February 2026, the company publicly identified three separate industrial-scale distillation campaigns, attributed to DeepSeek, Moonshot, and MiniMax, that targeted Claude through similar methods. If the allegations hold, the Alibaba campaign represents an escalation in both scale and sophistication.
28.8 Million
The number of exchanges generated through nearly 25,000 fraudulent accounts between April 22 and June 5, 2026 — what Anthropic describes as the largest known distillation attack in its history.
What Distillation Actually Is
The technique at the center of this story is not new, and it was not invented as a weapon. Knowledge distillation grew out of model-compression research and was popularized by a 2015 paper from Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. The original problem it solved was practical and benign: large models are expensive to run. If you could train a smaller model to approximate the behavior of a larger one, you could deploy the smaller model at a fraction of the cost without losing much performance.
The mechanism is straightforward. You take a large, capable model — the teacher. You feed it inputs and record its outputs. You use those input-output pairs as training data for a smaller model — the student. The student learns not from raw data but from the teacher's responses to that data. Over enough examples, the student begins to approximate the teacher's behavior, including nuances in how it reasons, hedges, and structures answers that would be difficult to replicate through conventional training alone.
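In the original 2015 formulation by Hinton, Vinyals, and Dean, the student learned from more than the teacher's final answer: it was trained to match the teacher's full probability distribution over possible outputs, softened by a "temperature" so that the teacher's second and third choices carry signal too. Here is a minimal sketch of that soft-target term in plain Python, assuming invented logits (raw scores) for a three-way choice. The paper also mixes in an ordinary hard-label loss, which is omitted here.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens the distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_target_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's.
    It differs from the paper's soft-target cross-entropy only by a constant, so it
    trains the student the same way. Scaled by T**2, as in Hinton et al. (2015),
    so gradient sizes stay comparable across temperatures."""
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # the student's softened predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

teacher = [4.0, 1.5, 0.2]  # invented scores for a three-way choice
print(soft_target_loss(teacher, teacher))          # 0.0: student matches teacher exactly
print(soft_target_loss(teacher, [0.2, 1.5, 4.0]))  # positive: student disagrees
```

An attacker working through a public API typically sees only generated text, not these probability distributions, so black-box distillation usually trains the student on the teacher's text responses directly. That is a thinner signal per example, which is plausibly one reason such campaigns need volume.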
For model compression, this is elegant engineering. For intellectual property extraction, it is a precision instrument. You do not need access to the model's weights — the mathematical parameters that represent everything the model has learned. You do not need the training data. You do not need the compute infrastructure. You need API access and enough queries to capture the behavior you want to replicate.
You don't need to steal the recipe. You just need to eat enough meals to reverse-engineer it.
How the Attack Works
A distillation attack at industrial scale requires three things: access, volume, and structure. Access means getting queries to the target model without triggering detection — hence the fraudulent accounts, the distributed traffic patterns, the plausible usage histories. Volume means generating enough input-output pairs to train a student model with meaningful capability. Structure means asking the right questions in the right sequence to capture the specific behaviors you want, rather than generating random outputs that add noise to your training set.
The 28.8 million exchanges attributed to the Alibaba campaign reflect all three. The fraudulent account infrastructure — nearly 25,000 accounts — is the access layer, designed to distribute traffic below detection thresholds. The volume is self-evident. The targeting of specific capabilities — agentic reasoning, software engineering, long-horizon task completion — reflects structured intent. Someone designed a query strategy to harvest specific behaviors, not general outputs.
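The access-layer logic is easiest to see as arithmetic. Below is a back-of-the-envelope sketch using the figures reported from the letter. It assumes, unrealistically, that traffic was spread evenly across accounts and days.

```python
from datetime import date

exchanges = 28_800_000  # "more than 28.8 million": a lower bound
accounts = 25_000       # "nearly 25,000": an upper bound
days = (date(2026, 6, 5) - date(2026, 4, 22)).days  # campaign window from the letter

per_account = exchanges / accounts
per_account_per_day = per_account / days

print(days)                           # 44
print(per_account)                    # 1152.0
print(round(per_account_per_day, 1))  # 26.2
```

Because 28.8 million is a floor on exchanges and 25,000 a ceiling on accounts, the true average is at least this high. Even so, roughly 26 exchanges per account per day is plausibly within what a single genuine user might generate. That is exactly what makes the access layer hard to spot one account at a time.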
This is what separates a distillation attack from ordinary API abuse. Ordinary abuse might scrape outputs for reuse. A distillation attack uses outputs as training signal, with the explicit goal of producing a model that behaves like the target. The product is not the outputs. The product is the student model trained on them.
What You Can and Cannot Steal
Here is where the honest accounting gets interesting. Distillation is surprisingly effective at capturing surface behavior — the style, the structure, the reasoning patterns, the way a model formats a complex answer or hedges an uncertain claim. A well-executed distillation campaign against a frontier model can produce a student that, on many tasks, is difficult to distinguish from the teacher.
But there is a ceiling. The student model trained on distilled outputs does not acquire the teacher's full capability — it acquires an approximation of the teacher's behavior on the tasks represented in the training queries. Deep capability, the kind that emerges from training on vast data with enormous compute, does not transfer cleanly through a behavioral interface. The student plateaus. It performs well on tasks similar to those in the distillation set and degrades on tasks that require genuine generalization beyond that set.
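The plateau can be shown with a deliberately tiny toy rather than a language model. In the sketch below, which is invented for illustration, the "teacher" is a hidden function the student can only query. The "student" is a straight line fitted to the recorded answers. All queries come from a narrow input range, standing in for a distillation set focused on chosen tasks.

```python
def teacher(x):
    """Hidden 'teacher': the student can call it but never sees this code."""
    return x * x

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var
    return a, my - a * mx

# Query the teacher only on the narrow range the student is meant to imitate.
queries = [i / 100 for i in range(101)]  # x in [0, 1]
answers = [teacher(x) for x in queries]  # recorded input-output pairs
a, b = fit_line(queries, answers)

def student(x):
    return a * x + b

in_range_error = max(abs(student(x) - teacher(x)) for x in queries)
out_of_range_error = abs(student(3.0) - teacher(3.0))
print(f"worst error on queried range: {in_range_error:.3f}")
print(f"error at x = 3.0:             {out_of_range_error:.3f}")
```

On the queried range the student's worst-case error is about 0.17. At x = 3, outside anything it saw, the error exceeds 6. The toy exaggerates the effect, since a line can never represent a curve, but the shape of the failure is the one described above: good imitation where the queries were, degradation beyond them.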
This matters for interpreting the Alibaba allegation. What was extracted, if the campaign succeeded, is not a clone of Claude. It is a model trained to behave like Claude on a specific and deliberately chosen range of tasks — the tasks that are most commercially valuable and most expensive to develop independently. That is not nothing. It may, in fact, be most of what the attacker needed.
The student plateaus below the teacher. But the plateau may be exactly where the commercial value lives.
Success Rates
There is no clean industry statistic for distillation attack success rates. An attack that succeeds may never be detected, and one that is detected is rarely disclosed in enough detail to judge how well it worked. What the research literature and the pattern of disclosed incidents suggest is this: for surface behavior on well-defined task categories, distillation is highly effective. For the kind of generalized reasoning capability that defines frontier performance, it is materially less so.
The practical implication is that distillation attacks are most dangerous not at the frontier itself, but one level below it. A state-of-the-art model is hard to fully replicate through distillation. A model that performs at, say, ninety percent of state-of-the-art on specific commercially relevant tasks (software engineering, agentic task completion, long-horizon reasoning) is achievable at a fraction of the development cost. For many applications, ninety percent is good enough. For many competitive contexts, good enough is winning.
Anthropic's framing in its letter to Congress reflects this understanding. The company described distillation attacks as turning billions of dollars in American AI investment into a subsidy for geopolitical competitors. That language is precise. The concern is not that a perfect copy of Claude is circulating. The concern is that the years of research and compute investment required to build frontier capability are being translated, at low cost, into competitive capability that closes the gap without the investment.
Detection and Defense
Frontier AI companies defend against distillation attacks through several overlapping mechanisms, none of them complete. Rate limiting is the first line — restricting the number of queries a single account can generate in a given time window, making large-scale extraction expensive in time if not in cost. Account verification adds friction to the creation of fraudulent accounts, raising the operational cost of the distributed access layer. Behavioral anomaly detection looks for query patterns that suggest systematic extraction rather than genuine use — unusual topic distributions, structured sequencing, high query volumes on specific capability domains.
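Two of these mechanisms can be sketched in a few lines. This is a minimal illustration, not any lab's actual system. It pairs a token-bucket limiter, a common rate-limiting design, with a topic-entropy score as one hypothetical "unusual topic distribution" signal. The parameter values are assumptions, and so are the topic labels, which would have to come from some upstream classifier.

```python
import math
from collections import Counter

class TokenBucket:
    """Per-account limiter: refills `rate` request tokens per second, holding at most `capacity`."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full
        self.last = None               # timestamp of the previous call

    def allow(self, now):
        """Return True and spend a token if one is available at time `now` (seconds)."""
        if self.last is not None:
            elapsed = now - self.last
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

def topic_entropy(topics):
    """Shannon entropy, in bits, of an account's (hypothetical) topic labels.
    Near 0 means traffic concentrated on one domain; higher means more varied use."""
    counts = Counter(topics)
    total = len(topics)
    return sum((c / total) * math.log2(total / c) for c in counts.values())

# Usage: a burst of four requests against a bucket of three, then one second later.
bucket = TokenBucket(rate=1.0, capacity=3)
print([bucket.allow(0.0) for _ in range(4)])  # [True, True, True, False]
print(bucket.allow(1.0))                      # True: one token has refilled

print(topic_entropy(["code"] * 8))                           # 0.0: a single domain
print(topic_entropy(["code", "math", "travel", "cooking"]))  # 2.0: four even domains
```

Both signals share the weakness the paragraph above points to. A campaign spread over thousands of accounts keeps each one under the bucket's limit, and a narrow topic profile also describes many legitimate users, such as a developer who only asks about code. Setting thresholds that catch one without punishing the other is the calibration problem.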
Watermarking is a more ambitious approach: embedding signals in model outputs that can be detected in models trained on those outputs, potentially allowing attribution of distillation campaigns after the fact. This remains an active area of research rather than a deployed defense. Output filtering — restricting certain categories of response that are particularly valuable for distillation — risks degrading the product for legitimate users and is difficult to calibrate without knowing exactly what an attacker is targeting.
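Watermark detection generally comes down to a statistical test. The sketch below shows the test behind one published scheme, the "green list" watermark of Kirchenbauer et al. (2023). At generation time the model is nudged toward a pseudo-random "green" subset of tokens, and a detector later counts green tokens and asks whether there are more than chance would produce. Several details here are illustrative assumptions: the hashing, the per-pair green assignment (a simplification of the paper's vocabulary split), the key, and the threshold. Detecting the mark in text a model produced is the established part. Detecting it in a separate model trained on that text is the open research question described above.

```python
import hashlib
import math

def is_green(prev_token, token, key="hypothetical-secret", gamma=0.25):
    """Pseudo-randomly assign `token` to the green list, seeded by the previous
    token and a secret key. Roughly a `gamma` fraction of tokens come out green."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma

def z_score(green, total, gamma=0.25):
    """How many standard deviations the green count sits above chance (gamma * total)."""
    return (green - gamma * total) / math.sqrt(total * gamma * (1 - gamma))

def detect(tokens, key="hypothetical-secret", gamma=0.25, threshold=4.0):
    """Score every (previous, current) token pair; needs at least two tokens.
    Flags the text if its z-score exceeds the (assumed) threshold."""
    pairs = list(zip(tokens, tokens[1:]))
    green = sum(is_green(p, t, key, gamma) for p, t in pairs)
    z = z_score(green, len(pairs), gamma)
    return z, z > threshold

z, flagged = detect("the model answers the question step by step".split())
print(round(z, 2), flagged)
```

Unwatermarked text should hover near z = 0. Watermarked text accumulates green tokens, and its z-score grows with length, which is why detection gets more reliable on longer passages.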
The Alibaba campaign, if the account is accurate, ran for roughly six weeks across nearly 25,000 accounts before Anthropic documented it. That timeline is the most instructive number in Anthropic's letter. The defenses exist. They did not prevent a 44-day, 28.8-million-exchange operation from running to apparent completion.
44 Days
The alleged campaign ran from April 22 to June 5, 2026: roughly six weeks of systematic extraction before Anthropic documented it in a letter to Congress.
The Broader Pattern
The Alibaba allegation does not arrive in isolation. It follows the DeepSeek, Moonshot, and MiniMax campaigns Anthropic disclosed in February. It arrives in the same month as the Fable 5 export control shutdown. It arrives as Anthropic is telling Congress that distillation attacks are growing in scale and sophistication, and that coordinated action between government and industry is required to address them.
China's state media pushed back quickly. The Global Times published expert commentary describing Anthropic's claims as lacking substance and rooted in what it called technology hegemony anxiety. That response, whatever its merits, confirms that the allegation has landed as a geopolitical event, not merely a technical one.
Back at the Attacker's Lab
The student model is running now. It answers questions about software architecture and long-horizon planning with a fluency that took its teacher years to develop. On most of the tasks it was trained to handle, it is very good. On the tasks outside that range, it hesitates in ways the teacher never did. The people running it know this. They are already building the next version — this time with more queries, better structure, a wider range of tasks. The ceiling is not fixed. It just takes more time to raise it.
What This Means Going Forward
The distillation attack is not a new threat. It is a maturing one. The technique has been understood for years. What is new is the scale at which it is being deployed, the specificity with which attackers are targeting commercially valuable capability domains, and the degree to which it is now a documented and recurring pattern rather than a theoretical risk.
For frontier AI labs, the implication is that API access is not a neutral product decision. Every query that returns a high-quality output from a frontier model is, in aggregate, training data for a potential competitor. The economics are stark: the attacker's cost is API fees and account infrastructure. The defender's cost is years of research and billions in compute. That asymmetry does not resolve through better rate limiting alone.
For the industry more broadly, the distillation attack reframes what frontier AI advantage actually means. It is not enough to build the best model. You have to maintain the behavioral gap between your model and what can be extracted from it. That gap is narrowing, campaign by campaign, 28.8 million exchanges at a time.
Tech Reader Magazine
TechReaderMagazine.com