Notes on "When AI Builds Itself"
A note on scope: this article is a translation, not an analysis. It does not offer verdicts on Anthropic's motives, evaluate the policy proposals, or explore the broader implications of what the paper describes. Those are important conversations — and Tech Reader Magazine will have them in future essays. Right now, the only goal is to make sure you understand what the paper actually says. That is enough for one article.
"I started leaning hard into Claudifying about a year ago. That's been a crazy adventure and it's now been about five months since I last wrote any code myself."
That quote comes from an engineer at one of the most consequential AI laboratories in the world. The paper quotes Anthropic employees throughout, keeping them anonymous by institutional choice. Their voices — candid, occasionally unsettled, occasionally exhilarated — are some of the most important things in the document. We will return to them. First, the context.
What the Paper Is and Where It Comes From
Anthropic published "When AI Builds Itself" on June 4, 2026. It was written by Marina Favaro and Jack Clark of the Anthropic Institute — a think tank Anthropic launched in March 2026 to study the broader implications of AI development. The paper draws on internal data from Anthropic that has not previously been made public, combined with publicly available benchmarks and research. It makes two distinct arguments: a factual one about what is already happening inside Anthropic's engineering and research operations, and a policy one about what the world should do in response. This article covers the factual argument. The policy argument will be addressed separately.
What Is Actually Happening Inside Anthropic
The paper's central factual claim is this: AI is now building AI, and the pace is accelerating. To understand what that means concretely, it helps to know how Anthropic describes the work of building a frontier AI model. The paper breaks it into two broad categories. Engineering is writing the code, standing up the infrastructure, and overseeing the training runs. Research is deciding what experiments to run, interpreting what comes back, and figuring out which ideas to pursue next. The paper then shows, with specific data, how much of each category Claude is now handling.
On engineering, the numbers are stark. As of May 2026, more than 80 percent of the code merged into Anthropic's production codebase was authored by Claude. Before Claude Code launched in February 2025, that figure was in the low single digits. The shift happened in fifteen months. In the second quarter of 2026, the typical Anthropic engineer was merging eight times as much code per day as in 2024 — not because engineers are working harder, but because Claude is writing the code while the engineer directs and reviews. Anthropic is candid that lines of code is an imperfect measure of productivity, and that the 8x figure almost certainly overstates the true gain. The direction, however, is not in dispute.
The paper gives two concrete examples that make the scale of this tangible. In April 2026, Claude was pointed at a persistent class of API errors. Working autonomously, it shipped more than 800 individual fixes, reducing the error rate by a factor of one thousand. The engineer overseeing the work estimated that a human would have needed four years to complete it — not because any individual fix was beyond human capability, but because the cognitive load of holding that much unfamiliar code in working memory simultaneously is simply beyond what a single human mind can sustain. Separately, when a routine upgrade began crashing tens of thousands of training jobs, an engineer pointed Claude at the live incident with minimal context. Claude isolated the specific debugging flag causing the crash and confirmed a fix in about two hours. The same problem would typically take a human engineer two to three days.
On research — the harder category — the picture is more nuanced but moving in the same direction. The paper describes a spectrum of research tasks, from executing a well-specified experiment at one end to choosing which problems are worth working on at the other. Claude has become highly capable at the execution end. At the direction end — the judgment calls about what to investigate and why — humans still hold a meaningful advantage. But that advantage is narrowing, and the paper documents it narrowing in real time.
The clearest example involves a test Anthropic runs with every model release: give Claude some code that trains a small AI model and ask it to make that code run as fast as possible. The goal is fixed; Claude's job is to find speedups by rewriting, running, timing, and repeating. In May 2025, Claude Opus 4 averaged roughly a 3x speedup. By April 2026, Claude Mythos Preview — Anthropic's most capable model, currently restricted from public release — was achieving roughly 52x. For reference, a skilled human researcher needs four to eight hours to reach 4x on the same task.
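To put those speedup figures side by side, the arithmetic can be sketched in a few lines of Python. This is a back-of-the-envelope illustration using only the rounded numbers quoted above; the variable names are ours, not the paper's, and the human figure is a time-boxed reference point rather than a like-for-like comparison.

```python
# Rounded speedup figures as quoted in this article (not raw benchmark data).
opus_4_speedup = 3.0    # Claude Opus 4, May 2025
mythos_speedup = 52.0   # Claude Mythos Preview, April 2026
human_speedup = 4.0     # skilled human researcher after four to eight hours

# How much larger is the newer model's speedup than each reference point?
mythos_vs_opus = mythos_speedup / opus_4_speedup
mythos_vs_human = mythos_speedup / human_speedup

print(f"Mythos Preview vs Opus 4: about {mythos_vs_opus:.1f} times the speedup")
print(f"Mythos Preview vs human reference: about {mythos_vs_human:.1f} times the speedup")
```

On these rounded inputs, the newer model's result is roughly seventeen times the older model's and thirteen times the human reference.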
The research judgment question — can Claude decide what to investigate, not just how — was tested in a different experiment. Claude-powered agents were given an open problem in AI safety and left to solve it end to end: proposing hypotheses, running experiments, sharing findings with parallel agents, and iterating. Two human researchers, working for about a week, recovered roughly 23 percent of the target performance gap. The agents recovered 97 percent over 800 cumulative compute-hours at a cost of approximately $18,000. The paper is careful to note the caveats: the result did not transfer cleanly to production-scale models, and humans still chose the problem and designed the scoring. But within those boundaries, the agents designed every experiment themselves.
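The figures in that experiment imply a couple of derived numbers that the article does not state directly. The sketch below works them out from the approximate values quoted above; the variable names are illustrative, and the results are only as precise as the rounded inputs.

```python
# Approximate figures as quoted in this article.
human_gap_recovered = 0.23   # two human researchers, about a week
agent_gap_recovered = 0.97   # Claude-powered agents
agent_compute_hours = 800    # cumulative compute-hours
agent_cost_usd = 18_000      # approximate total cost

# Implied cost of one compute-hour for the agent run.
cost_per_compute_hour = agent_cost_usd / agent_compute_hours

# How many times more of the performance gap the agents recovered.
recovery_ratio = agent_gap_recovered / human_gap_recovered

print(f"Implied cost per compute-hour: ${cost_per_compute_hour:.2f}")
print(f"Agents recovered about {recovery_ratio:.1f} times as much of the gap")
```

These ratios say nothing about the caveats the paper raises; they only restate the quoted numbers in a more comparable form.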
What This Is Doing to the People Inside Anthropic
The paper does not dwell on the human experience of working inside this transformation. But the employee quotes it includes are worth reading carefully, because they capture something the data cannot.
"The shape of stuff today is roughly 'humans have ideas, and the models are able to implement, test and evaluate them an order of magnitude faster than before.'"
That is the optimistic reading, and it is genuinely optimistic. The humans are still the ones with the ideas. The tools have made the distance between an idea and a tested result almost vanishingly small. For people whose work is fundamentally about ideas, that is an extraordinary capability. The paper's internal poll supports this: in a survey of 130 Anthropic research staff, the median respondent estimated producing roughly four times as much research output with Mythos Preview as without AI assistance.
But the paper also includes this:
"On days where everything works well, I can't help but think nothing I do matters, everything is automated and better and faster than I ever will be. But then there are days where everything breaks and I don't understand why and I realize I have no idea what I've been up to anymore."
The paper does not analyze that quote. It simply includes it and moves on. The reader is left to sit with it.
The Three Futures
The most important section of the paper for a general reader is the one that lays out three possible futures. The paper buries it near the end. It deserves to be front and center.
Future one: the trend stalls. The exponential curves flatten. The capabilities that feel most distinctly human — research judgment, the ability to choose which problems matter — turn out to be beyond the reach of current training methods. Progress continues but slows. Governments and societies have time to adapt. Anthropic says this is possible but does not believe it is likely.
Future two: the trend compounds, but humans stay in the loop. AI development becomes substantially automated, but humans continue to set research directions and judge results. Organizations using AI become dramatically more efficient: a 100-person company doing the work of a 10,000-person organization. The human role narrows to direction, taste, and oversight. Anthropic says its evidence suggests this is the most likely near-term trajectory.
Future three: full recursive self-improvement. AI systems become capable of designing and refining their own successors without meaningful human involvement at each step. The pace of AI development becomes determined almost entirely by the availability of compute. Humans shift to oversight and verification of a process they no longer directly drive. The paper does not say this is inevitable. It says it could arrive sooner than most institutions are prepared for, and that future two could transition into future three faster than the gap between them currently appears.
"The comparative advantage of humans as of right now is still in seeing the bigger picture and thinking beyond the confines of the immediate task."
What the Paper Is Asking For
The policy argument, which this article is only describing and not evaluating, is this: the world needs a verified mechanism through which multiple frontier AI laboratories, in multiple countries, could coordinate a slowdown or pause in frontier AI development if conditions warranted it. Anthropic is explicit that a unilateral pause by one laboratory would accomplish little: it would change who the front-runner is without creating the broader deliberative process that is missing. What the paper calls for is a multi-party, verifiable agreement, similar in structure to arms control treaties, that would allow a coordinated pause if the leading labs agreed one was necessary.
The paper acknowledges how difficult this is. Training runs are far easier to conceal than missile silos. The incentive to defect quietly — to continue development while others pause — is enormous, because whoever continues while others stop would inherit the lead. A credible mechanism would need to specify what triggers a pause, what lifts it, and who adjudicates. None of that exists today. The paper is asking for the work of building it to begin.
What to Take Away
The paper is asking its readers to hold two things simultaneously. The first is that what is already happening — AI writing the majority of its own codebase, AI outperforming human researchers on well-defined experiments, AI beginning to make better next-step decisions than the humans running the sessions — is remarkable, genuinely useful, and accelerating. The second is that the trajectory this points toward, if it continues, will require governance structures that do not yet exist and a level of international coordination that has rarely been achieved even for far simpler technologies.
That is the paper's argument, stated as plainly as possible. Whether it is right, what its implications are, and what should be done in response — those are the conversations that come next. Tech Reader Magazine will have them. For now, the document exists, the data is public, and the people building the technology are the ones raising the question. That combination is worth understanding clearly before deciding what to think about it.
The full paper is available at Anthropic.com. It runs 25 pages, and it is worth the time to read.