There's a kind of cognitive dissonance in Dario Amodei's voice when he describes what happened to Anthropic's compute consumption. He's describing a problem that would kill most companies, a catastrophic underestimation of demand, but the problem manifested because Claude Code became so useful, so fast, that the infrastructure team's projections were off by a factor of eight in a single quarter.
"We planned for a world of 10x growth per year. In the first quarter of this year, we saw 80x annualized growth in revenue and usage."
- Dario Amodei, CEO of Anthropic

That single sentence contains a complete history of the agentic AI transition. Anthropic built its compute plans for a world where Claude was a capable assistant. Claude Code turned it into an autonomous developer, and developers who adopt autonomous coding tools don't use 10x more tokens than before. They use 100x more, because every task now spawns agents that spawn subagents that spawn tool calls that each cost tokens.
The Colossus Deal That Nobody Saw Coming
When Elon Musk announced on X that "no one set off my evil detector" after Anthropic signed a lease on xAI's Colossus supercomputer cluster, it was one of the stranger moments in the recent history of the AI industry. Anthropic, the company founded by safety-focused former OpenAI researchers who have publicly warned about existential AI risk, was renting GPU capacity from a company whose founder has positioned himself as a critic of the "AI safety" framing.
The deal is a monument to infrastructure realism. Anthropic needed compute immediately. xAI had the largest operational GPU cluster in the world. Ideology and competitive dynamics yielded to physics: you can't serve 80x growth with a procurement plan designed for 10x.
Colossus 1 is genuinely staggering in scale. The Memphis facility houses over 220,000 GPUs across multiple generations:
| GPU Type | Quantity | Notes |
|---|---|---|
| NVIDIA H100 | 150,000+ | Primary inference workload |
| NVIDIA H200 | ~50,000 | Higher memory bandwidth than H100 |
| NVIDIA GB200 | ~20,000 | Blackwell architecture, also the problem |
| Total | 220,000+ | 300 MW power draw; ~$2.60/GPU/hr lease rate |
At a market lease rate of approximately $2.60 per GPU per hour, the annualized cost of 220,000 GPUs approaches $5–6 billion, roughly matching xAI's own estimated annual compute burn. For Anthropic, a company that was burning around $3 billion annually before Claude Code's growth inflection, this represents a fundamental change in cost structure.
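The arithmetic behind that annualized figure is straightforward; all inputs below are the article's estimates, not confirmed contract terms:

```python
# Back-of-envelope check of the annualized Colossus lease cost.
# Inputs are the article's estimates, not confirmed contract terms.
HOURS_PER_YEAR = 24 * 365            # 8,760 hours

gpus = 220_000                       # total leased GPUs (estimate)
rate_per_gpu_hour = 2.60             # USD per GPU per hour (approx. market rate)

annual_cost = gpus * rate_per_gpu_hour * HOURS_PER_YEAR
print(f"annualized lease cost: ${annual_cost / 1e9:.2f}B")  # → $5.01B
```

Running the GPUs at full-time occupancy lands at the low end of the $5–6 billion range; spot-rate variation and power pass-through would push the real number higher.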
The Chip Overheating Problem
The GB200 GPUs in the Colossus cluster represent Nvidia's most advanced silicon, and Anthropic's most acute operational headache. By multiple accounts, the chips run hot under sustained inference workloads because the cooling infrastructure was designed for a different usage pattern than the one that actually materialized.
The GB200 Blackwell chips in Anthropic's leased Colossus capacity have been reported to experience thermal management issues under the continuous high-utilization loads characteristic of agentic workloads. Unlike batch training jobs with predictable thermal profiles, inference serving for agentic tasks runs at near-continuous 100% utilization with no natural cool-down cycles. The chips, designed for a different load profile, are running hotter than spec under real-world AI coding workloads.
This is the dirty secret of the agentic transition: the compute infrastructure was designed and benchmarked for training runs and burst inference, not for the sustained, continuous inference that autonomous coding agents require. When a developer uses Claude Code to refactor a codebase, the agent might run continuously for hours, issuing tool calls, reading files, writing code, checking outputs, at a utilization level no GPU thermal design expected from inference serving.
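The contrast between the two load profiles can be made concrete with a toy utilization model; the per-minute samples below are illustrative assumptions, not measured chip telemetry:

```python
# Toy comparison of the load profiles described above: bursty chat-style
# serving with idle cool-down gaps vs. a sustained agentic coding session.
# Sample values are illustrative assumptions, not measured data.

def average_utilization(samples):
    """Mean GPU utilization over a sequence of per-minute samples (0.0-1.0)."""
    return sum(samples) / len(samples)

# Bursty serving: spikes to 100% with frequent idle minutes in between
bursty = [1.0, 0.2, 0.0, 1.0, 0.1, 0.0] * 10
# Agentic session: hours of near-continuous full utilization, no cool-down
agentic = [1.0, 0.9, 1.0, 1.0, 0.9, 1.0] * 10

print(f"bursty mean utilization:  {average_utilization(bursty):.0%}")   # → 38%
print(f"agentic mean utilization: {average_utilization(agentic):.0%}")  # → 97%
```

Thermal design budgets sized for the first profile have no slack for the second, which is the core of the GB200 problem.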
The MFU Gap: Why Anthropic Is Paying More Than It Should
Beyond thermal issues, there's a deeper efficiency problem that explains why compute costs are scaling faster than usage. Model FLOPs Utilization (MFU) measures how much of a GPU's theoretical compute capacity is actually being used for productive work. For training runs at Meta and Google, MFU typically runs 40–50%. For Anthropic's inference-heavy workloads on the Colossus cluster, MFU is running at approximately 11%.
That gap, 11% vs. 40%+, means Anthropic is buying roughly four times as much GPU compute as training-level efficiency would require for the same useful work. The causes are structural: inference serving requires holding model weights in memory, managing variable-length context windows, batching requests efficiently across thousands of concurrent sessions, and doing all of this under the unpredictable arrival patterns of real-world developer usage. None of these factors allows the clean, predictable utilization of a training run.
At 11% MFU vs. the 40%+ achieved during training workloads, Anthropic is effectively wasting roughly three of every four dollars of GPU spend, measured against training-level efficiency. Closing the gap to even 25% MFU would meaningfully change the unit economics of serving Claude Code at scale, but agentic inference patterns make efficient batching structurally harder than in training.
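The dollar framing follows directly from the MFU ratio. A minimal sketch, assuming useful FLOPs per dollar scale linearly with MFU:

```python
# Unit-economics implication of the MFU gap, using the article's figures.
# Simplifying assumption: useful FLOPs per dollar scale linearly with MFU.
inference_mfu = 0.11   # observed on agentic inference workloads
training_mfu = 0.40    # typical for well-tuned training runs
target_mfu = 0.25      # hypothetical improvement target from the text

# Fraction of each dollar wasted, relative to training-level efficiency
waste_fraction = 1 - inference_mfu / training_mfu
print(f"wasted per dollar: {waste_fraction:.1%}")  # → 72.5%

# Cost multiple to deliver the same useful FLOPs at each efficiency level
print(f"cost vs. training, today:   {training_mfu / inference_mfu:.1f}x")  # → 3.6x
print(f"cost vs. training, at 25%:  {training_mfu / target_mfu:.1f}x")     # → 1.6x
```

Even the hypothetical 25% target leaves inference more expensive per useful FLOP than training; the structural penalty shrinks but never disappears.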
Rate Limits: The Demand Signal Nobody Ignored
The operational story of Claude Code's growth is written partly in rate limit changes. When a product goes from planned to released, rate limits are set conservatively to manage capacity. When demand exceeds projections by 8x, rate limits tell you exactly how badly you underestimated.
Anthropic's rate limit trajectory for the first half of 2025 reads as an inverse history of the compute scramble:
- 5-hour token windows doubled: the rolling window used to measure usage was extended, effectively giving developers more sustained headroom
- Peak-hour reductions eliminated: Anthropic stopped throttling during high-demand periods as capacity expanded
- Opus limits increased 2–10x depending on subscription tier: Opus-class models had become the default for serious agentic work
Each change corresponds to a round of capacity acquisition. The rate limit timeline is, in effect, the infrastructure procurement timeline with the dollar amounts redacted.
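The rolling-window accounting those changes adjusted can be sketched in a few lines; the window length and token budget below are illustrative, not Anthropic's actual limits:

```python
# Minimal sketch of rolling-window token accounting, as described above.
# The window length and budget are illustrative, not Anthropic's real limits.
from collections import deque

class RollingWindowLimiter:
    def __init__(self, window_seconds: int, token_budget: int):
        self.window = window_seconds
        self.budget = token_budget
        self.events = deque()          # (timestamp, tokens) pairs

    def _evict(self, now: float) -> None:
        # Drop usage that has aged out of the rolling window.
        while self.events and now - self.events[0][0] >= self.window:
            self.events.popleft()

    def try_consume(self, now: float, tokens: int) -> bool:
        self._evict(now)
        used = sum(t for _, t in self.events)
        if used + tokens > self.budget:
            return False               # request would exceed the window budget
        self.events.append((now, tokens))
        return True

# Doubling the budget, as the rate-limit changes effectively did, raises the
# ceiling on sustained multi-hour agent sessions rather than short bursts.
limiter = RollingWindowLimiter(window_seconds=5 * 3600, token_budget=2_000_000)
```

Under this scheme, a sustained agentic session is throttled not by requests per minute but by cumulative tokens over hours, which is why window-budget increases were the lever that mattered.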
A Company That No Longer Writes Code the Old Way
The most striking data point in the Claude Code story might not be the 80x usage growth. It might be what's happening inside Anthropic itself.
"There's literally no manually written code anywhere in the company anymore."
- Boris Cherny, Anthropic engineer

That claim, from an Anthropic engineer, is either hyperbole or the most significant statement about software development practice in years. Probably some of both. But even if "literally no manually written code" is slightly overstated, the directional claim is striking: the company building Claude Code has restructured its own engineering workflows around the tool to such a degree that manual coding has become the exception rather than the norm.
This creates an interesting self-reinforcing loop. Anthropic uses Claude Code to build Claude Code. The features developers most want from an agentic coding tool are features that Anthropic engineers encounter in their own workflows. The feedback loop between product and user collapses when the users are the builders.
The Infrastructure Company Anthropic Didn't Mean to Build
Anthropic was founded as a safety research organization. Its founding team's motivating concern was that sufficiently powerful AI systems could be dangerous, and that the path to safe AI required more rigorous alignment research than was being done at existing labs. The founding documents, public statements, and research agenda all pointed toward careful, deliberate development.
Nothing about that mission required Anthropic to become one of the largest GPU operators in North America. But the logic of product success in AI is inexorable: if your model is good enough that developers build critical workflows around it, those developers require reliability, capacity, and latency guarantees that only hyperscale infrastructure can provide.
The Colossus lease is the cost of having a model people actually depend on. The $5–6 billion annualized compute spend is what happens when a safety-focused lab's AI assistant becomes load-bearing infrastructure for tens of thousands of developers. The GB200 chips running hot are a hardware metaphor for the gap between what Anthropic planned and what the market turned out to need.
What Comes Next: $200B+ and Counting
The industry context for Anthropic's compute trajectory is the $200 billion in data center commitments that major AI companies and cloud providers have announced for 2025 and 2026. Microsoft, Google, Amazon, Meta, and Oracle have collectively committed to building infrastructure at a scale that would have seemed implausible in 2023. The Colossus lease is Anthropic's entry into this compute race, financed partly by its revenue growth and partly by a $4 billion Amazon investment that came with cloud computing credits.
The near-term questions are structural: Can Anthropic improve MFU enough to make the unit economics work at scale? Can the GB200 thermal issues be resolved at the firmware and cooling-infrastructure level without requiring hardware swaps? Can rate limit improvements keep pace with developer demand as agentic workflows become standard?
The longer-term question is strategic: Anthropic didn't set out to be an infrastructure company. Infrastructure companies operate at thin margins, require continuous capital reinvestment, and compete on reliability rather than research differentiation. The 80x growth quarter is evidence that Anthropic built something developers genuinely depend on. The question is whether the infrastructure operation required to serve that dependence is compatible with the research organization that created the product in the first place.
Dario Amodei's 10x plan didn't anticipate an 80x reality. The infrastructure scramble that followed is now one of the defining stories of 2025 AI. The safety-focused lab that wanted to develop AI carefully has, as a direct consequence of building AI that people actually use, become responsible for a piece of global compute infrastructure that can't afford to blink.