The Token Problem With AI Agents
Every time an AI agent takes an action (runs a tool, reads a file, queries a database), the result gets stuffed into the context window. You pay for every token in that window, and the bill climbs fastest at higher model tiers like Opus or GPT-5.
Most of what goes in is noise. A bash command that reads a server log file returns thousands of lines. The agent needs to find the error message. It does not need every INFO log from the last 48 hours. But all of it lands in context and all of it costs tokens.
What Headroom Does
Netflix open-sourced Headroom, a tool that compresses everything your AI agent reads before it reaches the LLM. Tool call outputs. Code files. RAG results. Anything that comes back from an external source gets processed by Headroom first.
The compression is statistical. In a test with a server log file, Headroom removed 419 similar INFO logs and replaced them with a compressed summary. The model received the same answer. The token count dropped by 98%.
That is not a rounding error. A 98% reduction in context tokens means the input cost of reading that content falls to roughly 2% of what it was before.
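To make the idea concrete, here is a minimal sketch of collapsing similar INFO lines into a count while keeping everything else verbatim. It is not Headroom's code or API. The function name compress_log, the digit-masking rule for deciding which lines are "similar", and the sample log lines are all assumptions made for illustration.

```python
import re

# Hypothetical helper, not Headroom's API.
# Assumption: two INFO lines are "similar" if they match once digits are masked.
_DIGITS = re.compile(r"\d+")


def compress_log(lines):
    """Replace INFO lines with one summary line per template; keep other lines verbatim."""
    info_counts = {}  # template -> number of INFO lines that matched it
    kept = []         # non-INFO lines (warnings, errors) in original order
    for line in lines:
        if " INFO " in line:
            template = _DIGITS.sub("<n>", line)
            info_counts[template] = info_counts.get(template, 0) + 1
        else:
            kept.append(line)
    summary = [
        f"[{count} similar lines omitted] {template}"
        for template, count in info_counts.items()
    ]
    return summary + kept


# Made-up sample data for illustration only.
sample = [
    "2024-01-01 10:00:01 INFO request 17 ok",
    "2024-01-01 10:00:02 INFO request 18 ok",
    "2024-01-01 10:00:03 ERROR db timeout after 30s",
    "2024-01-01 10:00:04 INFO request 19 ok",
]
for line in compress_log(sample):
    print(line)
```

The ERROR line survives untouched, which is the part the agent actually needs. The three INFO lines shrink to a single summary line.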
The Reversible Part
The concern with compression is obvious: what if the compressed version loses something important? Headroom addresses this by making compression reversible. When it compresses a section, it stores a hash that points back to the original. If the model determines it needs the full version (for verification, for edge case handling, for any reason), it can request the original via the hash.
The model is not flying blind. It receives a compressed representation and knows it is compressed, with an explicit path back to the source. The default is compressed. The fallback is always available.
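The pattern described here can be sketched in a few lines. This is an illustration under assumptions, not Headroom's implementation. The class ReversibleStore, its compress and expand methods, the "[compressed ref=...]" marker format, and the truncated SHA-256 key are all hypothetical.

```python
import hashlib


class ReversibleStore:
    """Hypothetical in-memory store that maps a content hash to the original text."""

    def __init__(self):
        self._originals = {}

    def compress(self, text, summarize):
        # Key the original by a short content hash so it can be fetched later.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
        self._originals[key] = text
        # The marker tells the model this is a compressed view and how to expand it.
        return f"[compressed ref={key}] {summarize(text)}"

    def expand(self, key):
        # Return the full original for a previously issued reference.
        return self._originals[key]


store = ReversibleStore()
raw = "line one\nline two\nline three"
compact = store.compress(raw, lambda t: f"{len(t.splitlines())} lines")
ref = compact.split("ref=")[1].split("]")[0]
print(compact)
print(store.expand(ref) == raw)
```

In an agent setup, expand would typically be exposed as a tool the model can call. A dictionary is enough to show the idea, while a real deployment would need persistence and eviction.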
Why This Matters for Agent Economics
The cost curve of AI agent deployment has been a serious concern for anyone running agents at scale. Uber burned its annual AI budget in four months partly because token consumption from Claude Code scaled faster than projected. Enterprise teams at Fortune 500 companies are grappling with 150,000 agents burning tokens continuously.
A tool that compresses context by 60% to 98%, depending on the content, changes the economics of every one of those deployments. The same agent, the same task, a fraction of the cost.
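The arithmetic behind that range is simple enough to check by hand. The sketch below assumes input cost scales linearly with token count. The price of $10 per million input tokens and the one-million-token workload are placeholder figures, not real pricing for any model.

```python
def input_cost(tokens, price_per_million):
    """Cost in dollars of sending `tokens` input tokens at a flat per-million price."""
    return tokens / 1_000_000 * price_per_million


def compressed_cost(tokens, price_per_million, reduction):
    """Cost after removing a fraction `reduction` (0.0 to 1.0) of the tokens."""
    return input_cost(tokens * (1 - reduction), price_per_million)


# Placeholder figures for illustration only.
TOKENS = 1_000_000
PRICE = 10.0  # hypothetical dollars per million input tokens

print(f"uncompressed:  ${input_cost(TOKENS, PRICE):.2f}")
print(f"60% reduction: ${compressed_cost(TOKENS, PRICE, 0.60):.2f}")
print(f"98% reduction: ${compressed_cost(TOKENS, PRICE, 0.98):.2f}")
```

At these placeholder numbers, the low end of the range takes the input bill from $10 to about $4 and the high end takes it to about $0.20. Output tokens and any context that is not compressed are billed as before.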
Headroom is open source. It is not a product with a pricing tier. It is infrastructure that anyone building with AI agents can drop in today.