How They Built the Trust

February 2023. Every major AI lab was saying the same thing: these models are too powerful to let people run on their own hardware. You could use them through an API. You could pay for access. But you could not own the weights, look under the hood, or modify anything.

Meta looked at that landscape and made a different decision. They released Llama. Downloadable. Modifiable. No API key. No subscription. No permission needed.

Within two years, it had been downloaded over a billion times, more than any AI model in history. Developers loved Meta. The open source community, historically suspicious of big tech, considered them the good guys. It was the most goodwill Meta had accumulated since Facebook's early days.

Then they spent it all in a single weekend.


The Saturday Drop

Most major tech launches happen on Tuesday mornings. Press embargoes are coordinated. Blog posts are staged. Partner quotes are lined up. The machine runs in a specific order.

Meta dropped Llama 4 on a Saturday with no warning. No press cycle. Just: here it is.

Three models at once. Llama 4 Scout with a 10-million-token context window, roughly 85 books' worth of input. Llama 4 Maverick. And Llama 4 Behemoth, a two-trillion-parameter model that was supposed to justify everything. Behemoth was still in training. Meta announced it anyway.

The community did what the open source community always does: they downloaded it immediately and tested it themselves.


The 16%

A developer ran Maverick through Aider Polyglot, a community benchmark for real-world coding ability, widely used to stress-test models on actual tasks.

Maverick scored 16%. For context, DeepSeek v3 and Claude 3.7 Sonnet scored significantly higher. Meta had claimed that Maverick beat GPT-4o and rivaled DeepSeek v3. The independent tests said otherwise.

Then the leaderboard controversy hit. Multiple developers noticed that Meta's benchmark scores had been obtained using a version of Maverick that was fine-tuned specifically for benchmarks, not the version being released to the public. When the community tested the release version, it underperformed the claimed numbers.

Meta's response: their Gen AI head cited "inconsistent performance across inference platforms" and asked for patience. No technical paper accompanied the launch, the first Llama release without one. The community was being asked to trust Meta without receipts.


What Was Happening Inside

Four days before the Llama 4 launch, Joelle Pineau, Meta's VP of Research for AI and one of the key architects of the Llama program, was among the first to leave.

According to people close to the situation, Zuckerberg had watched Meta's AI position erode relative to competitors and decided the answer was to move faster. Teams that had been doing long-horizon research, the kind that might matter in five or ten years, were redirected toward shipping. Experiments stopped. Safe, proven approaches replaced frontier research. The goal was to have Llama 4 out, not to have it right.

The result was a launch that moved fast and broke the one thing Meta had in AI that nobody else did: the trust of the open source community.


What Goodwill Actually Costs to Rebuild

The billion downloads did not come from marketing. They came from developers who believed Meta was genuinely committed to open weights because it was the right thing to do, not because it was strategically convenient at the time.

Benchmark manipulation, even the appearance of it, destroys that belief instantly. Once a community believes you are gaming the numbers, every future number gets questioned. The next Llama release will face a level of scrutiny that previous releases never had to survive.

You can rebuild a product. The trust that turned Llama into the most downloaded AI model in history took two years and one genuine commitment to build. It took one Saturday to spend.