The Benchmark That Started the Conversation
Hermes topped OpenAI on OpenRouter token usage.
Not in a controlled benchmark. In actual production traffic: the aggregate of all real developers using OpenRouter to route their API calls. The open-source model no one had heard of outside AI circles was getting more real usage than the model that launched the current AI era.
That is the number that turned NetworkChuck (3.5 million YouTube subscribers, someone who has spent the last two years building with OpenAI and Anthropic tools) into a public convert. His video is titled "goodbye OpenClaw." He is not being rhetorical.
What Is Actually Different
Every AI assistant can be given a persona. You set a name, write a system prompt, and the model plays the role. When you start a new session, it forgets everything and plays the role from scratch again.
Hermes does not work that way.
When you tell Hermes something about yourself (your name, your preferences, that your partner does not like spice), it writes that to a USER.md file in your .hermes directory. Every future session loads that file. The agent is not starting from the same blank slate. It is starting from everything it has learned about you since you first ran it.
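To make the mechanism concrete, here is a minimal sketch of file-backed memory in Python. It is not Hermes's actual implementation: the function names (remember, load_user_memory) and the one-bullet-per-fact layout are assumptions made for illustration. Only the USER.md filename and the .hermes directory come from the description above.

```python
import tempfile
from pathlib import Path


def remember(memory_dir: Path, fact: str) -> None:
    """Append one fact to USER.md, skipping exact duplicates.

    Hypothetical helper: the bullet-list layout is an assumption,
    not the real Hermes file format.
    """
    memory_dir.mkdir(parents=True, exist_ok=True)
    user_file = memory_dir / "USER.md"
    existing = user_file.read_text(encoding="utf-8") if user_file.exists() else ""
    line = f"- {fact}"
    if line not in existing.splitlines():
        with user_file.open("a", encoding="utf-8") as handle:
            handle.write(line + "\n")


def load_user_memory(memory_dir: Path) -> str:
    """Return the stored facts, or an empty string on the very first run."""
    user_file = memory_dir / "USER.md"
    return user_file.read_text(encoding="utf-8") if user_file.exists() else ""


if __name__ == "__main__":
    # A temporary directory stands in for ~/.hermes so the demo touches nothing real.
    with tempfile.TemporaryDirectory() as tmp:
        memory_dir = Path(tmp) / ".hermes"
        remember(memory_dir, "Partner does not like spice")
        # A later "session" starts by reading the same file back.
        print(load_user_memory(memory_dir))
```

The design choice worth noticing is that the state lives in a plain file rather than in the model: any new session, or any other tool, can read it back.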
NetworkChuck documented what Honcho, the memory layer, built about him after a month: his daily habits, technical procrastination patterns, how he communicates under stress. "Trait, high friction, technical procrastination. Gravitates towards tool building to avoid high stakes communication or soul work." His words: "Ouch."
The point is not that the introspection is flattering. The point is that it is accurate, and it compounds. The agent on day 30 is structurally different from the agent on day one. That gap widens every session.
SOUL.md and the Persona Architecture
Hermes uses markdown files for everything about the agent. The agent's identity lives in SOUL.md. Its memory of you lives in USER.md. When something changes about the agent's understanding, it edits the relevant file directly.
This is the same principle as the SKILL.md format (plain text, version-controllable, readable by both humans and models), but applied to identity and memory rather than workflow instructions.
The practical result: you can see exactly what your agent believes about you. You can edit it. You can share it. You can port it to a different model if Hermes is replaced by something better. The memory belongs to you, not the platform.
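As a rough illustration of that portability, the sketch below assembles a model-agnostic system prompt from the two files. The function name build_system_prompt and the section headings are hypothetical, and how Hermes itself combines these files is not described here. The point is only that plain markdown can be handed to any model that accepts a text prompt.

```python
import tempfile
from pathlib import Path

# (filename, heading) pairs; the headings are illustrative assumptions.
MEMORY_FILES = (
    ("SOUL.md", "Agent identity"),
    ("USER.md", "What you know about the user"),
)


def build_system_prompt(agent_dir: Path) -> str:
    """Concatenate whichever memory files exist into one prompt string."""
    sections = []
    for filename, heading in MEMORY_FILES:
        path = agent_dir / filename
        if not path.exists():
            continue  # a missing file just means nothing has been recorded yet
        text = path.read_text(encoding="utf-8").strip()
        if text:
            sections.append(f"## {heading}\n{text}")
    return "\n\n".join(sections)


if __name__ == "__main__":
    # A temporary directory stands in for the real agent directory.
    with tempfile.TemporaryDirectory() as tmp:
        agent_dir = Path(tmp)
        (agent_dir / "SOUL.md").write_text("You are a patient assistant.\n", encoding="utf-8")
        (agent_dir / "USER.md").write_text("- Partner does not like spice\n", encoding="utf-8")
        print(build_system_prompt(agent_dir))
```

Because the output is an ordinary string, the same files could be pointed at a different model without any conversion step.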
The Phone Number Extension
One integration that illustrates where this goes: give Hermes a real phone number through Vapi with a single MCP install.
The agent can now make outbound calls. Dentist appointments. Restaurant reservations. Following up on leads. David Ondrej ran a test call: 19 seconds, five cents, successful booking attempt. Not perfect. The agent spoke too fast and got interrupted. Tunable through Vapi settings.
The direction is clear. An agent that remembers everything about you, running continuously in the background, can now take actions in the physical world through a phone call, for about a nickel per interaction.
Why Developers Are Making the Switch
The stated reason is the memory architecture. The real reason is trust.
OpenAI and Anthropic have built tools that are capable and well-maintained. They are also closed, controlled, and oriented toward their own continuity as platforms. Hermes is open source, growing faster than any GitHub project in recent memory, and designed around the premise that the agent's context belongs to the user.
NetworkChuck gave it to his wife. She named it. She calls it her BFF. That is not a technical benchmark. It is the kind of outcome you get when an AI system actually compounds over time in a way that matters to a real person's life, and that kind of trust is very hard to build and very hard to copy.