Lily is McKinsey's internal AI assistant, the platform that tens of thousands of the world's most powerful consultants use every day to synthesize research, draft deliverables, and access proprietary client intelligence. It had been running in production for more than two years. It had passed whatever internal review processes one of the world's most sophisticated professional services firms uses to vet enterprise software. And it had a SQL injection vulnerability that a security researcher cracked open for the cost of a dinner.

The exploit wasn't novel. SQL injection was first formally documented in 1998. Every developer learns about it. Every security certification covers it. Every procurement checklist at every major enterprise has a checkbox that says something like "application security review completed." Lily had apparently checked that box and still shipped 22 unauthenticated endpoints out of 200.

  • $20: total cost of the exploit
  • 2 hours: time to full read/write access
  • 22/200: endpoints with no authentication
  • 2+ years: Lily in production before discovery

What Lily Was

McKinsey & Company built Lily as the firm's enterprise AI knowledge platform, a proprietary system trained on internal research, case studies, and methodologies accumulated over decades of consulting work. For a firm that sells intellectual capital, Lily wasn't just a productivity tool. It was the vault.

Approximately 70% of the firm's 40,000 consultants had access. They used it to pull relevant frameworks for engagements, surface past project learnings, draft client-facing documents, and query the accumulated knowledge of one of the densest concentrations of business intelligence on the planet. The platform had been running since at least 2023, refined, iterated, and depended upon.

What it hadn't been was properly secured at the API layer.

The Attack: Slower Than a Coffee Break

The researcher's account of the breach reads like a security textbook chapter on what not to do, except the subject is a real platform at a real company with real clients whose sensitive information was potentially exposed.

Hour 0: Reconnaissance

Basic endpoint enumeration against Lily's API surface: 200 endpoints discovered. Standard stuff; any intermediate developer could do this with freely available tooling. Cost so far: $0.
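
For illustration, a minimal sketch of what that first pass looks like. The base URL and candidate paths here are hypothetical, and a real probe would use dedicated tooling with a wordlist of thousands of paths, but the logic is no deeper than this:

# Hedged sketch of endpoint enumeration. BASE and CANDIDATES are
# hypothetical stand-ins; real recon uses a fuzzer and a large wordlist.
import requests

BASE = "https://lily.example.internal/api/v1"  # hypothetical host
CANDIDATES = ["knowledge/search", "documents", "users", "sessions"]

for path in CANDIDATES:
    r = requests.get(f"{BASE}/{path}", timeout=5)
    # Anything other than 404 means the path exists in some form:
    # 200, 401, and 403 are all worth a closer look.
    if r.status_code != 404:
        print(path, r.status_code)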

Hour 0.5: First vulnerability

22 of those 200 endpoints returned data without any authentication challenge. No token required. No session validation. Just... data. Cost: still $0.
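
The check for this is as shallow as the failure. A sketch, again with a hypothetical URL: request the same endpoint with and without credentials, and compare.

# Sketch: does the endpoint care whether you're authenticated?
# The URL is hypothetical; "<token>" is a placeholder credential.
import requests

url = "https://lily.example.internal/api/v1/documents"
anon = requests.get(url, timeout=5)
authed = requests.get(url, headers={"Authorization": "Bearer <token>"}, timeout=5)

# Identical status and body with and without a token means the
# auth layer simply isn't wired up on this route.
if anon.status_code == 200 and anon.text == authed.text:
    print("endpoint serves data without authentication")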

Hour 1: SQL injection confirmed

Standard SQL injection test strings against input fields. The database responded. The payload worked. This technique has been publicly known for 27 years. Approximate cost of research: $0.
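
The first-pass probes haven't changed since the technique was documented. A sketch against a hypothetical endpoint; the pg_sleep payload assumes a Postgres backend, and other databases have equivalents:

# Sketch of classic first-pass SQL injection probes: a bare quote
# that provokes a database error, a tautology that widens the result
# set, and a time delay that confirms injected SQL actually executes.
import time
import requests

url = "https://lily.example.internal/api/v1/knowledge/search"  # hypothetical
payloads = ["'", "' OR '1'='1' --", "'; SELECT pg_sleep(5) --"]

for p in payloads:
    start = time.monotonic()
    r = requests.get(url, params={"q": p}, timeout=10)
    elapsed = time.monotonic() - start
    # A 500 error, a suddenly larger response, or a five-second stall
    # each indicate unsanitized input reaching the database.
    print(repr(p), r.status_code, f"{elapsed:.1f}s")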

Hour 2: Full access

Read and write access to the underlying database, the one storing the intellectual capital of McKinsey's engagements, the research, the frameworks, potentially client data. Total spend to reach this point: roughly $20, accounting for the API calls used in testing. Two years of production deployment: breached.

"$20, two hours to get full read and write access to the AI platform that 70% of McKinsey's 40,000 consultants use every single day."

- Security researcher's account of the Lily breach

Why This Is Bigger Than McKinsey

McKinsey isn't unusual here. That's the problem. The firm's security failure is notable because of its scale and because of who McKinsey is: a firm whose entire value proposition is sophisticated analysis and judgment. But the pattern of vulnerabilities Lily exhibited is endemic across the enterprise AI deployment wave of 2023 through 2025.

Enterprises rushed AI platforms into production under board and CEO pressure to "show AI progress." Developer teams integrated LLMs into existing systems without corresponding security reviews. Procurement checklists were written for traditional SaaS software and never updated to account for AI-specific attack surfaces: prompt injection, retrieval-augmented generation data poisoning, agentic tool misuse, and, as Lily demonstrated, the prosaic failure of just not requiring authentication on API endpoints.

⚠ The Procurement Gap

Standard enterprise procurement checklists were designed for traditional software. They check for SOC 2 compliance, penetration test reports, and data processing agreements. They don't ask: "What happens when an attacker can write to your AI's knowledge base?" or "Are your retrieval endpoints authenticated?" The Lily breach is what the gap between those two frameworks looks like in practice.

The Unauthenticated Endpoint Problem

Twenty-two unauthenticated endpoints out of 200 sounds like a development oversight, a few endpoints an engineer forgot to wire up to the auth middleware. In practice, it's a systems and culture failure. In a properly structured development environment with security reviews, mandatory authentication is a default enforced at the framework level, not something individual developers opt into per-route.
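
What framework-level enforcement looks like in practice, sketched here with FastAPI purely as a stand-in (nothing public says what stack Lily runs on): attach the authentication dependency to the application object itself, and every route inherits it.

# Deny-by-default auth, sketched in FastAPI (assumed stack). The
# dependency lives on the app, so no individual route can forget it.
from fastapi import Depends, FastAPI, Header, HTTPException

def require_token(authorization: str | None = Header(default=None)) -> None:
    # Reject any request without a bearer token. Real validation
    # (signature, expiry, scopes) would go here.
    if authorization is None or not authorization.startswith("Bearer "):
        raise HTTPException(status_code=401, detail="Authentication required")

app = FastAPI(dependencies=[Depends(require_token)])

@app.get("/api/v1/knowledge/search")  # hypothetical route
def search(q: str) -> dict:
    return {"query": q, "results": []}

The inverse posture, where each route opts into auth individually, is exactly how 22 gaps out of 200 happen.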

For AI platforms specifically, the stakes of unauthenticated endpoints are compounded. Traditional software endpoints might return user profile data or transaction records: sensitive, but bounded. An AI platform's retrieval endpoints potentially return synthesized intelligence drawn from the full corpus of proprietary knowledge the system was trained on. Unauthenticated access to a retrieval endpoint isn't one leaked record, it's the library door left unlocked.

-- Classic SQL injection, circa 1998
-- Still working on production AI platforms in 2025

-- The application splices user input straight into the query string,
-- so a "title" of  '; DROP TABLE sessions; --  becomes executable SQL:

SELECT * FROM documents
WHERE title = ''; DROP TABLE sessions; --'

-- Or, more practically for data exfiltration (the + signs are
-- URL-encoded spaces):

GET /api/v1/knowledge/search?q='+UNION+SELECT+*+FROM+users--
-- No auth token required. 22 endpoints like this. In production.
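
The defense is as old as the attack. A minimal sketch of the fix using Python's built-in sqlite3 driver and hypothetical table names: user input is bound as a parameter, so the driver treats it as data rather than SQL.

# Parameterized-query sketch (hypothetical schema). The 1998-era
# payload becomes an inert string instead of executable SQL.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (title TEXT, body TEXT)")
conn.execute("INSERT INTO documents VALUES ('Q3 pricing framework', '...')")

user_input = "'; DROP TABLE sessions; --"

# Vulnerable pattern (never do this):
#   f"SELECT * FROM documents WHERE title = '{user_input}'"

# Safe pattern: the ? placeholder binds the input as a value.
rows = conn.execute(
    "SELECT * FROM documents WHERE title = ?", (user_input,)
).fetchall()
print(rows)  # [] -- the payload matched nothing and executed nothing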

The AI Agentic Surface Makes It Worse

Lily's vulnerability was exploited at the data layer, SQL injection against the underlying database. But the attack surface for enterprise AI platforms has expanded dramatically as agentic capabilities have been added. Modern enterprise AI tools don't just retrieve and synthesize, they take actions: sending emails, editing documents, calling APIs, scheduling meetings, submitting forms.

The implication is that the attack surface isn't just the knowledge base anymore. An unauthenticated endpoint on an agentic AI platform with write-back capabilities is a remote code execution vulnerability dressed in product clothing. An attacker who can inject into the retrieval layer doesn't just read your data, they can potentially control what the agent does next.

Security researchers have demonstrated prompt injection attacks where malicious content embedded in retrieved documents hijacks agent behavior. Combine that with the authentication failures Lily exhibited and you have a platform where an external attacker can read proprietary data and potentially influence the downstream actions the AI takes on behalf of consultants.
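
One mitigation pattern, sketched below with hypothetical tool names: a hard permission boundary between what the model proposes and what the platform executes, with write actions denied unless approved out-of-band, so injected content can't escalate from reading data to acting on it.

# Sketch of a tool-permission boundary (tool names are hypothetical).
# Read tools run freely; write tools require explicit human approval
# that content retrieved by the model cannot forge; unknown tools
# never run at all.
READ_TOOLS = {"search_knowledge_base", "fetch_document"}
WRITE_TOOLS = {"send_email", "edit_document", "submit_form"}

def run_tool(name: str, args: dict) -> str:
    # Stub standing in for the real integrations.
    return f"ran {name} with {args}"

def execute_tool_call(name: str, args: dict, approved: bool = False) -> str:
    if name in READ_TOOLS:
        return run_tool(name, args)
    if name in WRITE_TOOLS:
        if not approved:
            raise PermissionError(f"{name} requires human approval")
        return run_tool(name, args)
    # Deny by default: a tool the policy doesn't list is never run,
    # no matter how persuasively the model requests it.
    raise PermissionError(f"unknown tool: {name}")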

What Procurement Missed

The Lily breach happened despite McKinsey presumably having enterprise-grade procurement processes. The firm advises clients on risk management for a living. If Lily slipped through their review processes, it's not because McKinsey is careless, it's because the review processes weren't built for what Lily actually is.

A traditional enterprise software procurement review looks for:

  • SOC 2 Type II certification
  • Penetration test reports (typically annual)
  • Data processing agreements and GDPR/CCPA compliance
  • Single sign-on integration
  • Role-based access controls
  • Vendor security questionnaires (often 200+ questions)

None of those checklist items would have caught 22 unauthenticated API endpoints sitting outside the annual pentest's scope. None of them ask specifically about SQL injection defenses on LLM retrieval pipelines. None of them evaluate what an AI agent can do with write access to a knowledge base it shouldn't have write access to.

The procurement checklist that approved Lily wasn't wrong. It was built for a different class of software. And the industry has been deploying a fundamentally new class of software using checklists that don't know what questions to ask.

The Two-Year Gap

Perhaps the most significant number in the Lily breach isn't $20 or 22, it's the two-plus years the platform ran in production before the vulnerabilities were discovered by an outside researcher. Enterprise AI platforms deployed in 2023 have had two years to accumulate data, expand their capabilities, integrate with more internal systems, and deepen their access to proprietary information. The attack surface has grown. The discovery mechanism was an external researcher who decided to look.

How many enterprise AI platforms deployed in the 2023-2025 wave have similar vulnerabilities that simply haven't been found yet? The honest answer is: we don't know, because most of them haven't been tested by anyone with the intent and capability to look.

What Actually Needs to Change

The Lily breach points to three specific gaps that procurement and security teams need to close before the next round of enterprise AI deployments:

1. AI-specific security reviews. Standard penetration tests need explicit scope inclusions for LLM attack surfaces: prompt injection, retrieval layer authentication, vector database access controls, and agentic tool permission boundaries. These are not covered by default in most security testing frameworks built before 2023.

2. Authentication-as-default enforcement. Development frameworks for AI platforms should make unauthenticated endpoints impossible by default, not opt-in. The 22 unauthenticated Lily endpoints represent a default-insecure development posture that framework-level enforcement would have prevented; a sketch of a CI check for exactly this follows the list.

3. Continuous monitoring, not annual audits. A one-time penetration test conducted at deployment doesn't catch vulnerabilities introduced in subsequent updates. AI platforms are updated continuously; security review cadences need to match that velocity.
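
The CI check referenced in item 2 is a few lines. A sketch using pytest and FastAPI's TestClient, reusing the hypothetical app from the earlier auth-by-default sketch: walk every registered route and fail the build if any of them answers an unauthenticated request.

# CI guard sketch. "app_module" is a hypothetical module exposing
# the FastAPI app from the earlier sketch.
from app_module import app
from fastapi.routing import APIRoute
from fastapi.testclient import TestClient

def test_no_unauthenticated_endpoints():
    client = TestClient(app)
    for route in app.routes:
        # The isinstance check skips FastAPI's built-in docs routes;
        # parameterized paths are skipped here for brevity, and a real
        # test would fill them in from fixtures.
        if isinstance(route, APIRoute) and "{" not in route.path:
            if "GET" in route.methods:
                response = client.get(route.path)
                assert response.status_code == 401, (
                    f"{route.path} answered without credentials"
                )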

"SQL injection was first formally documented in 1998. It's been 28 years. If your AI platform is still vulnerable to it, the problem isn't the attacker's sophistication, it's the deployment process."

- Security analysis of the Lily incident

The Quiet Risk in Every AI Deployment

Lily is notable because McKinsey is notable. But the deployment patterns it represents are standard across the industry: fast rollout under executive pressure, procurement processes never updated for AI-specific attack surfaces, reliance on annual security reviews rather than continuous monitoring.

Every major consultancy, law firm, financial institution, and healthcare system that deployed an enterprise AI platform in 2023 and 2024 faces some version of the same question: did the security review process that approved this platform actually understand what it was approving?

For Lily, the answer was no. The researcher who spent $20 to find out knew what questions to ask. The procurement process that approved a two-year production run didn't.

That's not primarily a McKinsey problem. It's an industry-wide gap between how fast AI is being deployed and how well the security infrastructure surrounding that deployment has kept pace. The $20 question is: how many more Lilys are running right now, untested, in environments with the same assumptions and none of the scrutiny?