The Glue in the Pizza
May 2024. Google launches AI Overviews: AI-generated answers placed directly in search results. Someone asks how to make cheese stick to pizza better. The AI answer: add non-toxic glue to the sauce.
Not a malicious suggestion. Not a system error. The model had scraped Reddit, encountered a joke about gluing cheese to pizza, processed it as real information, and served it to millions of users with complete confidence. No sanity check. No flag. Just an answer that looked like knowledge and was completely wrong.
This was not an isolated incident. It was a symptom of something researchers had been warning about for years, and that the companies building AI had quietly decided to ignore.
The Habsburg Problem
Researchers studying model degradation gave it a name: the Habsburg AI problem.
The Habsburg dynasty solved the problem of keeping power concentrated in the family by marrying within the family for generations. The strategy worked for maintaining control. The result was a jaw so deformed that the last Habsburg king of Spain, Charles II, reportedly could not chew his own food. The gene pool had folded in on itself.
AI models are doing the same thing with information.
The first generation of large language models trained on human-generated content: decades of Wikipedia, books, forums, news, arguments, corrections, and bad takes from real people who had real experiences. The quality was uneven but the signal was genuine.
Then AI-generated content started appearing on the internet. And the next generation of models trained on that content, which included the outputs, errors, and hallucinations of the previous generation. Each version learns from a version that already had things slightly wrong. Each generation gets more confident. Each generation drifts a little further from reality.
In research tests, after nine generations of models trained on the output of the generation before, the text became incoherent. Repetitive loops. Hallucinated nonsense delivered as fact. The Habsburg jaw in text form.
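The mechanism shows up even in a toy simulation. The sketch below is not the setup from the research tests. It is a minimal illustration under made-up assumptions: a 50-word vocabulary, a 200-word corpus, and a Zipf-like frequency curve. Each "model" only learns word frequencies from the previous corpus, then writes the next corpus by sampling from what it learned. The function name simulate_collapse and every number in it are hypothetical.

```python
import random
from collections import Counter


def simulate_collapse(vocab_size=50, corpus_size=200, generations=9, seed=0):
    """Return the number of distinct words left in the corpus at each generation."""
    rng = random.Random(seed)
    vocab = list(range(vocab_size))

    # Assumed "human" language: a few common words and a long tail of rare ones.
    weights = [1.0 / (rank + 1) for rank in range(vocab_size)]
    corpus = rng.choices(vocab, weights=weights, k=corpus_size)

    surviving = [len(set(corpus))]
    for _ in range(generations):
        # "Train": estimate word frequencies from the previous corpus only.
        counts = Counter(corpus)
        words = list(counts)
        freqs = [counts[word] for word in words]

        # "Generate": write the next corpus by sampling from those estimates.
        corpus = rng.choices(words, weights=freqs, k=corpus_size)
        surviving.append(len(set(corpus)))
    return surviving


if __name__ == "__main__":
    for generation, distinct in enumerate(simulate_collapse()):
        print(f"generation {generation}: {distinct} distinct words")
```

A word missing from one corpus can never be sampled into the next, so the count of surviving words can only hold steady or fall. The rare words are the first to go.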
The Photocopy Problem
There is a simpler analogy. Print a document. Scan it. Print the scan. Scan that. After five rounds, the text is fuzzy and the image is degraded. Each generation copies imperfections from the one before, and adds new ones.
The difference: a degraded photocopy looks degraded. You can see it is a copy of a copy. AI outputs do not look degraded. They look polished. Confident. Well-structured. The model does not know it is nine generations removed from original human thinking. It produces the answer in the same tone regardless of whether the answer is correct.
That is the specific danger. Not that AI is getting things wrong; it has always gotten things wrong. The danger is that the confidence level does not correlate with the accuracy level, and the accuracy may be quietly declining as the training data becomes more and more contaminated with AI-generated outputs.
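One way to make "confidence does not track accuracy" measurable is a calibration check. The sketch below assumes each answer comes with a numeric confidence score between 0 and 1, which a polished tone alone does not provide. The function name and the sample numbers are invented for illustration. It computes expected calibration error: group answers by stated confidence, then compare the average confidence in each group with how often the group was actually right.

```python
def expected_calibration_error(confidences, correct, n_bins=5):
    """Weighted average gap between stated confidence and observed accuracy."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Confidence 1.0 would index past the last bin, so clamp it.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))

    total = len(confidences)
    error = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_confidence = sum(conf for conf, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        error += (len(bucket) / total) * abs(avg_confidence - accuracy)
    return error


if __name__ == "__main__":
    # Hypothetical data: four answers, each stated at 90% confidence, one correct.
    stated = [0.9, 0.9, 0.9, 0.9]
    was_right = [True, False, False, False]
    print(expected_calibration_error(stated, was_right))
```

In the made-up example, the answers claim 90% confidence and are right 25% of the time, a gap of roughly 0.65. A well-calibrated system would score near zero.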
Why the Companies Aren't Talking About It
The business model of AI companies depends on users believing the outputs are reliable. Every public conversation about model collapse or training data contamination creates doubt about that reliability, and that doubt has a direct impact on revenue and valuations.
The researchers publishing on this topic are mostly not at the companies. They are at universities and independent labs. Whatever internal research the companies have on the problem, little of it has been published.
The Google glue incident got corrected. The specific answer was removed. The underlying mechanism that produced it (a model that cannot distinguish a Reddit joke from factual information, trained on outputs from models with the same limitation) has not been fixed. The next version of the problem will look slightly different and land in a slightly different context. But it will come.