Three Problems, None of Them Software
The story of AI infrastructure in 2025 and 2026 is not a story about models or benchmarks or capability curves. It is a story about cooling systems, electrical permits, and packaging technology that only one company on earth can produce at scale.
The build is breaking in three places simultaneously. Cooling. Power. Supply chain. Each one is independently limiting. Together, they create a physical ceiling on how fast AI capacity can actually grow, regardless of how much capital is pointed at the problem.
This matters because the model capability curve and the infrastructure curve are now running at different speeds. That gap has a name. It has consequences.
The Cooling Problem
Modern AI accelerators generate heat at densities that standard air cooling cannot handle. This is not a minor engineering wrinkle. It is a fundamental constraint on where you can put these machines.
Liquid cooling is the solution, but liquid cooling requires infrastructure that most existing data centre buildings were not designed to support. Retrofitting means structural modifications, water supply systems, drainage planning, humidity control, and in many cases, rebuilding the floor itself. Several data centre projects across Europe and the United States have been paused or cancelled entirely after site assessments revealed that the existing building stock could not support the cooling load required.
New builds avoid the retrofit problem but face a different one: construction timelines. A purpose-built, liquid-cooled AI data centre at meaningful scale takes 18 to 36 months to complete from the first shovel in the ground. The demand for compute does not wait 36 months. The result is a persistent gap between when the capacity is needed and when it can actually be delivered.
The Power Problem
Microsoft, Google, and Amazon have all said the same thing, in different ways, in recent quarters: power availability is now their primary constraint on expansion. Not capital. Not land. Not permitting for the buildings themselves. Power.
AI data centres don't plug into the grid the way an office building does. They require dedicated substations. A single large-scale AI campus can draw as much power as a small city. Permitting a new substation in the United States takes, depending on the jurisdiction and the interconnection queue, somewhere between three and seven years.
That timeline is not negotiable. It is not a process that capital can accelerate meaningfully. It involves grid operators, state utility commissions, environmental review, and physical construction of transmission infrastructure. The money exists to build faster. The regulatory and physical process does not move faster because money is available.
This is why several major tech companies are pursuing dedicated nuclear power, including small modular reactor (SMR) partnerships. The Microsoft deal with Constellation Energy to restart Three Mile Island Unit 1 is the most visible example of the trend, although it restarts a conventional reactor rather than building an SMR, and it is not unique. The logic is straightforward: if you cannot get power from the grid fast enough, you secure your own power source. SMRs offer the promise of dedicated, on-site generation that bypasses the grid interconnection queue entirely. They also take a decade to permit and build. The power problem does not have a short-term solution.
The Supply Chain Problem
H100 and H200 GPUs, the hardware that most frontier AI training runs on, require a packaging technology called CoWoS. Chip-on-Wafer-on-Substrate is a specialized process that places stacks of high-bandwidth memory next to the compute die on a shared silicon interposer, which provides the data throughput modern AI workloads demand.
TSMC is essentially the only company that can produce CoWoS packaging at meaningful scale. Their CoWoS capacity is running at close to 100% utilization. This is a physical chokepoint. It means that GPU supply cannot grow faster than TSMC's ability to expand CoWoS capacity, independent of how much Nvidia wants to ship, independent of how much the hyperscalers want to buy.
Expanding CoWoS capacity requires building new advanced packaging capacity, which takes roughly two years and costs billions. TSMC is doing this. It is not a surprise. But the expansion timeline means there is a hard ceiling on GPU supply growth that will not move regardless of demand pressure. The constraint is the machine that makes the part, not the desire to make more parts.
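The ceiling can be sketched in a few lines. This is an illustration only: the quarterly demand and capacity figures are hypothetical, in arbitrary units, and the function name `shippable_units` is invented for the example. The only input taken from the text is the roughly two-year lag before new packaging capacity comes online.

```python
def shippable_units(demand, capacity):
    """Units shipped each quarter are capped by packaging capacity."""
    return [min(d, c) for d, c in zip(demand, capacity)]


# Hypothetical quarterly figures in arbitrary units (not real TSMC or Nvidia data).
demand = [100, 120, 150, 190, 240, 300, 370, 450, 540, 640]

# Assumption: an expansion started in quarter 0 arrives about two years
# (8 quarters) later, so capacity is flat until then.
capacity = [110] * 8 + [220] * 2

shipped = shippable_units(demand, capacity)
unmet = [d - s for d, s in zip(demand, shipped)]

for quarter, (s, u) in enumerate(zip(shipped, unmet)):
    print(f"Q{quarter}: shipped={s}, unmet demand={u}")
```

Under these made-up numbers, shipments stay pinned at the capacity line for eight quarters no matter how fast demand grows, and even the doubled capacity is absorbed the moment it arrives.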
The downstream effect is a distortion in the market. Companies that secured GPU allocations early are sitting on significant inventory advantages over competitors who are waiting in the queue. The ability to train frontier models is, for now, substantially determined by when you got in line for TSMC-packaged GPUs, not by how much you're willing to spend today.
Where the Three Problems Intersect
The cooling, power, and supply chain problems do not fail independently. They are linked in ways that compound the constraint.
Liquid cooling requires water. Large-scale water consumption for data centres in regions already under water stress creates permitting and environmental review delays that stack on top of the construction delays. Some of the most attractive regions for data centre development (those with low-cost land, a favorable climate, and proximity to fiber infrastructure) are also regions where water availability is increasingly regulated. The cooling solution depends on a resource that is itself becoming a bottleneck.
Power and supply chain are linked through timing. The GPU supply constraint means companies are competing intensely for whatever H100 and H200 supply TSMC's packaging capacity allows. The companies that win those allocations want to deploy the hardware immediately to start generating returns. But the power infrastructure to run that hardware at scale is not ready. The result is warehoused GPUs waiting for substations to come online, capital sitting idle while regulatory processes complete. The supply chain problem and the power problem are out of sync with each other, and the mismatch is expensive.
The most constrained scenario is a company that managed to secure GPU allocation ahead of schedule but cannot run the hardware because the data centre is still waiting for grid connection. This is not hypothetical. It has happened to multiple hyperscalers. The hardware exists. The power does not. The investment is made but the return is deferred.
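A rough sketch of that mismatch, treating the three tracks as running in parallel so that the slowest one sets the go-live date. The class `SiteSchedule`, the example months, and the monthly carrying cost are all hypothetical; the months are chosen to sit inside the ranges quoted earlier (18 to 36 months for construction, three to seven years for a substation).

```python
from dataclasses import dataclass


@dataclass
class SiteSchedule:
    """Months from project start until each track is ready (hypothetical model)."""
    building_ready_month: int  # liquid-cooled shell complete
    power_ready_month: int     # substation energized
    gpu_delivery_month: int    # accelerators arrive on site

    def _tracks(self) -> dict:
        return {
            "building": self.building_ready_month,
            "power": self.power_ready_month,
            "gpus": self.gpu_delivery_month,
        }

    def go_live_month(self) -> int:
        # The tracks run in parallel, so the slowest one sets the date.
        return max(self._tracks().values())

    def binding_constraint(self) -> str:
        tracks = self._tracks()
        return max(tracks, key=tracks.get)

    def gpu_idle_months(self) -> int:
        # Time the hardware sits delivered but unable to run.
        return self.go_live_month() - self.gpu_delivery_month


# Hypothetical site: GPUs secured early, building done in 30 months,
# substation energized after five years.
site = SiteSchedule(building_ready_month=30, power_ready_month=60, gpu_delivery_month=12)

# Assumed carrying cost of idle hardware, in arbitrary currency units per month.
monthly_carrying_cost = 2.0
idle_cost = site.gpu_idle_months() * monthly_carrying_cost

print(site.binding_constraint(), site.go_live_month(), site.gpu_idle_months(), idle_cost)
```

In this toy schedule, winning the GPU allocation early does nothing to pull the go-live date forward; it only lengthens the period the hardware spends waiting.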
The Stranded Asset Risk
Data centre leases are being signed on 10- and 20-year terms. The financial logic requires it: the capital expenditure only pencils out at that scale if the revenue assumptions hold for a decade or more.
The revenue assumptions are built on AI adoption projections. Specifically, on the idea that enterprise AI adoption will continue at or near current rates, compounding into a demand curve that justifies the build. If those projections are wrong by a meaningful margin, the leases become stranded assets. Expensive square footage drawing expensive power to run hardware for workloads that don't materialize at the projected scale.
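The sensitivity is easy to see with a toy calculation. Everything here is an assumption for illustration: the function `cumulative_margin`, the lease cost, the starting revenue, and both growth rates are invented, and the model ignores discounting, power costs, and hardware refresh.

```python
def cumulative_margin(years, annual_lease_cost, first_year_revenue, growth_rate):
    """Sum of (revenue - lease cost) over the lease term, with compounding revenue growth."""
    total = 0.0
    revenue = first_year_revenue
    for _ in range(years):
        total += revenue - annual_lease_cost
        revenue *= 1 + growth_rate
    return total


# Hypothetical 10-year lease, arbitrary currency units.
lease_years = 10
annual_lease_cost = 100.0
first_year_revenue = 60.0

# Same lease, two adoption scenarios (both growth rates are made up).
projected = cumulative_margin(lease_years, annual_lease_cost, first_year_revenue, 0.25)
downside = cumulative_margin(lease_years, annual_lease_cost, first_year_revenue, 0.05)

print(f"projected scenario margin: {projected:.1f}")
print(f"downside scenario margin:  {downside:.1f}")
```

With these invented inputs, the same lease comes out comfortably positive over the term at the projected growth rate and negative at the lower one. The commitment is fixed; only the demand curve moves.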
Microsoft said something notable in its Q3 2026 earnings call. They are "pacing" data centre investment and will slow the build if demand signals soften. "Pacing" is careful language. It signals that the company is watching the demand data closely enough that it has already built a contingency into its planning. That is not the language of a company that is confident the projections will hold. It is the language of a company that knows what a stranded asset looks like and is trying to avoid building one.
The Compute Overhang
Put the three problems together and a specific scenario comes into focus. Model capability is improving faster than the infrastructure required to run those models at scale can be built. Cooling limits where the hardware can go. Power limits how much hardware can run. Supply chain limits how fast new hardware can be produced. The result is a widening gap between what the models can do and what can actually be deployed at meaningful scale.
Call this a compute overhang. (The term is used elsewhere in a different sense, for surplus hardware waiting on better algorithms; here it means the reverse.) Capability exists that cannot be fully expressed because the physical substrate needed to run it is not available. In the short term, this mostly affects who gets access and at what price: frontier model inference becomes a constrained resource, allocation decisions get made by the companies that own the infrastructure, and the gap between what the best models can do and what most users can actually access widens.
In the longer term, the overhang has more interesting effects. It creates pressure to make models more efficient, to do more with less compute, to find architectural improvements that reduce the physical footprint of frontier inference. Some of the most interesting work in AI research right now is driven not by the desire to push capability further but by the practical need to fit existing capability into infrastructure that can actually be built.
The ceiling is physical. The timeline is measured in years, not quarters.