The Wrong Diagnosis
When physical AI progress started disappointing investors, Fortune had a ready explanation: the data was dirty. Bad labeling companies, sloppy annotation pipelines, low-quality training sets. Fix the labels, the story went, and robots would start folding laundry by Tuesday.
The analyst who picked apart that argument has a PhD in computer science and a much less comforting explanation. The label quality story is a distraction. The real problem is not that the data is in poor condition; it is that the data does not exist at all, because no one has built the instrumentation to collect it at scale. Physical AI is starving, and the food hasn't been grown yet.
Why Text AI Got Lucky
To understand the gap, you have to appreciate how absurdly lucky large language models got with data. The internet turned out to be an enormous, self-organizing corpus of human thought, accumulated over decades before anyone imagined it would train AI. GPT-4, Claude, Gemini -- they were all trained on something humanity had already produced for free. The data infrastructure existed before the models did.
Physical AI has no such windfall. Robots, self-driving cars, and world models need sensor readings of the physical environment -- and those readings must be actively collected. Every useful data point requires a device in the right place at the right time, capturing force, texture, temperature, spatial relationships, and dozens of other variables that the internet has never encoded. You cannot scrape a robot's sense of touch from Reddit.
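As a concrete sketch of what that requirement implies, here is a minimal, hypothetical record type for a single grounded training example. The field names are illustrative, not drawn from any real system; the point is that every field must be captured by a physical sensor at the moment of contact.

```python
from dataclasses import dataclass

@dataclass
class PhysicalSample:
    """One hypothetical multimodal training example for physical AI.

    None of these fields can be scraped from existing internet text;
    each requires a device in the right place at the right time.
    """
    timestamp_ns: int                      # when the reading was taken
    pose_xyz: tuple[float, float, float]   # end-effector position, meters
    contact_force_n: float                 # normal force at the gripper, newtons
    surface_temp_c: float                  # contact surface temperature, Celsius
    tactile_map: list[list[float]]         # pressure grid from a tactile pad
    rgb_frame_path: str                    # pointer to the synchronized camera frame
    humidity_pct: float = 0.0              # ambient humidity, which shifts friction
    label: str = ""                        # human-supplied ground truth, if any
```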
"Text AI scraped free from the internet. Physical AI needs active instrumentation. Every useful data point requires a device in the right place at the right time."
-- PhD computer science analysis of the physical AI bottleneck

The Sim-to-Real Gap Is Not a Software Problem
The standard industry workaround is simulation. Build a physics engine good enough, the theory goes, and you can generate infinite synthetic training data without deploying a single sensor. The problem is that physics engines are not actually good enough -- not for the tasks that matter.
Robots trained in simulation can learn acrobatic movements. Joint angles, gravity, momentum -- these translate reasonably well from virtual to physical. But grasping a coffee mug? Picking up a folded shirt? That's where simulation breaks down completely. Fabric has different friction coefficients depending on humidity, accumulated dust, and trace oils from a human hand. A simulation that doesn't model those variables at the molecular level doesn't teach grasping -- it teaches a fantasy version of grasping that fails on contact with reality.
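A toy numerical sketch makes the failure mode concrete. The coefficients below are invented for illustration, not measured values; the point is that a grasp policy tuned against a fixed simulated friction coefficient can sit exactly at a margin that real-world variation erases.

```python
def sim_friction() -> float:
    # Typical simulator assumption: one fixed coefficient per material.
    return 0.45

def real_friction(humidity_pct: float, dust_layers: int, skin_oil: bool) -> float:
    # Hypothetical model of the unmodeled terms: humidity, accumulated
    # dust, and trace oils all shift the effective coefficient at contact.
    mu = 0.45
    mu -= 0.002 * humidity_pct      # damp fabric slips more easily
    mu -= 0.03 * dust_layers        # dust lubricates the surface
    if skin_oil:
        mu -= 0.05                  # residue from human handling
    return max(mu, 0.05)

def grasp_succeeds(mu: float, grip_force_n: float, weight_n: float) -> bool:
    # Coulomb friction: the grasp holds if friction can support the load.
    return mu * grip_force_n >= weight_n

# A policy tuned in simulation picks the minimum force that "works" there...
force = 2.0 / sim_friction()  # just enough to hold a 2 N object at mu = 0.45
# ...and fails in a humid, dusty kitchen where mu has quietly dropped.
print(grasp_succeeds(sim_friction(), force, 2.0))                 # True
print(grasp_succeeds(real_friction(70.0, 2, True), force, 2.0))   # False
```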
OpenAI's Sora experiment illustrated the cost of this gap at the product level. The video generation model was reportedly burning through $1 million per day in compute and generated just $2.1 million in total lifetime revenue before being effectively shut down -- two days of operating costs consumed its entire earnings history, despite a reported $1 billion partnership commitment from Disney. The gap between what a model can do in demo conditions and what it can do reliably in real-world deployment proved commercially fatal.
Tesla's Data Flywheel -- and Why It's Hard to Replicate
The clearest example of what functional physical AI data infrastructure looks like is Tesla's Full Self-Driving (FSD) fleet. The system runs a six-step loop: the fleet collects data continuously, edge cases trigger shadow-mode processing, targeted retrieval pulls the most instructive examples, human annotation adds ground truth, models retrain on the enriched set, and continuous validation closes the loop.
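Sketched as code, with every function a hypothetical stand-in (Tesla's actual pipeline is proprietary), one turn of that loop looks roughly like this:

```python
import random

def collect(fleet_size: int) -> list[float]:
    # 1. Continuous collection: every vehicle streams candidate clips.
    return [random.random() for _ in range(fleet_size)]

def shadow_filter(clips: list[float], threshold: float = 0.95) -> list[float]:
    # 2. Shadow mode: keep only clips where the model disagreed with the driver.
    return [c for c in clips if c > threshold]

def retrieve(edge_cases: list[float], budget: int = 100) -> list[float]:
    # 3. Targeted retrieval: take the most instructive examples first.
    return sorted(edge_cases, reverse=True)[:budget]

def annotate(clips: list[float]) -> list[tuple[float, int]]:
    # 4. Human annotation: attach ground-truth labels.
    return [(c, 1) for c in clips]

def flywheel_iteration(model: dict, fleet_size: int = 10_000) -> dict:
    """One turn of the six-step loop. Every stage here is a placeholder;
    the point is the shape: collect -> filter -> retrieve -> annotate ->
    retrain -> validate, then repeat."""
    labeled = annotate(retrieve(shadow_filter(collect(fleet_size))))
    model["training_examples"] += len(labeled)   # 5. retrain on enriched set
    model["validated"] = len(labeled) > 0        # 6. continuous validation
    return model

print(flywheel_iteration({"training_examples": 0, "validated": False}))
```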
The numbers that result from that flywheel are difficult to internalize. Tesla has accumulated 8 billion miles of supervised FSD driving across approximately 2 million vehicles, generating 160 billion video frames daily. If a single person drove 24 hours a day at 60 miles per hour, it would take over 15,000 years to accumulate that much driving experience. The data moat is not a marketing claim -- it is a physical reality that took years and billions of dollars in deployed hardware to build.
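The back-of-envelope arithmetic behind that figure:

```python
miles = 8_000_000_000                        # supervised FSD miles claimed
miles_per_year = 60 * 24 * 365               # 60 mph, nonstop: 525,600 miles/year
print(miles / miles_per_year)                # ~15,220 years of driving experience
```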
"Reality is undefeated. Tesla has 8 billion miles of people teaching their cars what the real world looks like. Everyone else has hopes, dreams, and a really good physics engine."
-- PhD computer science analysis, physical AI competitive landscape

Who Controls the Sensors Controls the Future
The competitive implications split the physical AI landscape cleanly between companies that own sensor infrastructure and those that don't.
| Company / Sector | Sensor Control | Data Advantage | Physical AI Readiness |
|---|---|---|---|
| Tesla (autonomous vehicles) | 2M+ vehicles in fleet | 8B real-world miles, 160B daily frames | High -- data flywheel operating at scale |
| Healthcare / Wearables | 300M devices by 2026 | Continuous biometrics, EHR integration | High -- domain-specific, clinical validation |
| Industrial / Manufacturing | Embedded in production lines | Predictive maintenance, process telemetry | High -- clear ROI drives sensor investment |
| OpenAI (generalist lab) | Minimal -- no fleet, no devices | Synthetic / simulation dependent | Low -- Sora shutdown illustrates gap |
| Google / Anthropic (generalist labs) | Limited direct hardware | Partnership-dependent, not proprietary | Low to medium -- robotics efforts early stage |
| Humanoid robot startups | Building now ($6B invested in 2025) | Exoskeleton workers, gig teleoperation data | Medium -- infrastructure race in progress |
How Sensors Actually Get Funded
The $6 billion invested in humanoid robots in 2025 is notable not primarily for the robots -- most of them barely walk -- but for where the money is going inside those companies. A large share is flowing directly into data collection infrastructure. In China, workers wear exoskeletons performing repetitive manipulation tasks hundreds of times a day. In Nigeria, Argentina, and India, gig workers are filming themselves doing household chores to generate training data for robotic dexterity. The industry has concluded that the only path to real-world data is to pay people to create it.
The more sustainable version of that dynamic is playing out in domains where sensors solve immediate, measurable problems. Smart building systems with energy-monitoring infrastructure have demonstrated 28% reductions in consumption -- one university reported $310,000 in annual utility savings and averted $280,000 in emergency HVAC repairs through predictive alerts. Industrial IoT deployments are delivering 25% reductions in maintenance costs and avoiding 70% of unplanned downtime events. Healthcare wearables are pushing toward 300 million deployed devices worldwide by 2026, with Google's Fitbit already integrating device data streams directly into electronic health records.
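That "legible price tag" is easy to make concrete. Using the university figures above and an assumed retrofit cost (the deployment cost below is a hypothetical, not a reported number), the payback math is short:

```python
annual_utility_savings = 310_000     # reported annual utility savings
averted_repairs = 280_000            # emergency HVAC repairs avoided via alerts
annual_benefit = annual_utility_savings + averted_repairs   # $590,000/year

deployment_cost = 1_500_000          # HYPOTHETICAL: assumed sensor retrofit cost
payback_years = deployment_cost / annual_benefit
print(f"{payback_years:.1f} years")  # ~2.5 years under the assumed cost
```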
The pattern is consistent: sensors get funded when they solve a problem with a legible price tag today. The AI training value is a secondary benefit. This is actually good news for the infrastructure buildout -- it means deployment isn't waiting on AI investment cycles.
"Scale AI can label images, but it cannot create data about how a robot gripper feels when it touches different materials. That data has to come from physical contact with the physical world."
-- PhD computer science analysis, physical AI data bottleneck

Two Paths Forward
The analyst sees the industry splitting into two viable trajectories. The first is domain-specific depth: companies like Tesla, medical device manufacturers, and industrial automation vendors that already have deployed sensor networks will continue building vertical moats. Their data is proprietary, their models improve with scale, and their lead widens with every mile driven and every machine monitored. Generalist labs trying to compete in these domains from outside will face an asymmetric disadvantage that no amount of compute can bridge.
The second path is infrastructure-first generalization. The IoT sensor market is on a trajectory from $18.34 billion in 2024 to $422 billion by 2034, a CAGR of roughly 37%. North America alone is projected to go from $8.52 billion in 2025 to $136 billion by 2034. As sensors become cheaper and more ubiquitous -- embedded in buildings, vehicles, clothing, and industrial equipment -- the raw data substrate for general physical AI will eventually exist. The question is whether the labs that need it will still be operating when it does.
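The growth-rate claim is internally consistent, as a quick check shows:

```python
start, end, years = 18.34, 422.0, 10     # global market, $B, 2024 -> 2034
print(f"{(end / start) ** (1 / years) - 1:.1%}")        # ~36.8%, i.e. roughly 37%

na_start, na_end, na_years = 8.52, 136.0, 9   # North America, $B, 2025 -> 2034
print(f"{(na_end / na_start) ** (1 / na_years) - 1:.1%}")   # ~36%
```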
The 21.1 billion IoT devices already deployed worldwide represent an early sketch of what that infrastructure looks like at scale -- roughly two and a half connected sensing endpoints for every person on Earth. But most of those devices are not yet generating the rich, multimodal, physically grounded, annotated data that would make a general-purpose physical AI possible. They are thermostats and inventory tags, not robot teachers.
The companies building toward physical AI on a foundation of real sensor data are playing a different game than the ones running better simulations. The sim-to-real gap is not a calibration problem that better software will solve -- it is a data collection problem that requires hardware, deployment, and time. Reality is undefeated, and the scoreboard is measured in sensor-hours, not GPU-hours.