Health AI is everywhere right now. Foundation models trained on patient records, agent-like systems that spit out care suggestions, deep learning pipelines that catch disease earlier than a human might: none of that is hypothetical anymore. The architecture exists. The money is there. Real deployments are already happening. And inside controlled environments, the results can look almost unreal.
Then you move those same systems into day-to-day life. Suddenly the outputs are confident and smooth, and also wrong more often than most people expected.
It’s tempting to blame the model. Most of the time, that isn’t the core issue. The problem is the data, or more specifically, the kind of data we keep asking these models to learn from.
The bottleneck nobody is naming clearly
What health AI usually gets fed is a snapshot. A clinic visit. A lab panel. An app check-in. Each one is a single moment frozen in time. Even wearables, which can record continuously, don’t always deliver continuous reality, because people don’t wear them consistently.
So, in practice, the model often looks at a string of disconnected windows rather than a coherent timeline. Stanford HAI has a helpful label for this: a “missing context problem,” where the model can’t reconstruct the actual health trajectory of the person it’s supposed to help.
A snapshot-based system can be statistically excellent at predicting an outcome from whatever it was shown and still be consistently wrong about the individual. People don’t live as a single data point. Health unfolds across weeks, months, years.
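To make that concrete, here is a minimal sketch using fully synthetic numbers (not patient data, and not any specific clinical metric): a daily health signal that drifts slowly in one direction while bouncing around from day to day. A single reading can sit comfortably inside the noise band even while the longitudinal trend is unmistakable.

```python
import random

random.seed(0)

# Hypothetical synthetic series: a daily metric with a slow upward drift
# buried under day-to-day noise.
days = list(range(60))
baseline, drift, noise_sd = 100.0, 0.3, 5.0
series = [baseline + drift * d + random.gauss(0, noise_sd) for d in days]

# A "snapshot" system sees one visit. Depending on which day it lands on,
# the reading can look entirely unremarkable.
snapshot = series[10]

# A longitudinal view fits the trend across the whole window
# (ordinary least-squares slope, computed by hand).
n = len(days)
mean_x = sum(days) / n
mean_y = sum(series) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, series)) / \
        sum((x - mean_x) ** 2 for x in days)

print(f"day-10 snapshot: {snapshot:.1f} (baseline {baseline:.0f}, plus noise)")
print(f"60-day trend:    {slope:+.2f} per day (true drift {drift:+.2f})")
```

The point of the sketch is not the arithmetic; it is that the slope, the actual signal, only exists once you have the full window to fit it over.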
Making the model bigger won’t solve this. A stronger foundation model can’t pull information out of thin air. If the right data never got captured, the model can’t recover it.
Why fragmented data breaks personalization
Personalization is the headline promise in health AI right now, but it’s also where weak data shows up fastest. If a product calls something “personalized” because it used a one-time questionnaire, a single night of self-reported sleep, or a patchy two-week wearable segment, what it’s really doing is tailoring advice to a hypothetical version of the user, someone who may or may not match the real person.
Prediction has the same issue. A risk score built from scattered inputs ends up behaving like a population average dressed up as an individual insight. The day-to-day variation the model never observed is usually the very thing that determines whether the prediction actually holds.
Clinicians notice that mismatch, and consumers do, too. It helps explain why so many health AI products can post strong lab numbers and still struggle once they meet real-world behavior.
What health AI needs is a different data foundation that continuously tracks behavior, physiology, and context over long stretches of ordinary life. To use an analogy, most health data today is a bit like a still photo. Longitudinal data is the film.
Sleep as a core signal worth building on
Sleep makes this easy to see. Every adult produces hours of sleep data each day. It is one of the few things we humans do virtually every day (how many days did you meditate, exercise, or even watch a movie last week?). It’s information-rich, tied to behavior and biology, and connected to a long list of chronic conditions. But there’s a catch: you can’t interpret it correctly without measuring it over time.
Our team at Sleep.ai recently co-authored a study with collaborators from Washington State University’s Sleep and Performance Research Center and the University of Washington, published in JMIR Formative Research. We tracked 56 consecutive nights of objective, at-home sleep across 112 adults, the longest objective profile of chronic insomnia collected to date.
Even we didn’t expect the central result: on average, people with chronic insomnia sleep about as much as everyone else. The difference is variability from night to night, big swings in sleep onset, awakenings, and perceived restfulness. That pattern isn’t visible in a single night. You only see it after weeks of continuous measurement. If you built an AI tool around a one-night snapshot, you’d miss the defining feature of the condition.
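A toy example shows why the snapshot misses it. The numbers below are synthetic and illustrative only, not the study’s dataset: two sleepers with the same average nightly sleep but very different night-to-night variability, observed over 56 nights.

```python
import random
import statistics

random.seed(1)

# Hypothetical synthetic data (illustrative only): two sleepers with the
# SAME mean nightly sleep but different night-to-night variability.
NIGHTS = 56
stable = [random.gauss(7.0, 0.4) for _ in range(NIGHTS)]    # steady sleeper
variable = [random.gauss(7.0, 1.8) for _ in range(NIGHTS)]  # high-swing sleeper

# The averages are nearly indistinguishable...
print(f"means:    {statistics.mean(stable):.2f}h vs {statistics.mean(variable):.2f}h")

# ...but the night-to-night spread, visible only after weeks of
# measurement, separates the two profiles cleanly.
print(f"std devs: {statistics.stdev(stable):.2f}h vs {statistics.stdev(variable):.2f}h")

# And any single night is a coin flip: it can easily rank them backwards.
print(f"night 0:  {stable[0]:.1f}h vs {variable[0]:.1f}h")
```

One night of data gives you the first line, which says the two people are the same. Fifty-six nights give you the second line, which is where the condition actually lives.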
What builders should do about this
The next phase of health AI will come down to data depth. The teams that build continuous, validated longitudinal pipelines around domain signals such as sleep, glucose, heart rhythm, or movement will ship products that hold up in real-world settings. The teams that try to muscle through the data gap will keep producing impressive demos and promises, only to deliver underwhelming performance once the product hits reality.
Data acquisition can’t stay the “boring” part of the stack. In practice, the data layer is the model. Choose the right foundational signals. Measure them for as long as it takes for their real patterns to show up. Validate them against gold-standard references.
A lot of health AI is being rushed onto infrastructure that was never designed to support what these systems are being asked to do. The smaller group that fixes the foundation first won’t look as exciting at the start. Then, eventually, they’ll look impossible to catch.