
If your robotics data was collected by unreliable sensors, your model didn’t learn reality. It learned your hardware’s mistakes.
The Physical AI conversation has a blind spot: we obsess over model architectures, training compute, and benchmark scores. But every model that interacts with the physical world (every autonomous vehicle, every robotic manipulation system, every predictive maintenance engine) is ultimately only as good as the data it was trained on. And the data it was trained on comes from sensors. A running joke among hardware engineers is that every sensor is just a temperature sensor that also senses something else.
This is the unsexy, unglamorous crisis hiding beneath the surface of physical AI: if you cannot trust your ground truth, you cannot trust your model.
For years, I’ve worked at the intersection of hardware and machine learning systems. The lesson I keep relearning is the same: the gap between a sensor reading and physical reality is where AI systems quietly fail. The trickiest failures come from effects your sensors can’t even observe, like structural members flexing. The result is usually not a dramatic crash, but a slow, invisible accumulation of bias that only reveals itself during eval, at which point it’s very difficult to debug.
The Calibration Problem Nobody Wants to Own
Consider a seemingly simple task: collecting training data for a robotic arm that needs to pick objects off a conveyor. You mount a stereo camera pair above the workspace. You label thousands of images with ground truth positions. You train the model. It works great in the lab.
Then you deploy it, and the arm misses by eight millimeters, consistently, in one direction.
What happened? Nobody checked the extrinsic calibration of those cameras after the rig was bumped during installation. The relative orientation of the stereo pair was off by a fraction of a degree. Every single ground truth label in the training set carried that bias. The model didn’t learn where objects are. It learned where a slightly misaligned camera pair thought objects were. And it learned that lesson perfectly. Since it was trained only on nearly identical images from that one rig, any small change puts it completely out of distribution.
This is the insidious nature of sensor error in training data: the model has no way to know the labels are wrong. It will fit them faithfully. Your loss curve will look beautiful. Your validation accuracy will be high, because the validation labels carry the same bias. And your system will be confidently, systematically incorrect in production.
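To get a feel for the scale involved, here is a back-of-the-envelope sketch. It assumes a deliberately simplified geometry that is not from the original rig: a single camera looking straight down from a hypothetical 1 m working distance, with a pure rotation error and nothing else. Real stereo error propagation is more involved, so treat the numbers as an order-of-magnitude illustration only.

```python
import math


def lateral_error_mm(working_distance_mm: float, angle_error_deg: float) -> float:
    """Sideways position error caused by a pure camera rotation error.

    Simplified pinhole geometry: a point at the given distance along the
    optical axis appears shifted by distance * tan(angle).
    """
    return working_distance_mm * math.tan(math.radians(angle_error_deg))


# Hypothetical rig: camera 1000 mm above the conveyor.
for angle_deg in (0.1, 0.25, 0.5):
    error = lateral_error_mm(1000.0, angle_deg)
    print(f"{angle_deg:.2f} deg -> {error:.1f} mm")
```

Under these assumptions, half a degree of rotation already produces an error of roughly 8.7 mm, the same order as the miss described above.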
Camera calibration (cameras being just one example of a sensor) involves both intrinsic parameters, like focal length and lens distortion, and extrinsic parameters, like position and orientation relative to the workspace. These can change over time, or after an impact such as a drop. The same applies to every sensor modality in the pipeline: IMUs, LiDAR, force/torque sensors, encoders. Each one is a potential source of silent corruption.
The Drift Problem
Even a perfectly calibrated system will eventually drift away from its perfect state. This is the reality that hardware engineers understand intuitively but that data scientists often overlook: physical systems change over time.
One surprising thing to learn is that plastics deform under sustained load. Over time, they slowly change shape (a process called creep), and this is unavoidable. Even in parts that aren’t plastic, temperature cycles cause all materials to expand and contract, shifting mounting positions by micrometers that accumulate into millimeters. Vibration loosens fasteners. Lens coatings degrade under UV exposure. MEMS accelerometers exhibit long-term bias instability. Potentiometers wear. Elastomers in damping mounts creep.
If you are collecting data continuously over weeks or months to build a training corpus (and many serious robotics efforts do exactly this), then the data collected in month six may have subtly different characteristics than the data from month one. Not because the world changed, but because your measurement of it did. You’ve introduced a non-stationary error source into your dataset, and unless you’re explicitly tracking and compensating for it, your model is learning the drift as if it were signal.
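One way to track this kind of drift, sketched here under assumptions of my own choosing, is to measure a fixed reference target at the start of every collection session and fit a trend to those readings. The function names, the sample values, and the tolerance below are all hypothetical; the fit is an ordinary least-squares line, which only catches slow, roughly linear drift.

```python
from typing import Sequence


def linear_trend(times: Sequence[float], values: Sequence[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of values ~ slope * time + intercept."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired samples")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    if var_t == 0:
        raise ValueError("all timestamps are identical")
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = cov / var_t
    return slope, mean_v - slope * mean_t


def drift_exceeds(times: Sequence[float], values: Sequence[float], tolerance: float) -> bool:
    """True if the fitted trend moves more than `tolerance` over the time span."""
    slope, _ = linear_trend(times, values)
    return abs(slope) * (max(times) - min(times)) > tolerance


# Made-up readings: measured position (mm) of a fixed fiducial, by day.
days = [0, 30, 60, 90, 120, 150]
fiducial_mm = [100.00, 100.05, 100.11, 100.14, 100.21, 100.25]
print(drift_exceeds(days, fiducial_mm, tolerance=0.2))
```

The point of the reference target is that it does not move, so any trend in its measured position has to come from the measurement system rather than from the world.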
Mounting Matters
There’s a tendency in prototype-stage development to treat sensor mounting as a mechanical afterthought, something you bolt on and forget about. The reasoning is that sensor offsets can often be zeroed or calibrated out. But this is a mistake that compounds through the entire data pipeline: even if unit-to-unit differences are measured and accounted for, each unit can still drift in its own way afterward.
The problem is especially acute with inertial measurement units. An IMU that is mounted even a few millimeters off the center of rotation it’s meant to measure will couple centripetal acceleration into channels that are supposed to be reading linear acceleration. The sensor is working perfectly. The mounting made the data wrong. Unless your quality checks are very thorough, the error can slip through.
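The size of this effect follows from rigid-body kinematics: an accelerometer offset by a lever arm r from the center of rotation senses an extra ω × (ω × r) (centripetal) plus α × r (tangential) on top of the acceleration of the center itself. The sketch below evaluates that term; the 5 mm offset and 10 rad/s rotation rate are hypothetical values picked for illustration.

```python
Vec3 = tuple[float, float, float]


def cross(a: Vec3, b: Vec3) -> Vec3:
    """Right-handed cross product a x b."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def lever_arm_acceleration(omega: Vec3, alpha: Vec3, r: Vec3) -> Vec3:
    """Extra acceleration (m/s^2) sensed at offset r (m) from the rotation center.

    omega: angular velocity (rad/s), alpha: angular acceleration (rad/s^2).
    """
    centripetal = cross(omega, cross(omega, r))
    tangential = cross(alpha, r)
    return (
        centripetal[0] + tangential[0],
        centripetal[1] + tangential[1],
        centripetal[2] + tangential[2],
    )


# Hypothetical case: IMU mounted 5 mm off-center, spinning at 10 rad/s about z.
extra = lever_arm_acceleration(
    omega=(0.0, 0.0, 10.0), alpha=(0.0, 0.0, 0.0), r=(0.005, 0.0, 0.0)
)
print(extra)
```

In this example the offset adds about 0.5 m/s² pointing back toward the rotation axis, roughly 0.05 g of acceleration that never happened to the body being measured.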
Rigid, thermally stable, inspectable mounting with documented and repeatable installation procedures is necessary. It’s the minimum bar for producing data that you can actually trust for training.
Building a Chain of Trust
The way to think about this is as a chain of trust that runs from the physical world, through your sensors, through your labeling pipeline, and into your model weights. This chain is hard to debug: it is difficult to isolate one link and check whether it is right, and there are a lot of links.
Every link in that chain needs to be verified, documented, and maintained. That means version controlling your calibration parameters alongside your code. It means timestamping calibration events and recording as much as possible. It means having the courage to throw away data that was collected during a period when a sensor was out of spec, even if it slows down progress or wastes time.
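As a minimal sketch of what that bookkeeping could look like, the snippet below stores each calibration event as a timestamped record, fingerprints its parameters, and drops any sample that falls outside an in-spec window. Every name and field here is illustrative rather than a description of any particular system, and it assumes the records all belong to a single sensor.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class CalibrationRecord:
    """One calibration event for one sensor (all field names are illustrative)."""
    sensor_id: str
    valid_from: float   # Unix time the calibration was performed
    valid_until: float  # Unix time of the next calibration or known disturbance
    in_spec: bool       # False if a later check showed the sensor was out of spec
    params: dict        # e.g. intrinsics and extrinsics

    def fingerprint(self) -> str:
        """Short content hash so each sample can name the exact parameters used."""
        blob = json.dumps(self.params, sort_keys=True).encode("utf-8")
        return hashlib.sha256(blob).hexdigest()[:12]


def usable_samples(samples, records):
    """Return (timestamp, payload, fingerprint) for samples in an in-spec window.

    Samples with no covering record, or covered by an out-of-spec record,
    are dropped. Assumes all records describe the same sensor.
    """
    kept = []
    for timestamp, payload in samples:
        for rec in records:
            if rec.valid_from <= timestamp < rec.valid_until:
                if rec.in_spec:
                    kept.append((timestamp, payload, rec.fingerprint()))
                break
    return kept


records = [
    CalibrationRecord("cam_left", 0.0, 100.0, True, {"fx": 900.0, "fy": 900.0}),
    CalibrationRecord("cam_left", 100.0, 200.0, False, {"fx": 900.0, "fy": 900.0}),
]
samples = [(50.0, "frame_a"), (150.0, "frame_b"), (250.0, "frame_c")]
print(usable_samples(samples, records))  # only frame_a survives
```

Tagging every kept sample with the parameter fingerprint means a later audit can tell exactly which calibration produced which labels.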
It also means designing your data collection systems for inspectability. Can you verify, after the fact, that the data you collected is geometrically consistent? Can you run closure checks on your sensor pipeline? Can you detect when a sensor has gone bad? If the answer is no, then you are in for a rough ride.
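A closure check can be as simple as chaining the calibrated transforms around a loop of frames and confirming you end up back where you started. The sketch below does this for planar poses (x, y, heading); the square loop and the 10 mm perturbation are made-up values, and a real rig would typically use full 3D transforms and tolerances derived from its own error budget.

```python
import math

Pose2D = tuple[float, float, float]  # (x in m, y in m, theta in rad)


def compose(a: Pose2D, b: Pose2D) -> Pose2D:
    """Apply transform b in the frame reached by transform a."""
    ax, ay, at = a
    bx, by, bt = b
    return (
        ax + math.cos(at) * bx - math.sin(at) * by,
        ay + math.sin(at) * bx + math.cos(at) * by,
        at + bt,
    )


def closure_residual(loop: list[Pose2D]) -> tuple[float, float]:
    """Chain transforms around a loop; return (translation, rotation) error.

    A consistent set of calibrations composes to the identity, so both
    residuals should be near zero.
    """
    total: Pose2D = (0.0, 0.0, 0.0)
    for step in loop:
        total = compose(total, step)
    x, y, theta = total
    wrapped = math.atan2(math.sin(theta), math.cos(theta))
    return math.hypot(x, y), abs(wrapped)


quarter_turn = math.pi / 2
consistent = [(1.0, 0.0, quarter_turn)] * 4                          # closed square
perturbed = [(1.0, 0.0, quarter_turn)] * 3 + [(1.01, 0.0, quarter_turn)]

print(closure_residual(consistent))
print(closure_residual(perturbed))
```

The consistent loop closes to within floating-point noise, while the perturbed one leaves a translation residual of about 10 mm, which is the kind of number you can put a threshold on and alert against.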
The Culture Shift
One of the hardest parts of this isn’t technical; it’s cultural. In many AI-focused organizations, the hardware and data collection systems are treated as infrastructure, necessary but unexciting plumbing that exists to feed the models. At Sunday we think about this the other way around: in the early days our robots got almost no attention, and nearly all of our effort went into data collection. Henry on our Software Team is even porting software we built for our Skill Capture Gloves to our robot, not the other way around.
An engineer who catches a calibration drift before it corrupts a training run has saved more model performance than any amount of hyperparameter tuning will recover after the fact. This is why the organizations that will build physical AI systems that actually work in the real world won’t be the ones with the biggest models. They’ll be the ones with the most disciplined infrastructure. The model can only learn what you show it, so make sure what you’re showing it is real.


