There is a foundational principle in autonomous vehicle development: you do not trust a system until you have tested it against a simulated world at scale.

The reason is obvious. Real-world driving presents a practically infinite set of scenarios. Ice patches. Children darting between parked cars. A mattress falling off a truck on the highway. You cannot test against all of them in production, because in production, mistakes cost lives. So the industry built simulation infrastructure at a scale that rivals the physical testing programs of entire automotive OEMs. Synthetic data. Digital environments. Millions of edge cases, generated and exhausted before a single wheel touches real asphalt.

The result is that simulation is not a nice-to-have in autonomous vehicle development. It is the foundation. Without it, autonomy is not possible.

Now look at what is happening in software engineering.

Generative AI has turned code production from a bottleneck into a firehose. Teams that once shipped a handful of pull requests a week are now processing dozens in a single day. Features that used to take sprints are being drafted in hours. The output is extraordinary. However, the review process has not kept pace.

By some estimates, AI-generated code produces roughly 1.7 times more errors than human-written code. The engineers who are supposed to be building new things are instead spending their days verifying AI output, debugging edge cases, and re-prompting their way through a backlog that never seems to shrink.

We have solved the generation side of software engineering. But in doing so, we have made the verification side significantly harder.

The reason is structural. A coding agent’s world is code. But software in production does not live in a world of code alone. It lives in an environment: database states, third-party API behaviors, configuration files, feature flags, permission systems, caching layers, memory limits, and the behavioral patterns of real users doing unexpected things. The interaction between these systems is what determines whether software actually works. And no model, however capable, can see into that interaction by reading code alone.

I call this the Context Void: the gap between what an AI agent can observe and what actually governs the runtime behavior of a system.

The Context Void is not theoretical. In July 2024, CrowdStrike shipped a sensor configuration update that crashed 8.5 million Windows machines simultaneously. The template was valid. The sensor code was valid. But the interaction between the number of fields in the template and the number of inputs the sensor actually consumed only existed at runtime, across the full system. That interaction blue-screened hospitals and grounded airlines. Estimated damage: over ten billion dollars.

The Cloudflare outage of November 2025 followed the same pattern. A database permissions change caused a query to return duplicate rows. The duplicate rows inflated a configuration file past a hardcoded memory limit in a proxy service. X, Spotify, Uber, ChatGPT, all dark. Three separate systems, each individually correct, lethal in combination.

Every individual component passed its own tests. The failure lived in the space between them. That space is the Context Void.
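To make that shape concrete, here is a minimal Python sketch of three components that each behave correctly on their own and fail only when combined. It is an illustration under stated assumptions, not a reconstruction of any vendor's code: every name (`query_features`, `build_config`, `load_config`) and the `MAX_FEATURES` value are hypothetical.

```python
from dataclasses import dataclass

MAX_FEATURES = 4  # hypothetical hard limit inside the consumer (illustrative value)


@dataclass(frozen=True)
class Row:
    database: str
    feature: str


def query_features(visible_databases, tables):
    """Return one row per feature for every database the caller can see."""
    return [Row(db, f) for db in visible_databases for f in tables[db]]


def build_config(rows):
    """Emit one config entry per row; correct as long as the rows are unique."""
    return [row.feature for row in rows]


def load_config(entries):
    """Consumer with a fixed capacity; refuses anything larger."""
    if len(entries) > MAX_FEATURES:
        raise MemoryError(f"{len(entries)} entries exceed the limit of {MAX_FEATURES}")
    return set(entries)


# Hypothetical data: two databases that expose the same three features.
tables = {
    "default": ["f1", "f2", "f3"],
    "replica": ["f1", "f2", "f3"],
}

# Before the permissions change, only one database is visible: 3 entries, fine.
before = load_config(build_config(query_features(["default"], tables)))


def run_after_change():
    """After the change a second database is visible, so every row appears twice."""
    return load_config(build_config(query_features(["default", "replica"], tables)))
```

Each function would pass a unit test written against its own contract. Only `run_after_change()`, which exercises all three against the changed environment, raises the error.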

This is the problem autonomous vehicle development solved. The self-driving car does not get smarter in isolation and then hope that the real world cooperates. It gets a world model: a simulation of the environment it will actually operate in, comprehensive enough that running the system through it is meaningfully close to running it on the road. Weather. Pedestrian behavior. Sensor noise. Edge cases that might occur once in a billion miles of driving.

The world model is what makes autonomy possible. Not because the model eliminates uncertainty, but because it closes the loop. The system runs, something breaks, the model tells the system what broke and why, the system adjusts, and the cycle repeats until performance meets the bar required for deployment. Sense. Model. Plan. Act.

Software engineering needs the same infrastructure. Not better models. Not faster generation. A world model: a simulation of the digital environment that code actually runs in, comprehensive enough that testing against it is a meaningful proxy for production.

Instead of physics, traffic, and weather, we model databases, APIs, configurations, permissions, and user behavior. Instead of crashing a car into simulated pedestrians, we crash code into simulated production state. We observe what breaks. We tell the agent why. The loop closes.
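The loop described above can be sketched in a few lines of Python. This is a toy under loud assumptions: `simulate`, `close_the_loop`, `toy_agent`, and the `WORLD` scenarios are all hypothetical stand-ins, and a real world model would simulate far richer production state than three dictionaries.

```python
def simulate(candidate, world):
    """Run the candidate against every simulated state and report what broke."""
    failures = []
    for name, state in world.items():
        try:
            candidate(state)
        except Exception as exc:
            failures.append(f"{name}: {type(exc).__name__}: {exc}")
    return failures


def close_the_loop(agent, world, max_rounds=5):
    """Generate, simulate, feed failures back, and repeat until nothing breaks."""
    feedback = []
    for round_no in range(1, max_rounds + 1):
        candidate = agent(feedback)
        feedback = simulate(candidate, world)
        if not feedback:
            return candidate, round_no
    raise RuntimeError("candidate still failing after max_rounds")


def toy_agent(feedback):
    """Stand-in for a coding agent: naive first draft, hardened once it sees failures."""
    if not feedback:
        return lambda state: state["user"]["email"].lower()
    return lambda state: (state.get("user") or {}).get("email", "").lower()


# Hypothetical simulated production states, including two awkward ones.
WORLD = {
    "normal": {"user": {"email": "A@x.io"}},
    "deleted_user": {"user": None},
    "missing_key": {},
}

candidate, rounds = close_the_loop(toy_agent, WORLD)
```

The first draft fails on the two awkward states, the failure reports go back to the agent, and the second draft passes all three. The point of the sketch is the shape of the cycle, not the sophistication of the agent.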

I often think about what happens when a compiler turns source code into machine code. Nobody reads the output. Nobody audits the binary to make sure the compiler got the register allocation right. The compiler has earned that trust, and as a result, source code is the only artifact anyone cares about. The intent of the human is the thing that matters.

Coding agents have not earned that trust yet. When an agent generates source code from a natural language prompt, we read every line. We check the queries. We look for security issues. We do this because the translation from intent to implementation is not reliable enough to skip that step.

A world model changes the equation. If I can simulate what that code does in a realistic production environment, with realistic data and realistic edge cases, and verify with high confidence that the behavior is correct, the code starts to look like machine code: an intermediate artifact, technically present, but not the thing that matters. The prompt becomes the source of truth.

That is the trajectory. Autonomous vehicle development did not become possible by making the driving model smarter. It became possible when engineers built an environment sophisticated enough to develop, train, and verify the driving model inside.

The same principle applies here. The coding agent gets a world to practice in. The loop closes. The autonomy becomes real.

There is no AI coding revolution without AI quality. The industry has invested extraordinary effort in making generation fast and fluent. The missing infrastructure is on the verification side: something capable of simulating what code actually does when it hits production, at the fidelity required for autonomous software engineering.

We are building it. And the precedent for why it is necessary already exists. It is on the road.


What Self-Driving Cars Know That Software Engineering Doesn’t

By Gal Vered, CEO and Co-founder, Checksum
