We speak of artificial intelligence as if it understands the world; it does not.
That distinction matters in geospatial intelligence, where seeing is not the same as understanding. Jian Yang’s binary Not Hotdog solution is real, just with national security consequences. A model may be able to tell you whether a tank is present in a frame, but that does not mean it understands why the tank matters, whether it belongs there, or what changed around it.
For decision-makers operating under stress, whether on the battlefield, during a disaster response, or in any other high-stakes environment, the value of AI is not that it produces an answer. The value is whether it provides an answer with a trustworthy probability of accuracy. From there, it is up to the decision-maker to determine her risk tolerance for error and the immediacy of action.
To get as close as we can to this very human capability requires being honest about what AI is doing in the first place.
Humans recognize objects through experience, context, and common sense. I was recently in Kyiv speaking with an extraordinary company providing analytics for the war effort. Four years ago, these were agriculture analysts assessing crop health. Today, they can predict Russian order of battle better than most traditional analysts I have met. Why? Because they have learned the operating environment. It is their home. Many grew up with the terrain, the roads, the villages, the agricultural patterns, the weather, and the rhythms of daily life. These analysts see patterns because they understand what normal looks like. They are not just identifying objects in an image; they are interpreting what those objects mean.
A computer vision model learns statistical relationships between pixels, labels, and context in the data it has been trained on. That can be powerful, but it can also be brittle. A model trained to find tanks in Eastern Europe may struggle when the same tank appears in the Indo-Pacific. Same object, different background. Same threat, different signature. The model did not forget what a tank is; it never knew what a tank was in the first place.
“Seeing” in geospatial intelligence cannot simply mean detecting an object in an image. A model can draw a box around a vehicle and still miss the point. For the last decade, much of geospatial AI has been built around detection: Find the ship; find the aircraft; find the vehicle; find the building. These tools are useful, and in many cases operationally valuable, but they are also narrow and cannot scale when you are pivoting between available assets in the heat of battle. Much of what is marketed as AI analytics in geospatial intelligence is still object detection with a better interface.
Most notably, this approach struggles with edge cases, which are often the cases that matter most. A model may perform well against clean examples of known objects and still fail when the object is damaged, hidden, poorly imaged, partially visible, or operating in a place where the model has not seen it before.
There are ways to improve this, including more training data, synthetic data, and better labeling, but these approaches are expensive, time-consuming, and often difficult to scale.
The better path is not simply building more models to detect more things or automating narrow detection at greater speed. The better path is building systems that enrich context, force the model to “see” differently, and apply computer vision principles, old-school math, probability, and statistics to help the user understand what the model is actually saying.
The answer is not simply more AI. It is better framing around what the AI is actually doing.
If a model says it found a tank, the useful question is not only whether the box around the tank is correct. The useful question is what else would make that answer more believable. Answering complex questions of context requires treating each model as one input in a larger judgment. That is how people make decisions anyway. We rarely act on one piece of information in isolation; we look for context, corroboration, timing, and risk.
This is where geospatial AI should be heading. Not just toward finding more objects faster, but toward helping people understand what is worth paying attention to and what is noise. That is the difference between detection and discovery. Detection answers the question you already asked. Discovery helps surface the thing you did not know to look for.
A single data point can be useful, but it can also be dangerous if it is wrong. No single observation should carry more weight than it deserves, and that weight changes based on weather, image quality, sensor stability, time of capture, and corroboration from other sources. When multiple sources corroborate a pattern, the answer becomes more useful, providing the decision-maker a stronger basis for judgment.
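One way to make that weighting concrete is a minimal sketch in Python. It assumes each source reports a confidence between 0 and 1, that sources are roughly independent, and that an analyst or upstream process assigns each observation a reliability weight between 0 and 1 reflecting conditions such as weather or image quality. The function name `fuse`, the weighting scheme (a reliability-weighted sum of log-odds), and the numbers are illustrative assumptions, not a description of any fielded system.

```python
import math

EPS = 1e-6  # assumed clamp so confidences of exactly 0 or 1 do not break the log


def logit(p):
    """Convert a probability to log-odds, clamping away from 0 and 1."""
    p = min(max(p, EPS), 1.0 - EPS)
    return math.log(p / (1.0 - p))


def sigmoid(x):
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse(prior, observations):
    """Combine a prior with (confidence, reliability) pairs.

    Each observation shifts the log-odds by reliability * logit(confidence).
    Reliability 0 ignores the source; reliability 1 takes it at face value.
    Assumes sources are independent, which real sensors often are not.
    """
    log_odds = logit(prior)
    for confidence, reliability in observations:
        log_odds += reliability * logit(confidence)
    return sigmoid(log_odds)


# Hypothetical inputs: one clear-weather detection, one through heavy cloud.
single = fuse(0.5, [(0.8, 1.0)])
degraded = fuse(0.5, [(0.8, 0.5)])
corroborated = fuse(0.5, [(0.8, 1.0), (0.8, 1.0)])
```

In this sketch a single trusted detection leaves the estimate where the model put it, a degraded one pulls it back toward the prior, and two corroborating detections push it higher than either alone. The decision-maker still chooses the threshold at which that number justifies action.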
You don’t have to be a veteran to understand the importance of timely and confidence-annotated accuracy. To end with cinema as we began, A House of Dynamite captures this better than any policy paper: A report comes in that cannot be fully verified. The clock is moving. The decision-maker does not have the time, context, or luxury of waiting for a perfect answer. She has to make a risk calculation with the best information available. That is geospatial AI at its best: not making the decision, not pretending the model understands the world, but turning incomplete information into a clearer risk calculation and leaving the hard call to the human.
Kate van Dam is Head of Government at SkyFi, where she leads strategy and partnerships across defense, intelligence, and government sectors. Kate is available for interviews and further commentary. For press inquiries, please reach out to [email protected]

