AI Can’t See, Yet. Why Visual Intelligence Will Power the Next Wave of Agentic Systems

By Jeevan Kalanithi, CEO and co-founder, OpenSpace, the Visual Intelligence Platform for builders

AI has made extraordinary progress in understanding and generating information. Today’s systems can analyze documents, generate code, and reason across massive datasets with impressive accuracy. But there is still a fundamental limitation that often gets overlooked. Most AI cannot understand what is happening in the physical world.

That gap matters more than ever as we enter the era of agentic AI. 

AI Understands Documents. The World Runs on Reality. 

Most AI systems today are built to process text. They operate on documents, logs, schedules, and structured datasets. That is where they perform best. But in real-world industries, documents describe intent. They describe what should happen.  

They do not tell you what actually happened. 

That distinction is critical in industries like construction, where every decision ultimately comes down to what got built, what condition it is in, and whether it aligns with the plan. For AI to be useful in these environments, especially as systems begin to act, it needs access to that ground truth.

Agents Need Data They Can’t Generate 

This limitation becomes even more important as a new generation of AI systems emerges. These are agentic systems that can reason across data sources, plan multi-step workflows, and take action. But there is a constraint that is often overlooked. Agents are only as good as the data they can access.  

Today, much of the data that AI relies on is text-based. And text-based workflows are increasingly easy to replicate. Feed a structured dataset into a capable model, and it can generate summaries, recommendations, and even replicate parts of existing software. 

As models improve, analytics built on structured data become easier to replicate. The differentiation shifts away from analysis and toward access to unique data. What cannot be commoditized is data that is difficult to capture and essential for reasoning about physical reality.

From Documents to Reality Data 

This is where visual intelligence comes in. Visual intelligence is the ability to capture and structure visual and spatial data from the physical world in a way that machines can understand. In construction, that includes spatially indexed imagery, drone data, and continuously updated visual records of the jobsite. This is reality data. 

Reality data reflects what is happening in the field, not what is inferred or reported after the fact. It is captured directly from the environment and organized in a way that AI systems can query and reason over. 

At scale, this changes how work is understood. Instead of relying on fragmented documentation, teams can access a continuously updated record of progress, quality, and risk tied to specific locations and points in time. 

Why This Matters for Agentic AI 

As agentic AI systems take on more responsibility, their ability to operate in the real world becomes essential. An AI agent can recommend what should happen next and understand what is intended based on plans and data. But if it cannot verify what has happened, its usefulness is limited.  

That is the gap visual intelligence fills. It allows AI systems to move from reasoning about intent to grounding decisions in reality. From analyzing documents to validating execution. This means agents can begin to support workflows that depend on real-world conditions, not just reported information. 

A New Layer in the Stack 

In industries like construction, we are moving beyond systems built around documents and intent toward systems grounded in what is really happening in the field. Plans and reports describe what should happen. Reality data describes what is happening. 

That data forms a new layer in the technology stack, one that AI systems depend on to verify work, track progress, and ground decisions in evidence. A layer that connects digital workflows to physical execution. As agentic systems evolve, this layer becomes essential infrastructure. 

The Future of AI Needs Eyes 

The next phase of AI will not be defined only by better models, but by how those models interact with the real world. Reasoning alone is not enough: AI systems need access to accurate, continuously updated information about physical conditions. They need to be able to “see”.

In construction and other real-world industries, that capability will define what AI can really do. Agents will not replace human judgment, but they can make it better informed and faster to act on. The systems that provide that visibility will play a central role in the AI stack.  

Because in the end, every workflow, every decision, and every outcome depends on the same question: What happened? And in the agentic AI era, the platforms that can answer that question will be the ones that matter most.  

Jeevan Kalanithi is Co-founder and Chief Executive Officer of OpenSpace, the Visual Intelligence Platform for builders. More than 380,000 users in 130 countries rely on OpenSpace to capture the reality of their sites with smartphones, 360° cameras, and drones—and turn that reality into real-time, actionable intelligence and simpler workflows. Prior to starting OpenSpace, Jeevan served as Entrepreneur-in-Residence at Lux Capital. He sold his first company, Sifteo, to 3D Robotics, where he eventually became the company’s President. Jeevan holds a BS in Symbolic Systems from Stanford and an MS from MIT, where he was a National Science Foundation Graduate Fellow. 