What gap lies between mimicry and understanding? Expertise.
As the technology industry drives towards the promise of artificial general intelligence (AGI), the majority of founders and their organizations are making a key mistake. They tend to think that AGI will come as an all-knowing, all-doing LLM that is an expert across the full scope of decisions in their company. The mistake here is the tendency to think of intelligence as an all-encompassing concept: either you have intelligence and are therefore good at everything, or you don’t and are therefore bad at everything. But human intelligence is far from all-encompassing.
Human intelligence differentiates along lines of expertise. Intelligence is better modeled as a collection of specific expertise than as a “general intelligence.” While LLMs certainly appear to know a lot of information, not all knowledge is created equal.
LLMs have a lot of declarative knowledge (the puzzle pieces of facts), but they struggle with the procedural, causal, contextual, and intuitive knowledge that humans rely on. They lack the intuitive, experience-based insight that goes beyond reciting facts and that human experts accumulate over time. Ongoing research debates whether LLMs form implicit world models, but the evidence so far is weak. Without a robust world model, LLMs cannot reason as humans do; there is simply no stable model of the world to reason about. And without the ability to reason in this sense, an LLM cannot deliver expert intelligence. It is that world model that makes experts experts.
In a recent paper, Harvard and MIT researchers showed that LLMs fail to reason effectively in complex scenarios: those in which successful inference requires recognizing and reasoning over multiple interrelated concepts within a single question, and interpreting similar but semantically different entities. I don’t know of many fields of expertise where these properties are not common.
LLMs excel at synthesizing ideas, but they sometimes make impossible or unhelpful combinations, leading to ambiguity, categorical errors, broken inferences, and incoherent reasoning chains. The answers may be confident, but they fall apart under scrutiny — the opposite of what we would expect from a true expert.
For AI to truly realize its potential and supplant human experts, it needs to go beyond mimicking expertise to truly embody it. To cross that gap, AI needs more than more data. It needs a world model.
Expert knowledge requires a world model
Defining what we mean and how we mean it is what makes reasoning possible. Humans do this instinctively. We structure knowledge into concepts, hierarchies, and relationships. In AI, we replicate this by codifying expert knowledge through ontologies and knowledge graphs.
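To make that concrete, here is a minimal sketch, in Python, of what a sliver of codified expert knowledge might look like as concepts, a hierarchy, and typed relationships. The domain, entity names, and relation types are invented for illustration and are not drawn from any particular ontology.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Concept:
    name: str
    parent: str | None = None  # hierarchy: an "is-a" link to a broader concept

@dataclass(frozen=True)
class Relation:
    subject: str
    predicate: str
    obj: str

# A tiny, hypothetical medical ontology: the concept hierarchy.
concepts = {
    c.name: c
    for c in [
        Concept("Finding"),
        Concept("Symptom", parent="Finding"),
        Concept("Disease"),
        Concept("Treatment"),
    ]
}

# Typed facts (a miniature knowledge graph) that give structure to raw text.
facts = [
    Relation("fever", "is_a", "Symptom"),
    Relation("influenza", "is_a", "Disease"),
    Relation("fever", "indicates", "influenza"),
    Relation("oseltamivir", "treats", "influenza"),
]

def neighbors(entity: str) -> list[Relation]:
    """Return every relation that mentions the entity; this is what reasoning hangs on."""
    return [f for f in facts if entity in (f.subject, f.obj)]

if __name__ == "__main__":
    print(concepts["Symptom"])           # hierarchy: a Symptom is a kind of Finding
    for rel in neighbors("influenza"):
        print(rel)
```

The point of the structure is not the specific classes but the discipline: every fact is attached to named entities and a named relationship, so downstream reasoning has something stable to hold onto.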
Today, the AI industry is trying to solve reasoning through retrieval, namely the retrieval-augmented generation (RAG) approach. Enterprises choose an LLM, then combine it with their company’s data. The assumption is that if users can simply get the right pieces of hay from the haystack, the model will know what to do. This assumption has led to a wave of work around prompt engineering and other tricks to help the LLM order and navigate the unstructured data. Engineers call this activity “context engineering.”
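For reference, the retrieval-first pattern looks roughly like the sketch below. The keyword scorer stands in for the embeddings and vector store a production RAG system would use, and the corpus and function names are made up for illustration.

```python
# A deliberately naive retrieval step: score chunks by keyword overlap and
# paste the top-k into the prompt. Real systems use embeddings and a vector
# database, but the shape of the pattern is the same.
def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    terms = set(query.lower().split())
    scored = sorted(chunks, key=lambda c: len(terms & set(c.lower().split())), reverse=True)
    return scored[:k]

def build_prompt(query: str, chunks: list[str]) -> str:
    context = "\n".join(retrieve(query, chunks))
    return f"Context:\n{context}\n\nQuestion: {query}"

if __name__ == "__main__":
    corpus = [
        "Q3 revenue grew 12% year over year.",
        "The onboarding flow was redesigned in March.",
        "Churn in the SMB segment rose last quarter.",
    ]
    print(build_prompt("Why did churn rise last quarter?", corpus))
    # The chunks land in context, but nothing tells the model what
    # "churn", "segment", or "quarter" mean or how they relate.
```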
But the problem remains: making sure the LLM has the right data in context does not give the LLM expertise. The context also needs to give that data meaning. It needs to define what entities, relationships, and concepts exist in the world and how they interact.
Expertise requires more from context engineering than reasoning via retrieval. Writing, selecting, compressing, and isolating knowledge so the context window contains exactly what the LLM needs is table stakes. The context also needs to include a world model that gives all that data meaning.
Instead of working with arbitrary “chunks” of facts shared with the LLM, engineers need to define a world model built on concepts. Chunks simply store facts. Concepts give those facts meaning. They signify what things matter, how inputs are related, and what the rules of the LLM’s “world” are. This anchors outputs to verifiable entities and relationships, so generated answers don’t just sound pretty — they respect the logic of the domain.
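Building on the hypothetical ontology sketched earlier, anchoring can be as simple as refusing to accept a generated claim unless its entities and relation type exist in the world model. The schema and entity names below are, again, invented for illustration.

```python
# Hypothetical schema of allowed relation types between concept categories.
# Anything the model asserts outside this schema is flagged rather than trusted.
SCHEMA = {
    ("Symptom", "indicates", "Disease"),
    ("Treatment", "treats", "Disease"),
}

ENTITY_TYPES = {
    "fever": "Symptom",
    "influenza": "Disease",
    "oseltamivir": "Treatment",
}

def is_grounded(subject: str, predicate: str, obj: str) -> bool:
    """Check a generated triple against the world model's entities and schema."""
    s_type = ENTITY_TYPES.get(subject)
    o_type = ENTITY_TYPES.get(obj)
    return s_type is not None and o_type is not None and (s_type, predicate, o_type) in SCHEMA

if __name__ == "__main__":
    print(is_grounded("fever", "indicates", "influenza"))  # True: respects the domain's logic
    print(is_grounded("influenza", "treats", "fever"))     # False: a category error, so rejected
```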
Trust is key. Expertise isn’t only about providing the right answer; it’s also about showing why an answer makes sense. Think of the difference between a doctor who focuses on retrieval and a doctor who focuses on context. Dr. Retrieval has read every medical journal but keeps it all in one jumbled mental heap. Dr. Context has the same knowledge but structures it: she records every symptom, maps each one to possible causes and treatments, and adjusts the likelihoods as new information comes in. Dr. Retrieval might guess correctly (or incorrectly), but you’ll never know why. Dr. Context shows her reasoning transparently, in a way that is consistent, that humans can engage with, and that they can ultimately come to trust. In other words, you can compare the way you see the world with the way the AI sees the world and make sense of the conclusions.
AI agents need that level of accountability. They shouldn’t demand trust — they should earn it.
Memory will unlock agentic AI reasoning
Too many enterprises are waiting for a universal model to achieve artificial general intelligence, as if OpenAI or Anthropic were focused enough on the nuances of their particular business to deliver the expertise required. The truth is that organizations don’t need universal intelligence; they need agents that can reason exceptionally well within a well-defined scope. This is what humans do. Businesses need agents that are more like human experts, not general amateurs.
A group at MIT recently showed that applying test-time training (adapting an already-trained model with task-specific examples at deployment) yielded more than a sixfold accuracy boost on complex reasoning tasks like the ARC dataset. This demonstrates that pretraining alone isn’t enough: the real world constantly presents cases outside the training set. By reframing examples and briefly updating themselves at test time, models can “experiment” their way into fitting novel tasks. But this is really just a brute-force way of faking a world model. Instead of relearning approximate entities and relationships from scratch for every new task, imagine if the AI carried a durable map of well-defined entities and relationships: a true, persistent world model. Then it wouldn’t need brute force; it could reason.
Organizations should focus on developing memory functions aligned with this world model that can provide AI agents with the grounded context they need to deliver expert performance. How does AI reconcile what it hears in yesterday’s strategy meeting, today’s all-hands, and the five-year plan?
Humans use different kinds of memory, and so should AI (a minimal code sketch of these follows the list):

- Short-term memory is like a workspace. It lets the system juggle what’s immediately in front of it without confusion.
- Episodic memory records structured, time-stamped experiences, preserving how things unfolded.
- Semantic memory stores stable, timeless facts. This is the bedrock that doesn’t change with context.
- Procedural memory captures steps and workflows, ensuring consistency and repeatability.
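One minimal way to represent these four memory types in code might look like the sketch below; the field names and example entries are invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Episode:
    timestamp: datetime
    event: str  # what happened, and when, so order is preserved

@dataclass
class AgentMemory:
    short_term: list[str] = field(default_factory=list)             # the current working set
    episodic: list[Episode] = field(default_factory=list)           # time-stamped experiences
    semantic: dict[str, str] = field(default_factory=dict)          # stable facts about the world
    procedural: dict[str, list[str]] = field(default_factory=dict)  # named step-by-step workflows

memory = AgentMemory()
memory.short_term.append("User asked about Q3 churn")
memory.episodic.append(Episode(datetime(2025, 3, 4), "All-hands: leadership reiterated the five-year plan"))
memory.semantic["fiscal_year_start"] = "February"
memory.procedural["renewal_review"] = ["pull usage data", "check support tickets", "draft summary"]
```

Keeping the stores separate lets an agent draw on each one differently, rather than treating yesterday’s meeting, today’s all-hands, and the five-year plan as one undifferentiated blob of context.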
Memory isn’t simply storage; it is also about confidence and believability. In human reasoning, people naturally weigh the credibility of sources: a CEO’s statement is not the same as an informal Slack comment. They look for multiplicity of evidence; the more independent signals they see, the more confident they become. And they account for recency, prioritizing fresh updates unless something is explicitly evergreen.
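A rough sketch of how those three signals (source credibility, corroboration, recency) could be folded into a single belief score follows. The weights and half-life are arbitrary placeholders, not tuned values.

```python
from datetime import datetime, timedelta

# Illustrative credibility weights; a real system would tune or learn these.
SOURCE_WEIGHT = {"ceo_statement": 1.0, "meeting_notes": 0.7, "slack_comment": 0.4}

def belief_score(source: str, corroborations: int, observed: datetime,
                 now: datetime, half_life_days: float = 30.0) -> float:
    """Combine source credibility, independent corroboration, and recency."""
    credibility = SOURCE_WEIGHT.get(source, 0.5)
    corroboration = 1.0 - 0.5 ** corroborations     # each independent signal adds confidence
    age_days = (now - observed).days
    recency = 0.5 ** (age_days / half_life_days)    # decays unless refreshed or marked evergreen
    return credibility * (0.5 + 0.5 * corroboration) * recency

if __name__ == "__main__":
    now = datetime(2025, 6, 1)
    print(belief_score("ceo_statement", corroborations=3, observed=now - timedelta(days=2), now=now))
    print(belief_score("slack_comment", corroborations=1, observed=now - timedelta(days=90), now=now))
```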
AI agents should be built to replicate these habits. Like humans, AI agents can be overwhelmed by the constant stream of information from meetings, documents, and conversations. Without structure, the knowledge base gets cluttered with inaccuracies, and diagnosing what went wrong becomes guesswork. Memory isn’t just about holding onto data; it’s about curating, judging, and sometimes discarding it, so reasoning remains reliable.
At Vivun, where we’re building an AI agent focused on automating sales processes, our team has a concept of “tombstoning” — a structured way to forget knowledge when it’s proven wrong or outdated. If AI never forgets, misinformation can fester.
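The sketch below is not Vivun’s implementation; it is just one hypothetical way a tombstone flag could keep retracted knowledge out of an agent’s reasoning while preserving an audit trail of why it was retired.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Fact:
    claim: str
    tombstoned: bool = False
    tombstone_reason: str | None = None
    tombstoned_at: datetime | None = None

def tombstone(fact: Fact, reason: str) -> None:
    """Mark a fact as retracted instead of silently deleting it."""
    fact.tombstoned = True
    fact.tombstone_reason = reason
    fact.tombstoned_at = datetime.now()

def live_facts(store: list[Fact]) -> list[Fact]:
    """Only non-tombstoned facts are eligible for retrieval and reasoning."""
    return [f for f in store if not f.tombstoned]

store = [Fact("Acme renewal closes in Q2"), Fact("Acme churned in January")]
tombstone(store[0], reason="superseded: customer churned")
print([f.claim for f in live_facts(store)])
```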
The future of AI lies in world-model-aware context
The industry’s obsession with scale — bigger models, bigger datasets — distracts from the harder, more rewarding challenge: engineering context. AI doesn’t need more information. It needs to understand what information means and why it matters.
For all but the very largest hyperscalers, AGI doesn’t help reach real business goals. The vast majority of businesses should focus on building agents with scope, memory, and transparent world models. That’s what turns LLMs from eloquent mimics into systems that can be trusted in high-stakes environments.
Intelligence isn’t about stacking facts. It’s about assembling purpose — and purpose requires context.



