
Large language models (LLMs) have reached a turning point. Scale brought us here, but the next breakthrough is about cultivating judgement: the ability to discern which information matters, when it matters, and how it should shape decisions in real time.
We've been conditioned to equate bigger with better: larger models, more parameters, expanded context windows. Yet something fundamental is missing from this equation. A model that can access everything but prioritise nothing hasn't become more intelligent; it has simply accumulated more potential points of failure.
As LLMs evolve into agentic systems capable of reasoning and autonomous action, their ability to filter signal from noise, weigh relevance, and anchor decisions in what truly matters will determine how capable they are.
Scale alone isn't enough
Context windows, the amount of recent text an LLM can hold and use to shape its next response, have expanded dramatically in recent years, growing from a few thousand tokens to a few hundred thousand, and in some cases reaching a million. In theory, this should allow LLMs to read and reason across entire documents, sustain longer conversations, and draw on information from multiple sources to produce more coherent answers.
However, Stanford's 2025 AI Index shows that standard tests of language-model proficiency are producing near-identical results across leading LLMs, despite wide differences in model size and memory. This suggests that increased scale alone is not enough to make a meaningful difference to LLM efficacy.
At the same time, larger LLMs cost more to run. That isn't necessarily a bad thing: bigger contexts let LLMs handle longer documents, recall past exchanges, and reason across complex information. But for business ROI, the higher spend on compute must be matched by better outputs.
Nvidia estimates that keeping a 128K-token conversation (roughly the length of a short book) in an LLM's working memory can consume about 40 gigabytes of graphics processing unit (GPU) memory. One long chat can max out an entire GPU, which is very costly for potentially only marginal gains in performance.
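To see where a figure of that order comes from, here is a rough back-of-the-envelope sketch of the key-value (KV) cache that transformers keep per token of context. The model parameters below (80 layers, 8 KV heads of dimension 128, a 16-bit cache) are assumptions describing a hypothetical 70B-class model with grouped-query attention, not any specific Nvidia configuration; real figures vary by architecture.

```python
def kv_cache_gib(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Approximate KV-cache size in GiB: a key and a value vector
    per layer, per KV head, per token, at bytes_per_elem precision."""
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem
    return total_bytes / 2**30

# Hypothetical 70B-class model with grouped-query attention, fp16 cache,
# holding a 128K-token conversation:
print(round(kv_cache_gib(n_layers=80, n_kv_heads=8, head_dim=128,
                         seq_len=128 * 1024), 1))  # -> 40.0
```

The memory grows linearly with context length, which is why doubling the window roughly doubles the cost of keeping a conversation "warm".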
More data doesn't mean better answers
LLMs need the right data to produce answers that are accurate, relevant, and useful. Today, they are being fed more information than ever in a bid to make their responses richer and more precise: recent documents, data from internal knowledge bases, previous chat histories, database records, and live information pulled from APIs or other connected applications.
Each of these sources adds useful information, but each also adds complexity. The data is often scattered across different systems, updated at different speeds, and stored in different formats, so stitching it all together takes more time and computing power. The crux of the issue, however, is that even with all that data, LLMs aren't guaranteed to use the right information at the right time.
Stanford and Berkeley's Lost in the Middle research shows that when models are flooded with long contexts, they often fail to recall what matters most. In other words, simply giving LLMs more information doesn't help if they can't recognise what's relevant.
For example, a customer support bot that scrolls through an entire chat history instead of focusing on the last issue you raised is slowed down by the additional information; it does not make a better judgement simply because it has access to more data.
The same issue can crop up in enterprise search. Ask an AI assistant for your company's latest travel policy, and it might pull up five versions, including one from 2019, because it can't judge which source is current. The answer looks comprehensive, but it's not actually useful.
In short, the problem isn't simply how much data an LLM can access, but how well it manages that data.
The role of context engineering
If more data alone isn't the answer, better context is. Context engineering is the practice of deciding what information an LLM needs, when it needs it, and where that information should come from. The aim isn't to feed models everything, but to help them focus on the right things to produce better outputs.
Getting context engineering right depends on improving performance, relevance, and access. Performance improves when LLMs can reuse work they've already done, so time and energy aren't wasted recomputing answers. Relevance is about helping LLMs narrow their field of view to the data that improves reasoning for a specific task. Access is about ensuring useful data is always available, accurate, and secure when the model needs it. Together, these three elements enable LLMs to make better choices about what to use and when, transforming raw information into meaningful context.
Filtering data to deliver accuracy
Modern data infrastructure is what makes this possible. Real-time in-memory storage speeds retrieval so LLMs can recall useful context in milliseconds, while semantic caching avoids unnecessary compute by identifying previously answered questions. Vector search helps surface the most relevant information from large stores of data. Together, these techniques give LLMs the ability to use the right context at the right moment, rather than simply remembering everything.
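The vector-search step can be sketched in a few lines: documents and the query are represented as vectors, and retrieval ranks documents by cosine similarity to the query. The three-dimensional vectors and document titles below are made-up placeholders, not the output of any real embedding model, which would produce vectors with hundreds of dimensions.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical pre-computed document embeddings (title, vector).
store = [
    ("travel policy 2025", [0.9, 0.1, 0.0]),
    ("expense guidelines", [0.2, 0.8, 0.1]),
    ("office seating plan", [0.0, 0.1, 0.9]),
]

def top_k(query_vec, k=1):
    """Return the k document titles most similar to the query vector."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [title for title, _ in ranked[:k]]

print(top_k([1.0, 0.2, 0.0]))  # -> ['travel policy 2025']
```

Only the few best-matching documents then enter the context window, instead of the whole store.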
For example, a business using an LLM to summarise company compliance policies risks inaccurate answers if outdated or unrelated documents are merged in. With context engineering, the model filters for the most recent verified documents, and real-time retrieval ensures only up-to-date information is used, making answers faster and more accurate. Simply put, that model is not remembering more; it's reasoning better.
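That filtering step is often plain metadata logic applied before retrieval: discard unverified documents, then keep only the most recently updated one. The document records and field names below are hypothetical, chosen to mirror the compliance-policy example.

```python
from datetime import date

# Hypothetical document metadata; field names are illustrative.
policies = [
    {"name": "Compliance policy v1", "updated": date(2019, 3, 1), "verified": True},
    {"name": "Compliance policy draft", "updated": date(2025, 6, 1), "verified": False},
    {"name": "Compliance policy v3", "updated": date(2024, 11, 5), "verified": True},
]

def current_policy(docs):
    """Keep only verified documents, then take the most recently updated one."""
    verified = [d for d in docs if d["verified"]]
    return max(verified, key=lambda d: d["updated"], default=None)

print(current_policy(policies)["name"])  # -> Compliance policy v3
```

Note that the newer draft is excluded because it is unverified; recency alone is not the filter.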
AI that makes smarter decisions
The transition from static models to dynamic agents marks a fundamental shift in what we should expect from AI. Context windows will continue to expand, but scale alone has never equalled wisdom. What separates truly capable AI systems is the capacity to identify which information matters and the judgement to act on it appropriately. This combination of insight and judgement will shape the next generation of AI.



