Why LLMs require judgement as much as context

By Manvinder Singh, VP of Product Management for AI, Redis

Large language models (LLMs) have reached a turning point. Scale brought us here, but the next breakthrough is about cultivating judgement: the ability to discern which information matters, when it matters and how it should shape decisions in real time.

We've been conditioned to equate bigger with better: larger models, more parameters, expanded context windows. Yet something fundamental is missing from this equation. A model that can access everything but prioritise nothing hasn't become more intelligent; it has simply accumulated more potential points of failure.

As LLMs evolve into agentic systems capable of reasoning and autonomous action, their ability to filter signal from noise, weigh relevance and anchor decisions in what truly matters will determine how capable they are.

Scale alone isn't enough

The context window of an LLM, the amount of recent text it can remember and use to shape its next response, has expanded dramatically in recent years, growing from a few thousand tokens to a few hundred thousand, and in some cases reaching a million. In theory, this should allow LLMs to read and reason across entire documents, sustain longer conversations, and draw on information from multiple sources to produce more coherent answers.

However, Stanford's 2025 AI Index shows that standard tests of language model proficiency are producing near-identical results across leading LLMs, despite wide differences in model size and memory. This suggests that increased scale alone is not enough to make a meaningful difference to LLM efficacy.

At the same time, using larger LLMs is more costly. This isn't necessarily a bad thing, as bigger contexts ensure that LLMs can handle longer documents, recall past exchanges, and reason across complex information. But it's important for business ROI that the higher spend on compute is matched by better outputs.

Nvidia estimates that keeping a 128K-token conversation (roughly the length of a short book) in an LLM's working memory can consume about 40 gigabytes of graphics processing unit (GPU) memory. This means one long chat can max out an entire GPU, which is very costly for potentially marginal gains in performance.
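That figure is consistent with a back-of-the-envelope key-value (KV) cache calculation. Here is a minimal sketch; the layer count, head count, head dimension and 16-bit precision below are illustrative assumptions for a large model, not figures taken from Nvidia or the article:

```python
# Rough KV-cache sizing for a long-context conversation.
# All model dimensions are illustrative assumptions, not real-model specs.

def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    # Each token stores one key and one value vector per layer per KV head.
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens

# A 128K-token chat on an assumed 80-layer model with 8 KV heads of
# dimension 128, stored in 16-bit precision:
total = kv_cache_bytes(tokens=128_000, layers=80, kv_heads=8, head_dim=128)
print(f"{total / 1e9:.1f} GB")  # roughly 40 GB at these assumed dimensions
```

At these assumed dimensions the cache alone lands in the ~40 GB range, which is why a single long conversation can occupy most of one GPU.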

More data doesn't mean better answers

LLMs need the right data to produce answers that are accurate, relevant, and useful. Today, they are being fed more information than ever in a bid to make their responses richer and more precise. This can include recent documents, data from internal knowledge bases, previous chat histories, database records, and live information pulled from APIs or other connected applications.

Each of these sources adds useful information, but they also bring more complexity. The data is often scattered across different systems, updated at different speeds, and stored in different formats, so stitching it all together takes longer and demands more computing power. The crux of the issue, however, is that even with all that data, LLMs aren't guaranteed to use the right information at the right time.

Stanford and Berkeley's Lost in the Middle research shows that when models are flooded with long contexts, they often fail to recall what matters most. In other words, simply giving LLMs more information doesn't help if they can't recognise what's relevant.

For example, a customer support bot scrolling through an entire chat history instead of focusing on the last issue you raised is slowed down by the additional information; access to more data does not, by itself, lead to better judgement.

The same issue can crop up in enterprise search. Ask an AI assistant for your company's latest travel policy, and it might pull up five versions, including one from 2019, because it can't judge which source is current. The answer looks comprehensive, but it's not actually useful.

In short, the problem isn't simply how much data an LLM can access, but how well it manages that data.

The role of context engineering

If more data alone isn't the answer, better context is. Context engineering is deciding what information an LLM needs, when it needs it, and where that information should come from. The aim here isn't to feed models everything, but to help them focus on the right things to produce better outputs.

Getting context engineering right depends on improving performance, relevance, and access. Performance improves when LLMs can reuse work they've already done, so time and energy aren't wasted recomputing answers. Relevance is about helping LLMs narrow their field of view to the data that improves reasoning on a specific task. Access is about ensuring useful data is always available, accurate, and secure when the model needs it. Applied together, these three elements enable LLMs to make better choices about what to use and when, transforming raw information into meaningful context.

Filtering data to deliver accuracy

Modern data infrastructure is what makes this all possible. Real-time in-memory storage speeds retrieval so LLMs can recall useful context in milliseconds, while semantic caching avoids unnecessary compute by identifying previously answered questions. Vector search helps surface the most relevant information from large stores of data. Together, these techniques are what give LLMs the ability to use the right context at the right moment, rather than simply remembering everything.
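The idea behind semantic caching can be pictured in a few lines: compare an incoming query's embedding against embeddings of previously answered questions, and reuse the stored answer when similarity clears a threshold. This is a toy sketch; the bag-of-words `embed` function and the class name are invented for illustration, and a production system would use a learned embedding model and a vector database rather than a Python list:

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; real systems use a learned model.
    return Counter(text.lower().replace("?", "").split())

def cosine(a, b):
    dot = sum(count * b[token] for token, count in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

class SemanticCache:
    def __init__(self, threshold=0.8):
        self.entries = []          # list of (embedding, answer) pairs
        self.threshold = threshold

    def put(self, question, answer):
        self.entries.append((embed(question), answer))

    def get(self, query):
        # Reuse a stored answer only if some cached question is close enough.
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best and cosine(q, best[0]) >= self.threshold:
            return best[1]
        return None

cache = SemanticCache()
cache.put("what is the latest travel policy", "See policy v3 (2024).")
print(cache.get("what is the latest travel policy?"))  # cache hit
print(cache.get("how do I reset my password"))         # miss: recompute
```

The design point is the threshold: set it too low and unrelated questions get stale answers; set it too high and the cache never saves any compute.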

For example, a business using an LLM to summarise company compliance policies risks inaccurate answers if outdated or unrelated documents are merged. With context engineering, the model filters for the most recent verified documents, and real-time retrieval ensures only up-to-date information is used, making answers faster and more accurate. Simply put, the model is not remembering more; it's reasoning better.
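The filtering step described above amounts to a simple pre-retrieval policy: drop unverified sources, then hand the model only the newest remaining document. A minimal sketch, where the record fields (`verified`, `updated`) and the sample documents are assumptions for illustration, not a real schema:

```python
from datetime import date

# Illustrative document records; field names are assumed, not a real schema.
docs = [
    {"title": "Compliance policy v1",    "updated": date(2019, 5, 1),   "verified": True},
    {"title": "Compliance policy draft", "updated": date(2025, 2, 1),   "verified": False},
    {"title": "Compliance policy v3",    "updated": date(2024, 11, 12), "verified": True},
]

def freshest_verified(documents):
    # Context engineering in miniature: discard unverified sources,
    # then keep only the most recently updated document for the model.
    verified = [d for d in documents if d["verified"]]
    return max(verified, key=lambda d: d["updated"], default=None)

print(freshest_verified(docs)["title"])  # Compliance policy v3
```

Note that the newer draft loses to the older verified version: recency alone is not the filter, recency among trusted sources is.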

AI that makes smarter decisions

The transition from static models to dynamic agents marks a fundamental shift in what we should expect from AI. Context windows will continue to expand, but scale alone has never equalled wisdom. What separates truly capable AI systems is their capacity to identify which information matters and the judgement to act on it appropriately. This combination of insight and judgement will shape the next generation of AI.
