AI Business Strategy

Why brute-force AI scaling is a dead end – and what comes next

By Dr Viroshan Naicker, Co-Founder, Refiant AI

The AI industry has long treated more compute as the most reliable route to better models. Faced with almost any constraint, it reaches for the same answer: more parameters, more training data, more GPUs, more energy, more data centre capacity.  

That formula delivered extraordinary results. It also helped create the economic logic now driving the market. Big Tech looks set to spend roughly $650 billion to $700 billion on AI infrastructure in 2026 alone, on the expectation that ever-greater computing power will continue to produce more capable systems. 

That expectation is starting to look expensive. GPT-4 reportedly cost more than $100 million to train, while external estimates put Gemini Ultra’s training cost higher still. Frontier models continue to improve, though each advance now relies on far greater capital and large-scale infrastructure.

By late 2024, reports described a more difficult reality inside the major labs. OpenAI, Google and Anthropic were all seeing smaller gains from pushing training runs to ever-larger scales. Progress is still coming through, yet the economics around that progress look very different. The industry is now working out how much additional capability scale can still deliver, and what it takes to reach those gains.

Why bigger looked better 

The industry had good reason to believe in brute-force scaling. GPT-4 was a much bigger jump from GPT-3.5 than a routine model update. It performed far better on benchmarks and professional exams, and later releases continued to improve on it. GPT-4.5 delivered stronger factual accuracy, lower hallucination rates and better conversational quality on OpenAI’s published evaluations.

Even so, the gains look narrower and more expensive to extract. GPT-4 felt like a broad jump in capability, but many of the gains since then have come through refinement, tuning and better performance in specific domains. Useful advances, certainly. A different profile of progress, definitely. That points to the next stage of AI development: building models that do more with less.

Removing waste, not ambition 

Efficiency is what makes advanced models workable outside controlled test settings. Different techniques play a part here – compression, quantisation, pruning, knowledge distillation – each trimming a different kind of excess. Some reduce precision, some remove duplicated behaviour, some cut parameters that add little to the final output. Together, they leave a leaner, lower-overhead model that runs more quickly and is easier to deploy. 
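To make that trimming concrete, the sketch below applies two of those techniques, magnitude pruning and dynamic int8 quantisation, to a small PyTorch model. It is a minimal, illustrative example under assumed conditions: the toy architecture and the 30% sparsity level are stand-ins, not a description of how any particular lab or vendor optimises its models.

```python
# Minimal sketch: pruning and dynamic quantisation of a small PyTorch model.
# Illustrative only -- the architecture and sparsity level are assumptions,
# not a description of any specific production pipeline.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A toy feed-forward network standing in for a much larger model.
model = nn.Sequential(
    nn.Linear(512, 1024),
    nn.ReLU(),
    nn.Linear(1024, 256),
)

# Pruning: zero out the 30% of weights with the smallest magnitude in each
# Linear layer -- cutting parameters that add little to the final output.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # make the sparsity permanent

# Quantisation: store and compute Linear weights in int8 rather than float32
# at inference time -- reducing precision to cut memory use and latency.
quantised = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# The quantised model is a drop-in replacement for inference.
with torch.no_grad():
    output = quantised(torch.randn(1, 512))
print(output.shape)  # torch.Size([1, 256])
```

The same idea scales up: each step removes a different kind of excess from the trained network while leaving its useful behaviour largely intact, which is what makes the resulting model cheaper to serve and easier to place.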

At a certain point, adding more size stops feeling like progress and starts looking like a costly stand-in for optimisation. The smarter path draws on a principle visible across nature and cognition alike: focus resources where they matter most. A flock turns quickly because each bird responds to the signals closest to it. Human thinking works the same way – we narrow the field and move towards what seems most relevant, rather than searching every memory each time we need an answer. 

Some leading firms are applying that same discipline to AI infrastructure – refining models to deliver maximum capability while being lighter to run, less computationally demanding and closer to the point of use. Where a model runs, and under what conditions, is fast becoming as important as what it can do.

Beyond the cloud bill 

When organisations deploy AI, they have to think about practicalities: where the model runs, how data is processed and which jurisdiction governs the system. 

Privacy rules, geopolitical pressure and AI adoption are all pushing more organisations towards sovereign cloud strategies. At the same time, sovereign AI is increasingly being framed as a question of strategic resilience and control over critical digital infrastructure. In the UK, that debate now extends to domestic compute capacity and dependence on foreign technology providers. 

Leaner models widen the range of workable deployment options. They fit more easily into private cloud, on-premise and edge environments. They make it easier to keep inference close to where it is needed and reduce the infrastructure burden attached to privacy, latency, resilience and control. AI then starts to fit the operational environment around it, rather than forcing the environment to bend around the model. 

For governments and regulated organisations, the practicalities tend to come first. They have to think about where a system will run, what data it will touch and which rules it must satisfy. A model can be extremely capable and still be unusable if it does not sit comfortably within those boundaries. Leaner systems help make model performance fit the environment it has to work in. 

When size stops being enough 

For a while, bigger models could justify almost everything. Better performance made the cost look worth carrying. That balance is becoming harder to sustain. 

A model can be powerful and still come with too much drag: too much infrastructure, too much latency, too little flexibility over where it runs. Leaner systems shift that balance. They are easier to place, easier to govern and easier to work into real operating conditions. 

That changes the basis of competition. The strongest position in AI will come less from raw size alone and more from the ability to deliver advanced model performance in forms organisations can actually use well. Cost is part of that. So are speed, control, portability and operational fit.

Scale brought the industry this far and will remain part of the picture, but the next step change in value will come from making advanced models lighter to run, easier to place and more workable under real conditions.

 
