AI Business Strategy

Why brute-force AI scaling is a dead end – and what comes next

By Dr Viroshan Naicker, Co-Founder, Refiant AI

The AI industry has long treated more compute as the most reliable route to better models. Faced with almost any constraint, it reaches for the same answer: more parameters, more training data, more GPUs, more energy, more data centre capacity.  

That formula delivered extraordinary results. It also helped create the economic logic now driving the market. Big Tech looks set to spend roughly $650 billion to $700 billion on AI infrastructure in 2026 alone, on the expectation that ever-greater computing power will continue to produce more capable systems. 

That expectation is starting to look expensive. GPT-4 reportedly cost more than $100 million to train, while external estimates put Gemini Ultra’s training cost higher still. Frontier models continue to improve, though each advance now relies on far greater capital and large-scale infrastructure.

By late 2024, reports described a more difficult reality inside the major labs. OpenAI, Google and Anthropic were all seeing smaller gains from pushing training runs to ever-larger scales. Progress is still coming through, yet the economics around that progress look very different. The industry is now working out how much additional capability scale can still deliver, and what it takes to reach those gains.

Why bigger looked better 

The industry had good reason to believe in brute-force scaling. GPT-4 was a much bigger jump from GPT-3.5 than a routine model update. It performed far better on benchmarks and professional exams, and later releases continued to improve on it. GPT-4.5 delivered stronger factual accuracy, lower hallucination rates and better conversational quality on OpenAI’s published evaluations.

Even so, the gains look narrower and more expensive to extract. GPT-4 felt like a broad jump in capability, but many of the gains since then have come through refinement, tuning and better performance in specific domains. Useful advances, certainly. A different profile of progress, definitely. That points to the next stage of AI development: building models that do more with less.

Removing waste, not ambition 

Efficiency is what makes advanced models workable outside controlled test settings. Different techniques play a part here – compression, quantisation, pruning, knowledge distillation – each trimming a different kind of excess. Some reduce precision, some remove duplicated behaviour, some cut parameters that add little to the final output. Together, they leave a leaner, lower-overhead model that runs more quickly and is easier to deploy. 
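To make that trimming concrete, the sketch below applies two of those techniques, magnitude pruning and dynamic int8 quantisation, to a small PyTorch model. It is a minimal, illustrative example under assumed conditions: the toy architecture and the 30% sparsity level are stand-ins, not a description of how any particular lab or vendor optimises its models.

```python
# Minimal sketch: pruning and dynamic quantisation of a small PyTorch model.
# Illustrative only -- the architecture and sparsity level are assumptions,
# not a description of any specific production pipeline.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A toy feed-forward network standing in for a much larger model.
model = nn.Sequential(
    nn.Linear(512, 1024),
    nn.ReLU(),
    nn.Linear(1024, 256),
)

# Pruning: zero out the 30% of weights with the smallest magnitude in each
# Linear layer -- cutting parameters that add little to the final output.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # make the sparsity permanent

# Quantisation: store and compute Linear weights in int8 rather than float32
# at inference time -- reducing precision to cut memory use and latency.
quantised = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# The quantised model is a drop-in replacement for inference.
with torch.no_grad():
    output = quantised(torch.randn(1, 512))
print(output.shape)  # torch.Size([1, 256])
```

The same idea scales up: each step removes a different kind of excess from the trained network while leaving its useful behaviour largely intact, which is what makes the resulting model cheaper to serve and easier to place.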

At a certain point, adding more size stops feeling like progress and starts looking like a costly stand-in for optimisation. The smarter path draws on a principle visible across nature and cognition alike: focus resources where they matter most. A flock turns quickly because each bird responds to the signals closest to it. Human thinking works the same way – we narrow the field and move towards what seems most relevant, rather than searching every memory each time we need an answer. 

Some leading firms are applying that same discipline to AI infrastructure – refining models to deliver maximum capability while being lighter to run, less computationally demanding and closer to the point of use. Where a model runs, and under what conditions, is fast becoming as important as what it can do.

Beyond the cloud bill 

When organisations deploy AI, they have to think about practicalities: where the model runs, how data is processed and which jurisdiction governs the system. 

Privacy rules, geopolitical pressure and AI adoption are all pushing more organisations towards sovereign cloud strategies. At the same time, sovereign AI is increasingly being framed as a question of strategic resilience and control over critical digital infrastructure. In the UK, that debate now extends to domestic compute capacity and dependence on foreign technology providers. 

Leaner models widen the range of workable deployment options. They fit more easily into private cloud, on-premise and edge environments. They make it easier to keep inference close to where it is needed and reduce the infrastructure burden attached to privacy, latency, resilience and control. AI then starts to fit the operational environment around it, rather than forcing the environment to bend around the model. 

For governments and regulated organisations, the practicalities tend to come first. They have to think about where a system will run, what data it will touch and which rules it must satisfy. A model can be extremely capable and still be unusable if it does not sit comfortably within those boundaries. Leaner systems help make model performance fit the environment it has to work in. 

When size stops being enough 

For a while, bigger models could justify almost everything. Better performance made the cost look worth carrying. That balance is becoming harder to sustain. 

A model can be powerful and still come with too much drag: too much infrastructure, too much latency, too little flexibility over where it runs. Leaner systems shift that balance. They are easier to place, easier to govern and easier to work into real operating conditions. 

That changes the basis of competition. The strongest position in AI will come less from raw size alone and more from the ability to deliver advanced model performance in forms organisations can actually use well. Cost is part of that. So are speed, control, portability and operational fit.

Scale brought the industry this far and will remain part of the picture, but the next step change in value will come from making advanced models lighter to run, easier to place and more workable under real conditions.

 
