
The ROI of AI: How to lower inference costs without sacrificing performance

By Phil Burr, Director at Lumai

The arrival of DeepSeek, the new large language model developed by a team in China, has sent shockwaves through the technology industry because of how cheaply it was produced compared with competitors such as ChatGPT. Although the difference in performance between the two models on tasks like writing and coding is minimal, the key talking point is the disparity in cost and the significantly lower compute that models like DeepSeek require.

According to a BBC article, some dispute these costs, since the model employs “already existing technology, along with open source code.” Even so, it sets a precedent for how technological breakthroughs can transform the way AI models are trained and powered.

What cannot currently be disputed, however, is the impact all of this AI development is having on the cost of building datacentres and on their vast energy consumption.

The surging power strain on datacentres

Despite new research suggesting that the energy used for a ChatGPT query is far less than previously reported, the scale at which AI is being deployed is expected to drive enormous, power-hungry infrastructure expansion: according to a Rand report, AI datacentres may need nearly all of California’s 2022 power capacity (68 GW) within the next two years.

Every time an LLM receives an input and generates an output, it uses compute resources – and of course those resources cost money. There is the cost of the compute hardware, the cost of the datacentre infrastructure (which is proportional to the power it must supply and the heat it must remove) and the cost of the energy consumed. By amortising the capital costs and adding the operational costs, the cost of inference can be calculated.
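As a rough illustration of that calculation, the sketch below amortises an accelerator’s purchase price and its share of datacentre build-out over its service life, then adds the cost of the energy consumed while serving queries. Every figure in it – hardware price, infrastructure share, power draw, electricity price, query rate and lifetime – is a hypothetical assumption chosen only to show the arithmetic, not a quoted industry number.

```python
# Minimal sketch: amortised cost per inference (all figures are hypothetical assumptions)

def cost_per_inference(hardware_capex, infra_capex, lifetime_years,
                       power_kw, energy_price_per_kwh, queries_per_second):
    """Amortise capital costs over the hardware's lifetime and add the energy cost."""
    hours = 24 * 365 * lifetime_years
    total_queries = queries_per_second * 3600 * hours

    # Capital: the accelerator plus its share of datacentre build-out, spread over every query served
    capex_per_query = (hardware_capex + infra_capex) / total_queries

    # Operational: electricity drawn over the same period
    opex_per_query = (power_kw * hours * energy_price_per_kwh) / total_queries

    return capex_per_query + opex_per_query

# Assumed figures: a $30,000 accelerator, $15,000 share of infrastructure,
# 0.7 kW draw, $0.10/kWh electricity, 10 queries per second, four-year lifetime
print(f"${cost_per_inference(30_000, 15_000, 4, 0.7, 0.10, 10):.6f} per query")
```

With those assumed figures the amortised capital dominates the per-query cost, which is why both the price of the accelerator and the power-driven infrastructure cost matter so much.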

As AI models rapidly develop and grow in popularity, they place a significant power strain on datacentres, driving up inference costs and energy consumption. Such costs can make or break the business case for AI deployment. Given that Goldman Sachs has predicted spending on AI datacentres and hardware will reach the $1 trillion mark, even modest reductions in inference cost can make a big difference. And there is, of course, the need to improve the sustainability of datacentres as well.

To keep up with the uptick in AI development, companies and datacentres need to find ways of lowering inference costs without sacrificing performance. But how can they achieve this?

A groundbreaking approach

Datacentres currently rely on silicon chip-based AI accelerators for AI processing. These chips are power-hungry and cannot scale to the capacity AI’s surging demand requires without hitting extreme power limits. Every LLM output requires more energy, more cooling, more infrastructure and therefore more emissions – all contributing to the rising cost of inference.

Chip companies continue to increase the performance of AI accelerators, but their tactic of adding more silicon, more power and more cost is chasing diminishing returns. To reduce these costs and keep processing within the power capabilities of datacentres, new approaches to AI computation are needed that cut processing power.

As the BBC article explains, part of the reason DeepSeek could be built more cheaply was that it paired a collection of more expensive, already-imported Nvidia A100 chips “with cheaper, lower-end ones”. Alongside its use of open source practices, it was an effective move. But these Nvidia chips, which are used by many of the prominent AI models, still cost $10,000 each – and Nvidia’s latest chips cost over $30,000. Crucially, this approach also does nothing to address rising power consumption.

One way to avoid the need for expensive new silicon chips is to use optical compute instead, which can leverage the relatively low-cost optical components already used in datacentres. An optical AI accelerator not only reduces infrastructure costs but also offers low-power, energy-efficient computation.

This is because optical AI acceleration uses photons rather than electrons to compute and performs highly parallel computation, enabling it to deliver the necessary leap in AI performance while, crucially, using only 10 per cent of the power of a GPU currently found in datacentres. Consequently, less power, cooling and expensive infrastructure are needed.
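To see how that power figure feeds through to running costs, here is a back-of-envelope sketch of one accelerator’s annual electricity bill. The GPU power draw, electricity price and datacentre overhead (PUE) are assumed figures; only the 10 per cent ratio comes from the point above.

```python
# Minimal sketch: annual electricity cost of one accelerator (assumed figures throughout)

HOURS_PER_YEAR = 24 * 365
ENERGY_PRICE = 0.10   # $/kWh, assumed
PUE = 1.5             # datacentre overhead for cooling and power delivery, assumed

def annual_energy_cost(power_kw):
    """Electricity cost of running one accelerator flat out for a year, including facility overhead."""
    return power_kw * HOURS_PER_YEAR * PUE * ENERGY_PRICE

gpu = annual_energy_cost(0.7)       # ~0.7 kW GPU draw, assumed
optical = annual_energy_cost(0.07)  # 10 per cent of the GPU's power, per the figure above

print(f"GPU: ${gpu:,.0f}/yr, optical: ${optical:,.0f}/yr, saving: ${gpu - optical:,.0f}/yr")
```

Multiplied across the tens of thousands of accelerators in a large datacentre, that difference – plus the cooling and infrastructure it avoids – is what moves the ROI calculation.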

Driving the ROI of AI

DeepSeek shows the innovation that can happen when circumstances force new approaches. It is a timely reminder to AI companies that alternative solutions can be found rather than simply opting for more power and costly new technology.

Two things in particular should be triggering a change in approach. The first is that surging AI demand cannot be efficiently met by current technology. The second is that the inference costs needed to meet this demand are unsustainable, as are the energy demand and its impact on the environment.

Alongside other advances in datacentre infrastructure and cooling, optical AI processing offers a groundbreaking method for delivering performance at a fraction of the cost and power of current GPUs. So, rather than being a drain on resources, optical AI processors can boost the ROI of AI infrastructure and create a more sustainable future.

It’s a new approach. But developments like DeepSeek show that new approaches are more than worth considering.
