Future of AI

Why DeepSeek’s model distillation is more disruptive than you think

By Victor Botev, CTO and Co-founder at Iris.ai

It’s been about two months since DeepSeek-R1 sent the global AI industry scrambling. Since then, OpenAI, Microsoft, and Meta have all turned their attention to the technique considered to be the key to DeepSeek’s success: model distillation.

This isn’t strictly accurate – or at least, it isn’t the full story. Less attention has been given to reinforcement learning, a technique that essentially lets a model use trial-and-error to converge on optimal results, and which DeepSeek-R1 used in place of the more standard technique of supervised fine-tuning (SFT). When we talk about DeepSeek-R1 managing to outperform ChatGPT on some metrics at approximately one-tenth of the cost, those savings are largely down to reinforcement learning, which has proved far more efficient and cost-effective than SFT.
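To make the contrast concrete, here is a minimal toy sketch of the two training signals in plain Python. Everything in it – the five-way ‘answer’ space, the binary reward, the learning rates – is an illustrative assumption, not DeepSeek’s actual training code; the R1 paper describes a group-based policy-gradient method (GRPO) applied at vastly larger scale.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_ACTIONS = 5   # stand-in for candidate "answers"
CORRECT = 2       # the answer a verifier would accept

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Supervised fine-tuning: needs a labelled "right answer" for every example.
def sft_step(logits, label, lr=0.5):
    probs = softmax(logits)
    grad = probs.copy()
    grad[label] -= 1.0            # gradient of cross-entropy w.r.t. logits
    return logits - lr * grad

# Reinforcement learning: only needs a reward signal after the fact.
def rl_step(logits, lr=0.5):
    probs = softmax(logits)
    action = rng.choice(NUM_ACTIONS, p=probs)   # trial ...
    reward = 1.0 if action == CORRECT else 0.0  # ... and error
    onehot = np.eye(NUM_ACTIONS)[action]
    # REINFORCE update: reinforce sampled actions in proportion to reward.
    return logits + lr * reward * (onehot - probs)

sft_logits = np.zeros(NUM_ACTIONS)
rl_logits = np.zeros(NUM_ACTIONS)
for _ in range(200):
    sft_logits = sft_step(sft_logits, CORRECT)  # label known in advance
    rl_logits = rl_step(rl_logits)              # discovered by trial
print("SFT answer:", softmax(sft_logits).argmax())
print("RL  answer:", softmax(rl_logits).argmax())
```

The point of the contrast: SFT requires curated, labelled demonstrations up front, while the RL loop only needs a way to check whether a sampled output was good – which is exactly what makes it cheaper to scale.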

With DeepSeek, the importance of model distillation has more to do with the quality of outputs. The concept is fairly straightforward: a small ‘student’ model is trained on outputs produced by a large language model (LLM), the ‘teacher’ model, until the student learns to reproduce the teacher’s results. This is how DeepSeek transferred much of R1’s reasoning ability into the family of far smaller distilled models released alongside it.
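In code, the core of the technique fits in a few lines. The sketch below is a deliberately tiny, self-contained illustration in the spirit of Hinton’s formulation – softened teacher outputs used as training targets – and none of it is DeepSeek’s actual code; the linear ‘teacher’ and ‘student’ are stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, CLASSES, T = 8, 4, 2.0   # input size, output size, softmax temperature

# A "pretrained" teacher. In practice the student is far smaller than the
# teacher; identical shapes here just keep the sketch short.
W_teacher = rng.normal(size=(DIM, CLASSES))
W_student = np.zeros((DIM, CLASSES))

def softmax(z, t=1.0):
    z = z / t
    e = np.exp(z - z.max())
    return e / e.sum()

def distill_step(W_student, x, lr=0.1):
    target = softmax(x @ W_teacher, T)   # teacher's softened output
    probs = softmax(x @ W_student, T)    # student's current output
    # Cross-entropy against soft targets; the gradient w.r.t. the student's
    # logits is (probs - target) / T, with the 1/T absorbed into lr here.
    return W_student - lr * np.outer(x, probs - target)

for _ in range(5000):
    x = rng.normal(size=DIM)   # unlabelled inputs, queried through the teacher
    W_student = distill_step(W_student, x)

x = rng.normal(size=DIM)
print("teacher:", softmax(x @ W_teacher, T).round(2))  # the two distributions
print("student:", softmax(x @ W_student, T).round(2))  # should now be close
```

The temperature is the detail that makes distillation work: softened probabilities expose how the teacher ranks all the answers, not just which one it picks, and that extra signal is what lets a small student learn so much from so little.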

Model distillation is not new either. Engineers began seriously experimenting with distillation in the mid-2010s (a whole decade before most people had heard of DeepSeek), when a 2015 paper co-authored by Geoffrey Hinton formalized the technique; in concept, it can be traced back to neural network research in the early ’90s.

So, if it’s not new, and it sounds relatively straightforward in theory, what’s all the fuss about?

Demystifying Distillation

The answer lies in how DeepSeek has leveraged open source. Models like OpenAI’s ChatGPT are closed models driven by opaque processes. What makes DeepSeek genuinely revolutionary is how open it is about its techniques. The paper published alongside the model’s release spreads the precise method for successfully distilling an AI model more widely than anything that came before. The numbers demonstrate this clearly: the company released around 60 different distilled models, and there are already about 6,000 on Hugging Face. Give people the initial example model, and the open source community will produce 100 times more.

It’s hard to overstate how significant this is for open source AI developers. Here’s a technique that’s been theoretically available for years but hasn’t seen widespread adoption and has been difficult to implement effectively. Suddenly, everyone has access to thousands of functioning examples and the blueprints to expand and refine the techniques themselves.

The Business Implications

This, in turn, has the potential to translate into a sea change in how businesses and institutions across the world use AI. Some are already turning to small language models (SLMs) for similar reasons. Using less computing power is attractive to firms seeking to save on energy and GPUs, and small models trained on proprietary, business-specific data can yield more useful results in certain situations than LLMs trained on more general data. Distillation can produce small models that approach LLM-level capability within a limited, well-defined scope of reasoning tasks, which opens the door to cheap yet highly capable small language models.

These can be extremely helpful tools depending on the specific use case. The point is that with ever-improving distillation, businesses will have access to the power of LLMs and SLMs without the associated energy costs and with far less investment required in costly high-performance hardware.

The transformative implications are obvious, then. How the global AI industry should react to it, however, is less clear.

Can Europe Leapfrog the Rest?

For a long time, Europe has been seen as lagging behind the rest of the world when it comes to AI. It’s true that in some cases, the continent’s tech industry has been reactive rather than proactive, investing in the same tech as the U.S. and China – but two to three years after the global leaders.

Even now, we can see this strategy in action at the government level, with investment tending towards AI gigafactories – huge data centers housing thousands of GPUs – to train large AI models. Even OpenEuroLLM, a worthy project to deliver open source models in every EU language, is fundamentally a response to global trends rather than a lead of its own.

DeepSeek has highlighted the areas where Europe has the opportunity to differentiate itself from the dominant approaches set by the AI giants – and potentially take the lead. An ecosystem of businesses producing small, predictable, and secure models would cultivate a level of trust in European AI products that is still lacking in many of the better-known models from other parts of the world. This will appeal not only to enterprises and their customers, but also to lawmakers, who will have an easier time regulating AI models that are by nature more transparent and accountable.

On top of this, open source collaboration is something the continent has always excelled at, with some countries successfully placing firm open access requirements on any government-funded research. Mistral AI was instrumental in providing resources for DeepSeek-R1, so it’s clear that European companies have the technical skills and drive to embrace this open source boom. OpenEuroLLM is a step towards embracing this open source future, but more can be done.

Finally, while funding data centers is important, a race to catch up with the US is probably futile. For one thing, energy costs in Europe are substantially higher than in the US on the whole, making any data center project an immediately more expensive proposition on this side of the Atlantic. Matching promised investments like the $500 billion Stargate project isn’t something EU countries are likely to pursue.

Thankfully, DeepSeek’s success with model distillation and its open source approach proves that Europe doesn’t need to. Distilled models and small language models require significantly less computing power than large models, and with some of the most well-established open source networks in the world, Europe is in a prime position to take full advantage of the new path mapped out by DeepSeek.

Distilling the Future

As always, there are no certainties when it comes to AI. DeepSeek is not perfect, and model distillation is not without its challenges. If not executed carefully, training small models on data generated by LLMs risks creating negative feedback loops, reducing accuracy, and entrenching flaws in the initial ‘teacher’ model.

However, perhaps DeepSeek’s most important impact is that it has shown the value of open source collaboration and hinted at a world where ever-improving AI models don’t necessarily mean ever-rising costs, computing demands, and energy requirements. Businesses across the world seeking to reap the benefits of the AI revolution will be eagerly watching what the industry does next, and providers in Europe have the opportunity to play a leading role in that future.
