Future of AI Regulation

What DeepSeek vs OpenAI teaches us about AI governance

By Oisin Boydell, Chief Data Officer at Corlytics

The OpenAI-DeepSeek controversy is not just about one company throwing accusations at another; it is a wake-up call for the entire AI industry. Before the race to develop bigger, better, faster AI models becomes a race to the bottom on standards, the AI community must seriously examine the ethical challenges of developing those models in the first place.

This means looking at practices such as “distillation” and how they affect intellectual property rights, security, and compliance with international regulatory frameworks. By placing good, structured governance at the heart of each of these factors, through practices such as robust data provenance and accuracy checks, the industry will put itself in a position of strength. This way, innovation is facilitated rather than hindered by necessary guardrails, while vulnerabilities are kept out of AI models.

Is distillation a double-edged sword?

The concept of “distillation” is at the heart of this conflict. Simply put, it is a technique where a smaller AI model is trained to replicate the performance of a larger, pre-existing model. It accelerates AI development because it allows developers to compress large models into smaller, more efficient ones. As a result, AI models can become more accessible to businesses with limited resources.
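For readers unfamiliar with the mechanics, the sketch below illustrates the basic idea in Python with PyTorch, using toy “teacher” and “student” networks. It is a minimal, hypothetical example of the standard distillation objective, not a reflection of any particular company’s pipeline: the small student is trained to match the softened output distribution of the larger, frozen teacher.

```python
# Minimal knowledge-distillation sketch (illustrative only).
# A small "student" network learns to match the softened outputs
# of a larger, frozen "teacher" network.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy models: a larger teacher and a much smaller student (hypothetical sizes).
teacher = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 10))
student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
teacher.eval()  # the teacher is treated as pre-trained and frozen

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
temperature = 2.0  # softens probabilities so the student sees richer signal

for step in range(100):
    x = torch.randn(64, 32)  # stand-in for real training inputs

    with torch.no_grad():
        teacher_logits = teacher(x)
    student_logits = student(x)

    # KL divergence between softened teacher and student distributions,
    # the standard distillation loss.
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The key point for the governance debate is that the student never needs access to the teacher’s weights or original training data, only to its outputs, which is precisely why the practice is so hard to police.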

But it comes at a cost. If the replication is considered unauthorised, this practice raises serious legal and ethical questions because it side-steps the time and investment made by the developer of the original model. OpenAI’s accusations against DeepSeek suggest this is one such instance and constitutes a violation of intellectual property rights. And it wouldn’t be the first such accusation.

Concerns over model replication, and model integrity more broadly, have only increased with time. We need look no further than Meta’s Llama models, which have faced leaks and unauthorised adaptations in the past, while many of the major model developers currently face lawsuits for using copyrighted material for model training without permission or compensation.

For instance, OpenAI has been sued by multiple parties, including The New York Times, other news organisations, and music industry representatives like Saregama and T-Series in India. The lawsuits allege that OpenAI used copyrighted content such as articles, song lyrics, and music compositions to train its models without permission or compensation. Google has been fined €250 million by French authorities for using content from major news publishers like Agence France-Presse (AFP) to train its Gemini (formerly Bard) AI without notifying or compensating them. Stability AI has faced lawsuits related to its image-generation models, with plaintiffs alleging that the company used copyrighted visual artworks as training data without permission. The OpenAI-DeepSeek case is another chapter in the same story. Yet it raises a foundational question the AI community can no longer ignore: is distillation an innovative way to advance AI, or are the risks simply getting too high?

Navigating murky legal and ethical waters

Historically, AI firms have trained models using vast amounts of human-generated data scraped from the internet, usually without permission from content creators. We see companies defending the practice under “fair use” arguments, while content owners push for stricter copyright protections. Now, as the tide turns to AI firms extracting knowledge from each other’s models, the line between ownership and originality is blurred even further.

These are questions for policymakers to wrangle, not just risk and compliance departments. Determining whether AI models have copyright protection like traditional works, or whether distillation amounts to a breach of intellectual property and competition laws, will define the shape and course of future AI development. Until resolved, this regulatory ambiguity poses a risk for any business using AI, but especially for those in highly regulated sectors such as financial services. These risks include the loss of clients and partners, reduced trust, and even financial penalties if an oversight is significant enough to draw regulatory attention.

The European Union’s AI Act has made a start in providing some of that much-needed clarity through its transparency mandates. In time, these will likely become more comprehensive. Meanwhile, in the U.S., AI copyright disputes are intensifying, with the legal boundaries for AI model training still being defined. But by placing governance at the core of any AI deployment, businesses can take a demonstrable, proactive step in ensuring they are on the right side of regulation from the very start.

Governance is key to safer AI development

Distillation not only presents legal and ethical concerns but also poses security risks. If an AI model can be replicated through distillation, malicious actors could use similar techniques to extract proprietary knowledge from leading AI firms.

Unauthorised model replication could lead to a range of risks: data leakage, where sensitive information used during training is unintentionally embedded in distilled models; the bypassing of safety mechanisms, where guardrails built into proprietary models, such as content moderation, may not transfer effectively to replicated versions; and, though it may be a rarity today, “data poisoning”, where malicious actors go beyond merely extracting data and inject misleading information into the model.

Beyond the threat of external actors, the need for AI systems to maintain high levels of accuracy cannot be overstated. If an AI model is trained on AI-generated outputs, the resulting model will inherit the inaccuracies of the original and cascade them further. Few organisations can afford to rely on false information, especially in tightly regulated industries.

These risks clearly highlight the importance of embedding governance at the heart of ethical AI development. With proper governance, there is a framework for building and using AI in a way that protects integrity and accuracy, while also providing safeguards against possible AI model misuse.

Comparing global regulations

This debate is not just a Silicon Valley issue, affecting a bubble of tech giants. AI model training and replication are global concerns, particularly as China, the U.S., and the EU adopt increasingly divergent AI regulations.

China, for instance, has already imposed strict oversight on AI model transparency and licensing, while the EU AI Act takes a more consumer protection-focused approach. The U.S., on the other hand, has been comparatively slower to introduce comprehensive AI legislation.

These regulatory differences could lead to fragmented standards of AI governance. For multinational businesses, this also creates no small challenge for compliance, as AI regulations may vary drastically between jurisdictions.

The AI of tomorrow

To sustain ethical AI development, the industry must find a balance between innovation, regulation, and ethical responsibility. It’s easier said than done, but most worthy things are.

To advance, policymakers and AI researchers must first collaborate to establish clear intellectual property protections for AI models while allowing room for innovation. This is how we can clear the murky waters the industry faces today. Secondly, transparency and accountability must be built into the DNA of AI firms, which must do more to disclose information about their training methodologies, data sources and distillation practices.

As AI advancements continue to race ahead, ethical and regulatory frameworks must evolve at pace, in parallel. The challenge ahead is ensuring that AI remains open, secure, and aligned with human values – and governance is where we start.
