
AI Is Turning Data Sovereignty Into a Security Problem

By Andreas Malik

A company can do nearly everything it thinks data sovereignty requires and still get this wrong. 

It can host its models in the right region, use an approved cloud provider, run the right contractual checks, and keep its legal team satisfied. Yet the moment an agent or copilot is connected to live internal systems, the real question changes. The issue is no longer only where the model sits, but what the system can reach once it begins to operate. 

That shift is becoming hard to ignore. A recent Intel survey found that 62% of organisations cite data sovereignty and privacy risks as the biggest factor slowing AI projects in the public cloud. The same research found that 16% have no access to facilities with guaranteed data sovereignty at all, while 80% expect to rely on confidential computing to address sovereignty across cloud and edge environments. 

Most coverage still treats this as a legal or infrastructural issue. In practice, it is increasingly an operational one. 

Many organisations cannot clearly prove what data their AI systems can access, where that data flows during ordinary use, or what is quietly retained in the background once an interaction is over. That is where AI safety and data sovereignty begin to converge, and that is also where they often fail. 

The old map no longer matches the territory 

For years, data governance rested on assumptions that were not unreasonable for the systems most firms actually had. 

Data sat in recognisable repositories. It moved through relatively fixed workflows. Access was largely mediated by named users, known applications, and reasonably bounded permissions. Governance therefore focused on location, ownership, retention, and access rights. A firm might not know everything, but it usually knew enough to draw a map. 

That map is now becoming less reliable. 

Modern AI systems, especially copilots, assistants, and agents, do not simply wait for a human to open the right file in the right place. They search, retrieve, combine, summarise, and transmit information across multiple systems in real time. They work through connectors, APIs, internal search layers, vector indexes, knowledge bases, logs, orchestration tools, and sometimes external services as well. 

In a more static software environment, data mostly stayed where it was put. In an AI environment, data becomes something the system can discover, assemble, and reuse on demand. That changes the meaning of control and, above all, the meaning of sovereignty.

Sovereignty is no longer only about residency 

Traditional data sovereignty asks sensible questions. Where is the data stored? Which jurisdiction governs it? Does it remain within an approved environment? Does it cross a border? 

Those questions still matter, but they no longer exhaust the problem. 

Once AI systems begin retrieving information dynamically, sovereignty is no longer only about where data is located (residency) but also about how it is handled during operation (runtime). A model may be hosted in the right region and still produce the wrong exposure if it is allowed to retrieve the wrong internal material, pass that material through the wrong workflow, or leave traces of it in logs, indexes, caches, or connected systems.

This is the point many organisations have not yet fully absorbed. They are still trying to solve a moving problem with a static framework. What matters now is not only where data rests, but what the system can retrieve, store, infer from, and transmit during normal operation. That is a different category of risk altogether.

The threat model has already shifted 

This is not a theoretical shift; the threat model has already changed. CrowdStrike’s 2026 Global Threat Report found that AI-enabled attacks rose 89% year over year, and that malicious actors exploited legitimate GenAI tools at more than 90 organisations by injecting malicious prompts, while also abusing AI development platforms. The same report noted that intrusions increasingly move through trusted identities, SaaS applications, and cloud infrastructure, blending into ordinary activity.

That matters because it shows where attackers are learning to operate. They are not only trying to break systems from the outside. They are learning to move through approved tools, inherited trust, and legitimate AI workflows. A concrete example from earlier this year illustrates this: Varonis Threat Labs disclosed ‘Reprompt,’ a one-click attack chain against Microsoft Copilot in which a malicious link causes Copilot to run prompts the user did not intend, bypass safety controls, and silently exfiltrate sensitive personal data. The point of that case is not merely that one vulnerability existed and was patched. It is that a normal-looking AI interaction path became a route for hidden instruction, retrieval, and exfiltration. 

The danger, in other words, is no longer confined to hostile code entering a system. It increasingly lies in hostile instructions moving through systems the organisation has already chosen to trust. 

The real exposure sits lower down 

This is why so many firms are governing the wrong layer. Most AI governance programmes begin at the top. They produce policies, prompt rules, acceptable-use standards, review boards, and approval processes. Some of that is necessary. None of it is sufficient. 

The real exposure often sits lower down, in places that look uninteresting until an AI system starts using them at speed. It sits in unclassified repositories that no one has reviewed in years. In inherited permissions left behind by reorganisation. In search indexes built for convenience rather than containment. And in logs, traces, cached outputs, backups, and connector paths that were never designed with agentic workflows in mind. 

In my experience, and in what we are seeing across sectors, one thing is becoming clear: AI safety and data sovereignty fail at the same place, namely the data layer.

That phrase should not be read as an abstraction; it is practical. The data layer is the operational substrate that determines what the system can see, what it can retrieve, what it can retain, and what it can pass onwards. If that layer is poorly classified, loosely permissioned, or only partially understood, then policy language at the top cannot save the deployment from the reality underneath it. 

The scale of that visibility problem is already evident in the wider market. Thales’ 2026 Data Threat Report found that only 34% of organisations know where all their data is stored, and just 39% say they can fully classify their data.

Those are striking figures in an era when AI systems are being connected to enterprise data environments precisely in order to make more of that data searchable and usable.

Why organisations misgovern this problem 

The problem is not simply that governance is weak. More often, it is that governance is aimed at the wrong target. 

Older governance models assumed a world in which access was slower, narrower, and easier to inspect. They were built for systems where data tended to live in known locations and where user actions were easier to understand in sequence. AI changes that by introducing continuous retrieval, dynamic context assembly, and machine-led discovery across environments that were never fully mapped in the first place. 

That is why so much of the current discussion feels oddly weightless. It focuses on whether a policy exists, whether a system is approved, or whether a model meets some broad requirement. Those are not trivial questions, but they do not tell an organisation what the system can actually reach today. AI risk must be governed across the full operational lifecycle, including data handling, access, telemetry, monitoring, and human oversight, not merely through policy language at the top.  

This is one reason the issue is beginning to slow real deployment. Administrators are increasingly being asked questions they cannot answer with confidence: which repositories are in scope, which connectors are active, which traces are retained, and which data remains inside approved boundaries once the workflow is running. If those answers are unclear, deployment slows for good reason.

Policies can govern intention. They do not, by themselves, govern reach. 

What sovereignty-by-design should actually mean 

If firms want to make AI deployments safer and more defensible, they need to treat sovereignty as an operational discipline rather than a regulatory checkbox. 

That begins with continuous discovery. An organisation has to know what data exists, where it sits, how it moves, and which systems can touch it. That means maintaining a live view of all four, including which connectors expand that reach. In an AI environment, a one-off inventory is already drifting out of date by the time it is completed.
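As a rough illustration of what a live view of reach could look like, the sketch below treats systems, connectors, and data stores as a graph and recomputes what an assistant can reach each time the inventory changes. Every name in it (`EDGES`, `reachable`, `reach_drift`, and the connector and store names) is hypothetical, and a real deployment would populate the graph from its own catalogue or discovery tooling rather than a hard-coded dictionary.

```python
from collections import deque

# Hypothetical inventory: each key can read from the entries in its list.
# All names are illustrative and not taken from any real deployment.
EDGES = {
    "copilot": ["sharepoint_connector", "crm_connector"],
    "sharepoint_connector": ["hr_share", "finance_share"],
    "crm_connector": ["crm_db"],
    "hr_share": [],
    "finance_share": [],
    "crm_db": ["crm_backup"],
    "crm_backup": [],
}


def reachable(start, edges):
    """Return every node reachable from `start`, excluding `start` itself."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


def reach_drift(previous, current):
    """Compare two reach snapshots and report what was gained or lost."""
    return {"gained": current - previous, "lost": previous - current}
```

Running `reachable("copilot", EDGES)` on a schedule and passing consecutive results to `reach_drift` would surface the moment a newly enabled connector quietly widens what the assistant can touch, which is the kind of change a one-off inventory misses.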

It then requires operational classification. Data must be classified not only by broad compliance category, but by practical sensitivity, business criticality, jurisdictional constraint, and appropriateness for AI retrieval. Not everything that technically exists in an environment should be equally reachable simply because a connector can find it. In practice, that classification lives in the surrounding control stack: data catalogues, DSPM tooling, access policies, retrieval filters, orchestration rules, and the review processes that decide what should remain reachable at all.

Finally, it requires enforceable guardrails. Organisations need runtime controls that determine what agents and copilots are permitted to retrieve, summarise, transmit, and retain. That includes not only the visible user-facing application, but also the connectors, indexes, logs, traces, cached context, and orchestration layers around it. 
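To make the idea of a runtime guardrail more concrete, here is a minimal sketch of a retrieval filter that sits between a search layer and the model. It assumes each document already carries classification labels; the `Document` and `AgentPolicy` types, the label values, and `filter_retrieval` are all hypothetical names chosen for illustration, not any vendor's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    sensitivity: str      # assumed labels, e.g. "public", "internal", "restricted"
    jurisdiction: str     # assumed labels, e.g. "EU", "US"
    ai_retrievable: bool  # explicit opt-in for AI retrieval


@dataclass(frozen=True)
class AgentPolicy:
    allowed_sensitivity: frozenset
    allowed_jurisdictions: frozenset


def filter_retrieval(candidates, policy):
    """Split candidates into (allowed, denied); denied entries carry a reason."""
    allowed, denied = [], []
    for doc in candidates:
        if not doc.ai_retrievable:
            denied.append((doc.doc_id, "not approved for AI retrieval"))
        elif doc.sensitivity not in policy.allowed_sensitivity:
            denied.append((doc.doc_id, "sensitivity not permitted"))
        elif doc.jurisdiction not in policy.allowed_jurisdictions:
            denied.append((doc.doc_id, "jurisdiction not permitted"))
        else:
            allowed.append(doc)
    return allowed, denied
```

Only the `allowed` list would be passed into the model's context. The `denied` list, with its reasons, is what an organisation could log as evidence that the control operated. The same check would need to apply to cached context and indexes, not just live search results, for the guardrail to cover the layers described above.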

Before an agent goes live, the organisation must know not only what the model can do, but what the system is permitted to touch. 

The argument beneath the argument 

This is, at bottom, a piece about misplaced confidence. 

Many firms still speak about AI safety as though it were mainly a model problem, and about data sovereignty as though it were mainly a geography problem. Both views are now too narrow. AI systems operate through retrieval, assembly, and action across environments that are often far less orderly than governance documents assume. 

That is the deeper reason deployment is slowing. The obstacle is not only regulation or infrastructure cost. It is a more basic inability to prove control over what the system can access and what it does with that access during ordinary operation.

The next stage of AI adoption will depend less on where a model is hosted than on whether an organisation can prove control at the data layer once the system begins to act. 

Bio: Andreas Malik is the founder of Risk & Decision and Resilient24. He is a digital resilience specialist with over 20 years of experience working with various companies, public institutions, and financial organisations on risk, continuity, IT security and recovery. His work focuses on helping organisations gain operational control over their data and systems before incidents occur. 
