AI regulations are moving forward in many countries and states, but not consistently enough to give organizations a clear path to compliance. Some jurisdictions are moving quickly, others more deliberately. Meanwhile, models, tooling, and agent frameworks change almost overnight, while legislation and enforcement can take years. Enterprises therefore often have to make architectural decisions before there is a stable rulebook.

In the US, federal leaders have signaled a preference for moving quickly and reducing regulatory friction, including an executive order aimed at limiting state “obstruction” of national AI policy. At the same time, states are setting de facto requirements, including New York’s RAISE Act, California’s SB 53, and Colorado’s SB 24-205, which covers high-risk AI used in consequential decisions. Europe is advancing a more prescriptive rights-based approach through the EU AI Act, while GDPR remains the privacy baseline.

Given this situation, the best long-term approach is to invest in foundations that enable clear internal controls and hold up across jurisdictions: governance, access controls, and the ability to trace an outcome back to the data and steps that produced it. A valuable side effect is that the data foundation work done to prepare for regulation also improves the quality of AI results and the speed of AI building in the near term.

My thesis is simple. Companies can only ensure they’re compliant with privacy, trust and governance laws when AI outputs are traceable end-to-end back to all of their inputs, including data values, provenance and the mechanics of decisioning. The benefit is that the system can vet and gate inputs on a per-decision basis to ensure they meet privacy, compliance, security and other constraints. If an organization can’t trace an answer back to the data, tools and access rules used to make a decision, and can’t guarantee that only appropriate data was used in that decision, then adhering to regulatory privacy requirements around access, deletion and explanation becomes very challenging.

Privacy is two problems, not one

In enterprise AI, privacy tends to fail in two ways. One is leakage: the system directly exposes sensitive information to the wrong person, the wrong role, or the wrong channel. The other is its indirect equivalent: the system uses data that the organization holds in ways that are explicitly prohibited or otherwise inappropriate. Examples include “appropriate use” strictures in privacy regulations such as GDPR, and Sections 16 and 21 of Glass-Steagall, which prevent banks from using data gathered in corporate banking activities with their investment divisions.

This is where explainability becomes a requirement. In high-stakes workflows, “trust the model” may sound good enough from the outside – when looking at probabilities and human error – but it is rarely sufficient for the people who are accountable when an answer is wrong or when a customer or regulator comes knocking on the door. At a recent CIO event, I was asked who is ultimately responsible if the AI agent gives a customer a bad decision. In the eyes of a customer, the responsibility will always fall to the company providing the service. Therefore the responsibility of cultivating and maintaining trust ultimately falls to the company deploying an AI system.

Most importantly, an AI system needs to know the current state of each user and their data, including whether the user has requested that their data be withheld from training. If someone asks for their data to be deleted after the model has been trained, that model may violate GDPR requirements if it can’t confirm permissions when running the query. To maintain compliance with regulations like GDPR, companies need:

Input logging and transparency: tracking and logging system and user inputs.
Decision guard rails: controlling which inputs are allowed to be used for decisions.
Explainability and auditability: the ability to go back and describe what data was used to make a decision, and if the decision used a pre-determined process, what the process was.
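The three requirements above can be sketched together in a few lines. This is a minimal illustration, not a reference implementation: the `InputRecord` and `DecisionTrace` types, the source names, and the idea of a per-record set of allowed purposes are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class InputRecord:
    source: str                  # hypothetical system the value came from
    subject_id: str              # person the data is about
    allowed_purposes: frozenset  # purposes this data may be used for


@dataclass
class DecisionTrace:
    purpose: str
    used: list = field(default_factory=list)
    rejected: list = field(default_factory=list)
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def gate_inputs(candidates, purpose, withheld_subjects):
    """Admit only inputs permitted for this purpose; log every candidate."""
    trace = DecisionTrace(purpose=purpose)
    for rec in candidates:
        if rec.subject_id in withheld_subjects:
            trace.rejected.append((rec.source, "subject requested withholding"))
        elif purpose not in rec.allowed_purposes:
            trace.rejected.append((rec.source, "purpose not permitted"))
        else:
            trace.used.append(rec.source)
    return trace


candidates = [
    InputRecord("crm.contact", "p1", frozenset({"support", "billing"})),
    InputRecord("corp_banking.ledger", "p1", frozenset({"billing"})),
    InputRecord("web.clickstream", "p2", frozenset({"support"})),
]
trace = gate_inputs(candidates, purpose="support", withheld_subjects={"p2"})
print(trace.used)      # inputs the decision was allowed to see
print(trace.rejected)  # inputs that were blocked, with the reason
```

Because the permission check runs at decision time, a deletion or withholding request made after the data was collected still takes effect, and the stored trace records which inputs were used and why the others were excluded.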

This last point raises an important question: how reliable, and how explainable, is a decision produced by a chain of probabilistic steps?

LLM-based AI systems use statistical probabilities to predict the next token. But when you chain multiple probabilistic steps in an LLM system, the errors compound. Consider a regulated medical system that combines patient transcripts, imaging and decision-support. Each component may be 95% accurate, but if their errors are independent, chaining three of them drops end-to-end accuracy to about 85.7% (0.95 × 0.95 × 0.95). That means roughly one out of every seven patients could be getting an inaccurate diagnosis.
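The arithmetic is easy to check. The sketch below assumes each step's errors are independent and that a result is correct only if every step is correct; the function name `chain_accuracy` is made up for illustration.

```python
from math import prod


def chain_accuracy(step_accuracies):
    """Probability that every step is correct, assuming independent errors."""
    return prod(step_accuracies)


overall = chain_accuracy([0.95, 0.95, 0.95])
print(f"end-to-end accuracy: {overall:.1%}")             # 85.7%
print(f"about 1 in {1 / (1 - overall):.0f} results off")  # about 1 in 7
```

Three steps at 95% each leave about 85.7% end-to-end accuracy, so about 14% of results, or roughly one in seven, carry an error somewhere in the chain. Adding a fourth step at the same accuracy would push the figure to about 81.5%.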

One approach to counter this is to use a knowledge graph to correct and ground the answers at each stage, improving accuracy throughout the process. Medical researcher Srinivas Reddy Kosna has written about creating a Prompt-Driven Test Generation framework that combines LLMs and knowledge graphs to improve accuracy and observability by automating quality assurance for data-intensive systems, reporting a 35.8% improvement in fault detection. Several papers at graphrag.com point to similar conclusions.

GDPR-style privacy requires lineage, starting with identity resolution

When someone invokes a right to access or delete, the first step is basic: who is “that person” in the first place? That is an identity resolution problem. All organizations share the problem of having multiple records, spread across business units, product offerings, touch points, and systems, for what is ultimately the same person. To answer the “who is that person” question, you first need to connect accounts, devices, logins, contact records, phone numbers, and emails back to a unified record for a single person.

In the real world, you also have breadcrumbs that people unwittingly leave behind, which can serve as identifiers and are treated by legislation as such. GDPR’s Recital 30 describes online identifiers such as IP addresses and cookie identifiers as information that can be associated with a real person. Many organizations, therefore, end up with two streams of customer data: authenticated behavior tied to logins, and unauthenticated traffic tied to cookies, sessions, and IP addresses. If a person says “forget me,” the operational risk is that you cannot reliably connect those identifiers to a single person, and the relevant traces get missed.
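One common way to connect identifiers is to treat each observed link (a login seen with a cookie, a cookie seen from an IP address) as an edge and group everything that is connected. The sketch below uses a small union-find structure for this; the `IdentityGraph` class and the identifier strings are hypothetical, and it treats every link as certain, whereas a real system would weigh the evidence behind each one.

```python
class IdentityGraph:
    """Union-find over identifiers; linked identifiers share one root."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def link(self, a, b):
        """Record evidence that identifiers a and b belong to one person."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def identifiers_for(self, x):
        """Every identifier resolved to the same person as x."""
        root = self.find(x)
        return {i for i in list(self.parent) if self.find(i) == root}


g = IdentityGraph()
g.link("email:ana@example.com", "login:ana01")    # authenticated stream
g.link("login:ana01", "cookie:abc123")            # login seen with a cookie
g.link("cookie:abc123", "ip:203.0.113.7")         # unauthenticated stream
g.link("email:bo@example.com", "cookie:zzz999")   # a different person

to_erase = g.identifiers_for("email:ana@example.com")
print(sorted(to_erase))
```

A "forget me" request that starts from one email address then reaches the cookie and IP traces as well, while the second person's identifiers stay untouched.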

This is where knowing your customer – in a “Customer 360” sense – becomes a legal and operational requirement, one of the happy and arguably rare examples where investments in compliance and in data work can also drive better customer experience.

Privacy is not only about tracing an output back to a source document or fact. It is also about tracing identity, context, and data flows through the systems that influenced a decision, and then being able to show what happened in each specific case. This fuels feedback loops that improve result quality, while also avoiding decisioning errors. It also helps defend against claims of errors, including bias.

Five best practices to stay ready for whatever comes next

Make traceability a product requirement. Log prompts, retrieved context, tool calls, policy checks, and the output delivered.
Treat identity resolution as a top priority, not just for understanding your customer but for compliance. Map the identifiers and connections that represent a person across systems and business units, including cookies, sessions and online identifiers, and tie them all back to the purpose for which the data was disclosed.
Separate data storage and retrieval from model execution. Keep governed data in an AI knowledge layer and enforce data access controls at query time per user and purpose.
Use deterministic reasoning when you need deterministic answers. For rules-based outcomes and “connect the dots” logic, compute the result in a way you can explain and repeat. Probabilistic answers compound errors. Model-based evaluation and deterministic, multi-hop reasoning help maintain decision quality, and a graph-based data layer makes this approach much faster to implement.
Design agentic workflows for audit readiness. Consider how captured decision traces can be easily incorporated and reviewed by humans and agents. There is a virtuous cycle between these, so use an approach that enables feedback loops. This results not just in clean audits, but also continuous improvement. Your logs can also become your decision traces for the next iteration.
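To make the “connect the dots” practice concrete, here is a minimal sketch of deterministic multi-hop reasoning over a handful of facts. The edges, identifiers, and relation names are invented for illustration, and a plain breadth-first search stands in for what a graph database query would do.

```python
from collections import deque

# Hypothetical facts, stored as (subject, relation, object) edges.
EDGES = [
    ("account:991", "OWNED_BY", "person:ana"),
    ("person:ana", "EMPLOYED_BY", "org:acme"),
    ("org:acme", "SUBSIDIARY_OF", "org:globex"),
    ("org:globex", "LISTED_ON", "list:restricted"),
]


def find_path(edges, start, goal):
    """Breadth-first search; returns the chain of edges linking start to goal."""
    adjacency = {}
    for subj, rel, obj in edges:
        adjacency.setdefault(subj, []).append((rel, obj))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, rel, nxt)]))
    return None  # no chain of facts connects the two


path = find_path(EDGES, "account:991", "list:restricted")
for subj, rel, obj in path or []:
    print(f"{subj} -[{rel}]-> {obj}")
```

The same inputs always yield the same path, and the path itself is the explanation: each hop is a stored fact that can be logged as part of the decision trace and reviewed later.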

When teams build for governance and traceability up front, it becomes easier to earn buy-in from risk owners, executives, and, when needed, regulators. The same work also tends to improve the quality of the end application. Planning for these requirements from the start helps keep projects from getting stuck in pilot, because what it takes to reach production and satisfy regulators is already accounted for.

Five Best Practices to Handle Current and Future AI Privacy and Governance Regulations

By Philip Rathle, CTO at Neo4j