The Data-Ready Enterprise: Building Infrastructure for Scalable AI

By Joe Regensburger, VP of Research, Immuta

The growth in Agentic AI systems has had profound implications for data-driven enterprises. In particular, there has been a significant change in how data is consumed. Agentic AI systems, particularly conversational agents, have accelerated the long-promised growth of self-service analytics. Conversational agents reduce barriers to entry by translating natural language into complex code and visualizations. As a result, a far greater number of consumers can use data to answer complex questions and realize business value. However, data pipelines must be strengthened in several ways when moving from traditional batch processing, where a few key users used data to address fixed business concerns, to ad hoc analytics, where a vast and expanding user community asks an ever-increasing number of business questions.

There are four key areas which can be strengthened: enhanced documentation and metadata, data lineage, speed of access, and governance of broader use cases. These pillars are critical for transforming an enterprise’s data infrastructure from one designed for fixed, batch-oriented reporting into a fluid, trustworthy, and scalable system ready for the ad hoc and expansive demands of modern AI-driven consumers. Addressing these areas systematically will ensure that the underlying data foundation can reliably support a larger, more varied community of users asking an ever-increasing number of complex business questions.

Translating natural language questions into analytics code and visualizations is a central use case of conversational systems. But the success of these systems relies heavily on thorough documentation of how the data is organized and what assumptions were made when structuring it. Human experts carry significant institutional knowledge and experience that help them navigate complex data systems and avoid critical misjudgments. When these sophisticated data systems are exposed to a large, diverse audience via an AI intermediary, the probability of critical misjudgments and misinterpretations escalates significantly. An AI agent, lacking the expert’s contextual experience, can easily misapply a column, incorrectly join datasets, or fail to account for critical business rules if those details are not explicitly documented.
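To make this concrete, here is a minimal sketch of how documented column definitions might be assembled into the context an agent sees before it generates SQL. The data dictionary entries, column names, and function are hypothetical; a real system would pull these notes from a catalog or governance platform.

```python
# Hypothetical data dictionary; real systems would source these notes
# from a metastore or governance catalog rather than hard-coding them.
DATA_DICTIONARY = {
    "orders.order_ts": "UTC timestamp when the order was placed (not fulfilled).",
    "orders.amount": "Order total in USD, pre-tax; refunds appear as negative rows.",
    "customers.region": "Sales region code; 'NA' means North America, not null.",
}

def build_agent_context(question: str) -> str:
    """Assemble the schema notes an agent needs before writing SQL."""
    notes = "\n".join(f"- {col}: {desc}" for col, desc in DATA_DICTIONARY.items())
    return (
        "You are generating SQL against a documented warehouse.\n"
        f"Column notes:\n{notes}\n\n"
        f"Question: {question}\n"
        "Only use columns exactly as documented."
    )

print(build_agent_context("What was last quarter's net revenue by region?"))
```

Without the note that refunds appear as negative rows, for example, an agent summing `orders.amount` would silently overstate revenue; the documentation is what prevents exactly the misjudgments described above.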

Critically, enhanced documentation and metadata should capture the following (a minimal machine-readable sketch of these pillars follows the list):

  1. Data Organization and Schema: A deep understanding of how tables are structured, what relationships exist between various data entities, and the nuances of column definitions. 
  2. Data Provenance and Quality: Knowledge of where the data originates, the processes it has undergone, and any known biases, limitations, or quality issues. 
  3. Domain-Specific Business Logic: Unwritten assumptions, calculations, and interpretations that business analysts apply to the raw data to derive meaningful metrics. 
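
As an illustration only, the three pillars above could be represented as a machine-readable catalog entry. The dataset, fields, and rules in this sketch are hypothetical; production catalogs expose far richer models, but the shape of the information is the same.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetMetadata:
    """One catalog entry covering the three documentation pillars above."""
    name: str
    # 1. Data organization and schema
    columns: dict[str, str]                        # column -> definition
    relationships: list[str] = field(default_factory=list)
    # 2. Data provenance and quality
    source_system: str = "unknown"
    known_limitations: list[str] = field(default_factory=list)
    # 3. Domain-specific business logic
    business_rules: list[str] = field(default_factory=list)

# Hypothetical entry for a revenue table.
revenue = DatasetMetadata(
    name="finance.daily_revenue",
    columns={"net_rev": "Gross revenue minus refunds, USD, daily grain"},
    relationships=["joins to dim_region on region_id"],
    source_system="billing_db nightly export",
    known_limitations=["Two-day lag for international transactions"],
    business_rules=["Exclude test accounts (account_type = 'internal')"],
)
```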

Data Lineage is tightly related to improved documentation and metadata. Lineage captures the complete lifecycle of data, including its origin, transformations, and movement through various systems; in effect, it generates a vital category of metadata: provenance and quality information. This is central to building trust and informing how data can be used. Complete lineage gives AI agents important information on how data is transformed, summarized, and consumed, which helps them better plan and address ad hoc requests. By providing an unbroken chain of custody and a historical record of changes, lineage answers essential questions: Where did this data come from? How was it calculated? Has it been aggregated or de-duplicated? This transparency is central to building trust in data assets across the organization. When users, analysts, and decision-makers can verify the lineage, they can confidently determine how data can be used and ensure that it meets regulatory, compliance, and internal standards. In essence, Data Lineage transforms data from an opaque asset into a fully documented, traceable, and trusted resource, maximizing its utility for both human analysis and advanced AI applications.
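A minimal sketch of what an agent-consumable lineage record might look like, assuming a simple hop-by-hop model. The datasets and transformations are hypothetical, and production systems typically adopt standards such as OpenLineage rather than a hand-rolled structure like this.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LineageStep:
    """One hop in a dataset's chain of custody."""
    output: str
    inputs: tuple[str, ...]
    transformation: str

# Hypothetical lineage for a revenue metric, upstream steps listed first.
LINEAGE = [
    LineageStep("staging.orders", ("billing_db.orders",), "nightly CDC copy"),
    LineageStep("clean.orders", ("staging.orders",), "de-duplicate on order_id"),
    LineageStep("finance.daily_revenue", ("clean.orders",),
                "sum(amount) by day, refunds netted"),
]

def trace(dataset: str) -> None:
    """Walk upstream from a dataset and print its full provenance."""
    step = next((s for s in LINEAGE if s.output == dataset), None)
    while step:
        print(f"{step.output} <- {', '.join(step.inputs)} [{step.transformation}]")
        step = next((s for s in LINEAGE if s.output in step.inputs), None)

trace("finance.daily_revenue")
```

Walking this chain answers the trust questions directly: the agent (or the analyst) can see that the revenue figure was de-duplicated and refund-netted before aggregation.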

The growth in agents promises (or threatens, depending on your perspective) to greatly increase the flux in data access requests. Agents can be created quickly and directed at data platforms, which means the number of actors (both human and agentic) will expand rapidly. This growth will stress existing approval workflows. Organizations need systems in place to handle it if AI systems are to realize their promised business benefits. Traditional approval lines will be ineffective at meeting these new demands. Time-tested rule-based access control paradigms, including Role-Based Access Control, Attribute-Based Access Control, and Purpose-Based Access Control, are central to answering these demands. But this is only a portion of the answer. Rule-based paradigms can handle routine access requests and establish guardrails for access, but there also need to be tools in place to handle short-lived or atypical access requests.
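As a rough illustration, an attribute- and purpose-based policy check can be expressed as a pure function over subject attributes, resource attributes, and a declared purpose. The attributes and rules below are hypothetical placeholders, not a production policy engine; the point is that such rules evaluate instantly, with no human in the approval loop for routine requests.

```python
from dataclasses import dataclass

@dataclass
class AccessRequest:
    subject_attrs: dict      # e.g. {"role": "analyst", "team": "finance"}
    resource_attrs: dict     # e.g. {"sensitivity": "internal", "domain": "finance"}
    purpose: str             # declared purpose of use

def abac_decision(req: AccessRequest) -> bool:
    """Evaluate a request against simple attribute/purpose rules."""
    same_domain = req.subject_attrs.get("team") == req.resource_attrs.get("domain")
    sensitivity_ok = req.resource_attrs.get("sensitivity") != "restricted"
    purpose_ok = req.purpose in {"reporting", "analytics"}
    return same_domain and sensitivity_ok and purpose_ok

req = AccessRequest(
    subject_attrs={"role": "analyst", "team": "finance"},
    resource_attrs={"sensitivity": "internal", "domain": "finance"},
    purpose="analytics",
)
print(abac_decision(req))  # True: attributes and purpose satisfy the rules
```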

Risk-based access control can be highly effective at addressing atypical requests that fall outside the parameters of established security policies. It introduces a dynamic evaluation layer to securely manage these novel requests, ensuring data utility and speed for AI applications without compromising governance. This means capturing and mining past approval decisions, understanding data use agreements, and defining guardrails for access.
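One way to picture this dynamic layer is a scoring function that mines past approvals and routes each request by risk. The weights, thresholds, and fields below are purely illustrative assumptions; a real system would learn them from historical decisions and data use agreements.

```python
def risk_score(request: dict, history: list[dict]) -> float:
    """Score an atypical request; higher means riskier. Weights are illustrative."""
    score = 0.0
    if request["sensitivity"] == "restricted":
        score += 0.5
    # Penalize requests unlike anything previously approved for this subject.
    similar = [h for h in history
               if h["subject"] == request["subject"]
               and h["dataset"] == request["dataset"]]
    if not similar:
        score += 0.3
    if request.get("duration_days", 0) > 30:   # long-lived grant
        score += 0.2
    return min(score, 1.0)

history = [{"subject": "agent-17", "dataset": "finance.daily_revenue"}]
request = {"subject": "agent-17", "dataset": "hr.salaries",
           "sensitivity": "restricted", "duration_days": 7}

score = risk_score(request, history)
# Route by score: auto-approve, require human sign-off, or deny.
print("needs human review" if score >= 0.5 else "auto-approve")
```

The guardrail is the routing, not the score itself: low-risk requests flow through at machine speed, while genuinely novel ones are escalated to a human.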

Governance undergirds each of these improvements and is essential to managing the deployment of sophisticated Agentic systems. The rise of these autonomous and semi-autonomous systems introduces new complexities and potential vulnerabilities into existing organizational structures. Specifically, the dynamic, often opaque, nature of Agentic decision-making places significant stressors on traditional governance processes.

The key stressors are two-fold: continuous monitoring and providing timely insights (a minimal sketch of both follows the list below).

  1. Continuous Monitoring: Agentic systems operate with high velocity and can evolve their behavior based on real-time data and interactions. This necessitates a shift from periodic compliance checks to continuous, real-time monitoring of their inputs, internal states, decision-making logic, and outputs. Governance must ensure that this monitoring is comprehensive, covering aspects such as bias detection, drift from intended behavior, and compliance with privacy regulations (e.g., GDPR, CCPA). 
  2. Providing Timely Insights: When an Agentic system makes an error, behaves unexpectedly, or potentially violates a policy, the governance framework must be capable of generating and communicating rapid, actionable insights to human oversight teams. Traditional systems that rely on end-of-quarter reporting or manual audits are insufficient. The governance layer needs to be equipped with automated alerting, explainability tools (XAI) to interpret complex agent decisions, and mechanisms for swift intervention or system rollback. 
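
The sketch below ties both stressors together: a continuous drift check on an agent's behavior that emits an immediate, actionable alert rather than waiting for a periodic audit. The metric, threshold, and agent name are hypothetical assumptions, not a prescribed monitoring design.

```python
import statistics

def check_drift(recent: list[float], baseline: list[float],
                threshold: float = 3.0) -> bool:
    """Flag drift if the recent mean sits far outside the baseline spread."""
    mu, sigma = statistics.mean(baseline), statistics.stdev(baseline)
    return abs(statistics.mean(recent) - mu) > threshold * sigma

def monitor(agent_id: str, recent: list[float], baseline: list[float]) -> None:
    """Emit a timely, actionable alert instead of an end-of-quarter report."""
    if check_drift(recent, baseline):
        # In practice this would page the oversight team and attach
        # explainability artifacts for the offending decisions.
        print(f"ALERT {agent_id}: output distribution drifted from baseline")

# Hypothetical daily metric, e.g. share of queries touching sensitive columns.
baseline = [0.10, 0.12, 0.11, 0.09, 0.10, 0.11, 0.12]
monitor("sql-agent-3", recent=[0.31, 0.29, 0.33], baseline=baseline)
```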

The ability to maintain control, ensure accountability, and build public trust in Agentic deployments hinges directly on the maturity and agility of the underlying governance infrastructure. Without robust, adaptive governance, the risk of unintended consequences, regulatory penalties, and reputational damage escalates dramatically. 
