
No More Guesswork: Data Semantics Stops AI Hallucinations

By Justin Borgman, CEO & Cofounder of Starburst

AI is being adopted at an incredibly fast pace, so fast that industry standards can’t keep up. Like any other new technology, AI requires standards, both technical and ethical, to ensure consistency and provide a common framework that drives adoption and innovation. Standards will also create a common language that AI agents can use to communicate with each other, query data and share information. This common understanding can be achieved through the Open Semantic Interchange (OSI), a vendor-neutral standard for semantic models that will allow businesses to accelerate their AI projects.

To put this into perspective: OSI will add a business semantic layer that provides AI agents, models and applications with a shared vocabulary for contextualizing and defining every aspect of the business and the data that underpins it. Using this vernacular, AI-powered services can reliably understand and interpret enterprise functions and terminology. For example, they will be able to resolve entities across databases and other data sources to identify what a customer is, work out how revenue is calculated, or determine what “churn risk” means. They can also examine data to work out which fields are sensitive and how they relate across different systems.

Without this semantic layer in place, AI systems are left to make their best guess, risking errors and hallucinations. With it, they can reason using the same definitions, policies and data lineage that human teams rely on. It turns fragmented data into trusted, reusable knowledge the business can depend on.
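To make the idea of a shared vocabulary concrete, here is a minimal sketch in Python. The term names, fields and definitions below are hypothetical, invented for illustration; they are not part of any published OSI specification:

```python
# Hypothetical semantic-layer sketch: a shared vocabulary that maps
# business terms to explicit definitions an AI agent can look up
# instead of guessing. All term names and fields are illustrative.

SEMANTIC_MODEL = {
    "customer": {
        "definition": "A distinct billing account with at least one completed order",
        "source_tables": ["crm.accounts", "billing.invoices"],
    },
    "revenue": {
        "definition": "Sum of invoiced amounts, net of refunds",
        "expression": "SUM(invoices.amount) - SUM(refunds.amount)",
    },
    "churn_risk": {
        "definition": "Probability that a customer cancels within 90 days",
        "sensitive": False,
    },
}

def resolve_term(term: str) -> dict:
    """Return the agreed definition for a business term, failing loudly
    rather than letting an agent invent (hallucinate) its own meaning."""
    key = term.lower().replace(" ", "_")
    if key not in SEMANTIC_MODEL:
        raise KeyError(f"'{term}' is not defined in the semantic layer")
    return SEMANTIC_MODEL[key]

print(resolve_term("churn risk")["definition"])
```

The point of the sketch is the failure mode: an undefined term raises an error instead of being silently guessed, which is exactly the behavior a semantic layer is meant to enforce.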

Setting new semantic standards

The OSI will help organizations standardize their fragmented data definitions to enhance AI interoperability. This matters because AI is still a nascent technology, and providers such as OpenAI and Anthropic have developed models with their own native semantics. Left unaddressed, this can lead to vendor lock-in and create AI data silos that have a detrimental impact on agent behavior.

The OSI standard will prevent AI projects from developing their own languages or proprietary standards that stifle innovation. The OSI is to enterprise AI what SQL was to databases: a shared, portable contract that unlocks a new ecosystem. It will provide agents with a shared context that stops them from guessing or hallucinating, reducing the need for constant human oversight.

It’s an open-source initiative that defines a universal, vendor-neutral semantic model specification organizations can use to standardize their fragmented data definitions. Ultimately, OSI will enhance interoperability across tools and platforms, giving enterprises consistent metrics and definitions across dashboards, notebooks, and machine learning models.

Implementing AI data products

The OSI standardizes how AI models and agents exchange context (e.g., definitions, metrics, entities, relationships), so they can interoperate across tools, clouds, and workflows. It adds a semantic layer that organizations can use to apply business rules and governance to AI data, helping AI agents collaborate and make fewer mistakes. Those same rules can be applied to data as it travels across the enterprise, spanning different locations, geographies and multiple data sources.
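What a “shared, portable” exchange of context might look like can be sketched as a simple serialization round trip. The JSON payload shape here is an assumption made for illustration and does not reflect the actual OSI format:

```python
# Illustrative only: serialize a semantic definition to JSON so another
# tool, cloud, or agent can load the identical context. The payload
# shape is hypothetical, not the real OSI specification.
import json

definition = {
    "metric": "revenue",
    "expression": "SUM(invoices.amount) - SUM(refunds.amount)",
    "entities": ["invoice", "refund"],
    "relationships": [{"from": "refund", "to": "invoice", "type": "credits"}],
}

# Tool A exports the definition as a portable payload...
payload = json.dumps(definition)

# ...and Tool B imports it, recovering the same context instead of
# re-deriving (and possibly mis-guessing) what the metric means.
imported = json.loads(payload)
assert imported == definition
print(imported["metric"])
```

Because both sides read the same definition rather than inferring it, the metric is computed identically no matter which vendor’s tool consumes it.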

This can be achieved by embracing the concept of data products. A data product is a packaged, reusable data asset that includes comprehensive metadata, clear data lineage and domain context. A data product is typically created with a specific intent, usually for a specific domain team. As such, a defining feature of a data product is clear ownership, governance, and accountability.
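A data product along these lines can be sketched as a simple record type. The field names below are an assumption chosen for illustration, not a formal schema:

```python
# Hypothetical sketch of a data product as described above: a packaged,
# reusable asset carrying metadata, lineage, domain context, and an
# accountable owner. Field names are illustrative, not a standard.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class DataProduct:
    name: str
    domain: str            # the domain team that owns and governs it
    owner: str             # accountable person or team
    description: str       # business context for consumers
    lineage: list = field(default_factory=list)   # upstream sources
    metadata: dict = field(default_factory=dict)  # freshness, tags, SLAs

churn_scores = DataProduct(
    name="customer_churn_scores",
    domain="customer_success",
    owner="cs-analytics@example.com",
    description="Daily churn-risk score per active customer",
    lineage=["crm.accounts", "billing.invoices", "ml.churn_model_v3"],
    metadata={"refresh": "daily", "contains_pii": False},
)

print(churn_scores.owner)
```

Making the record immutable (`frozen=True`) mirrors the governance idea: consumers discover and use the product, but its definition changes only through its accountable owner.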

Data products are also designed for easy discovery and consistent use, meaning organizations can leverage them to improve operational efficiency by enabling direct access to data, regardless of where it resides. Data products empower business functions, helping teams solve problems quickly across analytics and AI workloads. The introduction of a unified business semantic layer will create systems where AI agents can turn raw or siloed data into well-defined, query-ready assets that can be securely shared across teams and environments.

In every industry, data leaders are under pressure to operationalize AI in a way that is governed, cost-effective and tied directly to business value. The next wave of enterprise AI will be led by organizations that move from experimentation to execution. Success will come to those who leverage this new semantic layer along with data products to unify their data, govern it consistently, and design for real-world action.

Delivering an AI data infrastructure

The open-source community behind the OSI initiative is made up of some of the world’s biggest names in data analytics, CRM and business intelligence (BI). Together, these companies have created a partner ecosystem that is paving the way for a more open, interoperable, and intelligent data landscape. It has led to a transparent, community-driven standard that simplifies data operations, unlocks new possibilities for innovation, and gives organizations the flexibility and efficiency they need to build a future-ready data infrastructure.

However, AI’s gain could also be seen as BI’s missed opportunity. Although the OSI represents a breakthrough for AI, it highlights the long-standing lack of a universal semantic model for traditional BI and data analytics. Such a model could have been applied to BI and analytics years ago, acting as an intermediary between databases and user applications.

Nevertheless, by facilitating seamless semantic metadata exchange, the OSI initiative will accelerate the adoption of AI and BI tools that streamline operations and reduce complexity. This will allow organizations to unify their data definitions, leading to more comprehensive data analysis and broader data product sharing that fuels AI innovation. It also means that organizations can adopt new AI models without rebuilding their entire semantic layer.
