
Dodging the trap of false AI insights

By George Tziahanas, VP of Compliance, Archive360

False AI insights are becoming an increasingly urgent challenge as enterprises ramp up their use of generative tools.

GenAI is becoming part of the daily run-of-the-mill operation in many organisations – but even for personal usage, the risk of inaccuracy in its answers is well known. This is particularly true of LLMs, which have been trained to deliver extremely high-quality writing rather than to master the nuances of any particular field. It's not uncommon for major public LLMs to deliver an answer that seems right to its language algorithm without it, in fact, being true. In the worst cases, it can be hard to decipher why the AI came to the conclusion it did, beyond the answer sounding like the sort of thing that might follow the question asked.

Poor data, poor answers

While this sort of AI 'hallucination' is a familiar phenomenon, a quieter but equally serious problem is AI systems pulling from outdated, incomplete, or inaccurate data. For example, you might ask an AI model to identify symptoms of a medical condition and unknowingly receive an answer based on a 50-year-old paper instead of current research. If you don't have complete visibility over the sources AI is drawing from – and, crucially, how it's processing them – this kind of inaccuracy becomes not just possible, but likely.
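One simple mitigation on the retrieval side is to filter candidate sources by currency before they ever reach the model. The sketch below is purely illustrative – the record structure and the ten-year cutoff are hypothetical policy choices, not a recommendation:

```python
from datetime import date

def filter_current_sources(sources, max_age_years=10, today=None):
    """Keep only retrieval candidates published within the allowed window.

    `sources` is a list of dicts with 'title' and 'published' (date) keys;
    the cutoff policy here is invented for illustration.
    """
    today = today or date.today()
    cutoff = today.replace(year=today.year - max_age_years)
    return [s for s in sources if s["published"] >= cutoff]

sources = [
    {"title": "1975 case study", "published": date(1975, 3, 1)},
    {"title": "2023 clinical review", "published": date(2023, 6, 12)},
]
current = filter_current_sources(sources, today=date(2025, 1, 1))
# only the 2023 clinical review survives the filter
```

In a real pipeline this check would sit alongside, not replace, human review – recency alone does not guarantee accuracy.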

In these cases, the problem is more insidious – the answer might not just sound right linguistically, but also logically: outdated information was correct at some point, even if it's not fully accurate now. And that makes it harder to tell that an error has occurred. A large part of the benefit of GenAI is its ability to relieve people of the need for long-form information processing – but if its answers can't be trusted unless they're subject to thorough human checks, in a sense we're back to square one.

In short, as more companies integrate AI into business-critical processes, the risk of drawing false conclusions from poorly governed data is only growing. Solving this issue requires real control over training and inference data – something many enterprises still lack.

Why 'data quality debt' is a major risk in AI development

The GenAI boom is accelerating at unprecedented speed, leaving many leaders feeling they need to scramble to ensure their organisation doesn't fall behind. Much of the investment that follows is directed towards the creation of AI models themselves – the processing algorithms that turn data into insights. Much less, however, is directed towards the data foundation itself.

Organisations looking to generate effective and helpful insights from AI will need to be confident their AI tools can ingest governed data from across all enterprise applications, modern communications, and legacy ERP systems. Not only will this help organisations realise value faster, but it also reduces the risk AI can pose to businesses by inadvertently exposing regulated data or company trade secrets, or simply by ingesting faulty and irrelevant data, as in the scenario above.

The importance of feeding AI systems compliant, current, and contextually relevant data cannot be overstated. Those organisations able to shift from application-centric to data-centric management will bear the fruits of AI faster, extracting valuable insights that are both accurate and compliant.

Five steps to visibility and governance over training and retrieval pipelines

In order to achieve the level of data quality required, there are five key areas organisations need to master.

  1. Data lineage and provenance

This includes maintaining a record of the data's source, origin, ownership, and any changes in metadata (where permitted) throughout its life cycle. It also means retaining rich metadata and all the underlying documents or artifacts from which the data is derived. This provides the necessary transparency, especially across large volumes and long timeframes.
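As an illustration, a lineage record can be kept as an append-only event log alongside each object. This is a minimal sketch with invented field names, not the schema of any particular platform:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class LineageRecord:
    """Append-only provenance log for a single data object (illustrative)."""
    object_id: str
    source_system: str
    owner: str
    events: list = field(default_factory=list)

    def record_event(self, action, actor, detail=""):
        # Each change is appended, never overwritten, so the full
        # history of the object stays reconstructable.
        self.events.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "actor": actor,
            "detail": detail,
        })

rec = LineageRecord("doc-001", source_system="legacy-erp", owner="finance")
rec.record_event("ingested", actor="etl-job-7")
rec.record_event("metadata-updated", actor="steward-a", detail="retention class set")
```

The append-only design is the point: provenance is only trustworthy if past entries can never be silently rewritten.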

  2. Data authenticity

This requires maintaining a clear chain of custody for all data, storing objects in their native forms, and hashing objects on receipt to demonstrate the data remains unchanged. In addition, organisations must maintain a full audit history for each object, and for all actions relating to changes in policies and controls. This means analytics teams can be certain the data remains in its original form.
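The hashing step above can be sketched with the standard library: compute a digest when an object is received, store it, and re-verify before the data is used. A minimal illustration:

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Return a SHA-256 digest of the object exactly as received."""
    return hashlib.sha256(data).hexdigest()

def verify_unchanged(data: bytes, stored_digest: str) -> bool:
    # Any alteration to the bytes changes the digest, so a match
    # demonstrates the object is still in its original form.
    return fingerprint(data) == stored_digest

original = b"quarterly-report.pdf contents"
digest_at_ingest = fingerprint(original)

assert verify_unchanged(original, digest_at_ingest)
assert not verify_unchanged(original + b" tampered", digest_at_ingest)
```

In practice the stored digests would themselves live in the audit trail, so that both the object and its verification history are tamper-evident.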

  3. Data classification

Establishing the nature of a set or type of data is important since it speeds up AI training. Organisations need to be able to govern structured, semi-structured, and unstructured data. Giving each class a unique schema allows organisations to manage diverse sets of data without a one-size-fits-all fixed ontology, which makes publishing data to analytics and AI solutions more effective – avoiding the data being unnecessarily manipulated to force it into an inflexible data structure.
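One way to realise per-class schemas without a fixed ontology is a small registry keyed by data class, where each class declares only the metadata it actually needs. The classes and fields below are hypothetical:

```python
# Hypothetical per-class schemas: each data class carries its own
# required metadata fields instead of one fixed, global ontology.
SCHEMAS = {
    "email":    {"required": ["sender", "recipients", "sent_at"]},
    "invoice":  {"required": ["vendor", "amount", "currency", "issued_at"]},
    "document": {"required": ["title", "author"]},
}

def validate(record: dict, data_class: str) -> list:
    """Return the list of required fields missing from the record."""
    schema = SCHEMAS[data_class]
    return [f for f in schema["required"] if f not in record]

missing = validate({"vendor": "Acme", "amount": 120.0}, "invoice")
# missing -> ['currency', 'issued_at']
```

Because each class validates against its own schema, new data types can be added without reshaping everything else already in the archive.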

  4. Data normalisation

Establishing common definitions and formats of metadata is important for use in analytics and AI solutions. Clearly defined schemas are an important element, along with tools that can transform or map data to maintain consistent, normalised views of related data.
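A transformation of this kind can be as simple as mapping source-specific field names and date formats onto one common schema. The source systems and mappings below are invented for illustration:

```python
from datetime import datetime

# Hypothetical source-specific field names mapped onto a common schema.
FIELD_MAP = {
    "crm": {"custName": "customer_name", "created": "created_at"},
    "erp": {"CUSTOMER": "customer_name", "CRT_DT": "created_at"},
}

def normalise(record: dict, source: str) -> dict:
    """Rename fields to the common schema and standardise the date format."""
    out = {FIELD_MAP[source].get(k, k): v for k, v in record.items()}
    if "created_at" in out:
        # Accept either ISO or dd/mm/yyyy input; always emit ISO 8601.
        raw = out["created_at"]
        for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
            try:
                out["created_at"] = datetime.strptime(raw, fmt).date().isoformat()
                break
            except ValueError:
                continue
    return out

normalise({"CUSTOMER": "Acme", "CRT_DT": "01/02/2024"}, "erp")
# -> {'customer_name': 'Acme', 'created_at': '2024-02-01'}
```

The value is downstream: once every source emits the same names and formats, analytics and AI tools can join related data without per-source special cases.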

  5. Data entitlements

Controlling access to and use of data is critical to delivering the right data for analytics and AI solutions. Enterprises need to be able to entitle data in as granular a manner as needed, including at an object or field level, based on user or system profiles.

This means the right data is available to users and systems who are entitled to access it, while restricting or limiting access for those who are not – stopping, for example, a diagnostic AI accessing files meant for a research model.
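Field-level entitlements of the sort described can be sketched as a filter applied per profile before data ever reaches a model. The profiles and field names here are hypothetical:

```python
# Hypothetical entitlement table: which fields each system profile may see.
ENTITLEMENTS = {
    "diagnostic-ai": {"patient_id", "symptoms"},
    "research-model": {"symptoms", "outcome"},  # no direct identifiers
}

def redact(record: dict, profile: str) -> dict:
    """Return only the fields the given profile is entitled to access."""
    allowed = ENTITLEMENTS.get(profile, set())
    return {k: v for k, v in record.items() if k in allowed}

record = {"patient_id": "P-17", "symptoms": "fever", "outcome": "recovered"}
redact(record, "research-model")
# -> {'symptoms': 'fever', 'outcome': 'recovered'}
```

Applying the filter at ingestion rather than at query time means an AI system never holds data it was not entitled to in the first place.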

Looking ahead

As GenAI becomes more deeply embedded in enterprise operations, the stakes for data governance are rising fast. The reliability of AI insights depends not only on the sophistication of the models themselves but – crucially – on the quality, traceability, and control of the data they're trained on and retrieve from. Without clear oversight of these pipelines, organisations risk undermining the very efficiency gains they seek. Getting AI right starts with getting data right – and that's a challenge no enterprise can afford to overlook.
