Amid all the brouhaha over DeepSeek, its potential overshadowing of ChatGPT, and which AI assistant is better than the next, what gets lost in the shuffle is AI’s ultimate purpose. Everyone talks about accuracy, processing power and other features, but really every AI assistant – and AI in general – does pretty much the same thing. The machine automates some piece of a process – a piece of the thinking – to assist human beings who are involved with the process. That’s the core promise of AI.
It’s the same whether it’s AI for code generation, for predictions, for image or headline creation, optimization with a/b testing, or – as this article explores, using AI to improve customer experience (CX). They are all tasks that humans thought of and have been doing for a long time. AI simply promises to scale up automation by effectively taking a set of parameters – a description of the task, the process, the outcome – and iterating to create what people are asking for.
How then does one AI system do that better than another? A lot depends on the underlying data that is used to train AI, with the catch that the results produced can mask the quality – or lack of quality – in the underlying data. There are three general classes of data used to feed AI – each with its own set of problems that occur when the data is no good.
Training Data for the AI Knowledge Base
One is the training data itself. Any AI object – a large language model (LLM), an agent, etc. has a ton of data – the “corpus” or knowledge base – thrown at it before it’s told what to look for in the construct of the given parameters. A problem that sometimes occurs is that its thinking is skewed by the dataset it’s provided. In agriculture, for instance, AI is often used to optimize resource allocation, or for targeted interventions.
But a bias toward high-yielding varieties, or identifying pests and disease prevalent in a specific area can lead to the AI giving misinformation in certain regions, or discounting local adaptability. The way to combat this unintentional bias is to ensure an adequate cross-section of input data – an accurate, comprehensive, and neutral corpus.
Prompt Data: Proceed this Way
A second type of data is the prompt data, the information an AI model is provided to tell it how to proceed, whether a single prompt, an entire conversation, or other people’s conversations (i.e., agentic AI). Over time, AI will train itself on prompt data, becoming more intelligent about its responses based on the entirety of the conversations. As with training data, high-quality, relevant output depends on guiding the AI model with high-quality relevant data. There are many examples of “prompt-busting,” where bad actors intentionally feed AI bad data through conversation intended to “confuse” the model.
Quantitative Data: AI for Exploration
Third, many modern AI agents operate via APIs and interfaces that look at quantitative data, such as asking the model to return information about a segment (e.g., 40% of customers are in the top 10% for lifetime value). Exploration, visualization, and narrative on large datasets depends greatly on having high-quality, accurate data that has been cleansed, de-duplicated, and matched and merged using advanced identity resolution. Inaccurate data introduces bias in a myriad of ways, such as call center agents entering a default value (e.g., “00000” for a Zip code) for a required field, leading to a misunderstanding of demographic data with a cascading result of inferior customer experiences.
For all three classes of data underpinning AI, customer data technology that makes data ready for business use is critical to ensure that outputs can be trusted to produce relevant customer experiences (CX). A recent Gartner survey shows that 63% of organizations do not have or are unsure if they have the right data management practices for AI. Gartner predicts that through 2026, organizations will abandon 60% of AI projects that are unsupported by AI-ready data.
Is Your Data Ready for AI?
Because high-quality data is such a key part of using AI to drive CX, as the role of AI becomes more substantial, we are seeing a larger focus on data governance, oversight, and compliance. For example, honoring customer permissions for how their data is used – including PII data – in AI training models becomes important. There are a host of privacy requirements to consider and implement for how customer data may be used to train AI models. Governance is also closely related to how to prevent bias, i.e., how does a business ensure that AI is training on the right datasets, that it’s not hallucinating, that bad actors are not intentionally feeding it bad data to produce inaccurate results – or perhaps to steal proprietary information?
In summary, there are many limitations and dangers implicit with using AI to enhance CX. Given that enterprise businesses are feeding AI with detailed information about their customers, care must be taken to fiercely protect customer data and prevent bad data from causing AI catastrophes – intentional or accidental – that rather than improve CX will instead have the opposite effect, introducing CX friction that may damage the customer relationship or introduce financial and reputational damages.