Data Strategy Wizardry to Unlock the Future of Agentic AI

By Madhu Koduvalli, Content Marketing Manager of AI at Invisible Technologies

Imagine you’re tasked with planning a complex international trip for a large team. Some colleagues are flying in from London, others from San Francisco, and even Australia. You need to book flights and hotels, schedule important client meetings, and find a dinner reservation for fourteen people in bustling Manhattan, accommodating various dietary restrictions like vegetarian, gluten-free, and an aversion to mushrooms. This alone sounds like a challenge. 

Now, picture this scenario escalating rapidly due to an unforeseen event, like a natural disaster. Suddenly, flights are canceled, triggering a cascade of frustrating tasks: rescheduling flights for disgruntled colleagues demanding upgrades, rebooking hotels, drafting apology emails to clients, and scrambling to find another large restaurant on short notice. Doesn’t that sound like a major headache? 

Now picture an AI agent capable of adapting to this kind of chaos. Instead of just looking up options, it could potentially take action: rebooking flights and hotels, drafting emails to clients and, with permission, sending them, finding alternative restaurants, interacting with your colleagues, and even finding routes to safety for stranded teammates. This capability to act and adapt is a core characteristic of agentic AI.

Is this kind of agentic solution possible, and can businesses ever get to the point where they can trust it to act on behalf of employees in the midst of already chaotic situations? 

Peeking behind the curtain of agentic AI 

To many, this sounds almost miraculous, like a magical wizard that solves problems. For businesses that are still working to stop their language models from hallucinating, such a solution can seem like an impossible, intangible goal, fraught with security implications.

The path to agentic AI might feel like following a yellow brick road to this miraculous wizard—but where does it truly lead? 

When you peek behind the curtain of what might seem like an Agentic AI “wizard,” will you actually find an autonomous entity, or is it a human empowered by the right tools? Could it be as simple as a chatbot with limited responses, or perhaps something more complex, like a dedicated individual managing numerous spreadsheets?  

An agentic AI solution could be: 

  • A complex system comprising dozens or hundreds of models working together 
  • A language model built with Retrieval-Augmented Generation (RAG) capabilities 
  • A language model with process automation on the backend 

It’s important to recognize that not all agentic AI solutions are built the same way, and in many cases, the best solution to a problem might simply be a human being. 

Building a data strategy for agentic AI 

Understanding the business requirements and specific problems that the AI agent is being built to solve is a crucial starting point, because this will directly influence what’s “behind the curtain.” It also dictates the data strategy needs of the model development project. 

Each agentic application requires a customized data strategy to bring it to life. Data forms the very “bricks” of that yellow brick road to Agentic AI.  

However, the data landscape can be confusing. Researchers will need to decide: 

  • How much data they need 
  • Whether proprietary data is necessary and how it can be sourced and used securely for training 
  • Whether unstructured data can be used and, if not, what kind of structure is needed 
  • Whether supplementary data, if required, should be synthetic or human-generated 
  • What data is needed for evaluating and benchmarking the agentic model as it progresses in training 

Fortunately, this complex ecosystem can be navigated with a structured approach. A data strategy for Agentic AI can often be broken down into a six-step program: 

Step 1: Define requirements 

The journey begins with achieving clarity on what the agent needs to accomplish. This involves specifying its capabilities:  

  • Does it need to process images or audio? 
  • Does it need to interact with databases or websites? Or both? 
  • Crucially, does it need to act autonomously on behalf of a user?  

Clarifying the level of autonomy is vital—for instance, should the agent ask for confirmation before sending an email or simply send it? 
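One way to make these requirements concrete is to write them down as a small, reviewable spec. The sketch below is illustrative only: the class names, the three autonomy levels, and the action names are assumptions made for this example, not part of any particular framework. It defaults to asking for confirmation whenever an action has not been explicitly cleared.

```python
from dataclasses import dataclass, field
from enum import Enum


class Autonomy(Enum):
    # Hypothetical autonomy levels, chosen for this sketch.
    SUGGEST_ONLY = "suggest_only"    # agent drafts, a human acts
    CONFIRM_FIRST = "confirm_first"  # agent acts only after approval
    AUTONOMOUS = "autonomous"        # agent acts without asking


@dataclass
class AgentRequirements:
    modalities: set                  # e.g. {"text", "image", "audio"}
    integrations: set                # e.g. {"database", "website"}
    autonomy_by_action: dict = field(default_factory=dict)
    default_autonomy: Autonomy = Autonomy.CONFIRM_FIRST

    def needs_confirmation(self, action: str) -> bool:
        """Return True unless the action is explicitly marked autonomous."""
        level = self.autonomy_by_action.get(action, self.default_autonomy)
        return level is not Autonomy.AUTONOMOUS


requirements = AgentRequirements(
    modalities={"text", "image"},
    integrations={"database", "website"},
    autonomy_by_action={
        "send_email": Autonomy.CONFIRM_FIRST,
        "search_restaurants": Autonomy.AUTONOMOUS,
    },
)
```

With this spec, sending an email requires approval, searching for restaurants does not, and any action nobody thought to list (such as rebooking a flight) falls back to asking first.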

Step 2: Create a data collection strategy 

Once the requirements are defined, researchers must pull together a “grocery list” of the data they need, highlighting the types of data required to bridge the gap between the model’s current capabilities and the desired agent.  

This step includes developing a strategy for collecting this data. Data collection challenges typically include determining how to leverage existing datasets, such as millions of customer service conversations, and deciding between human or synthetic data.  

If human data is needed, finding partners with the necessary expertise who can be trusted with sensitive information is key, especially in domains like financial services where subject matter expertise is critical for tasks like advising on trades. 

Step 3: Prepare your data 

Preparing data for ingestion into an AI ecosystem is critical and often challenging. Quality should be prioritized over quantity. The goal is for each piece of data to have the maximum impact on the agent’s learning.  

For example, if your team is using customer service conversations as training data, simply feeding thousands of raw conversations into a model might teach it undesirable behaviors. A better approach might be having humans curate the best conversations. Even better, human reviewers can augment the data with the kind of feedback used in reinforcement learning from human feedback (RLHF), such as rating each conversation on a scale where five is ideal and one is very poor. This method can effectively teach the agentic model what both good and bad look like.
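As a rough illustration of how rated conversations might be organized, the sketch below splits them into clearly good and clearly bad examples. The record layout and the thresholds (four and above counts as good, two and below as bad) are assumptions for this example; a real project would set them in its own guidelines.

```python
from dataclasses import dataclass


@dataclass
class RatedConversation:
    conversation_id: str
    transcript: str
    rating: int  # 1 = very poor, 5 = ideal


def split_by_rating(conversations, good_min=4, bad_max=2):
    """Separate clearly good and clearly bad examples; drop the ambiguous middle."""
    for conv in conversations:
        if not 1 <= conv.rating <= 5:
            raise ValueError(f"rating out of range for {conv.conversation_id}")
    good = [c for c in conversations if c.rating >= good_min]
    bad = [c for c in conversations if c.rating <= bad_max]
    return good, bad
```

Keeping the low-rated conversations, rather than discarding them, is what lets the data show the model both ends of the scale.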

Step 4: Validate your data 

Before committing to a large-scale data collection strategy, validation is paramount. It’s better to start by testing a small set of data early in the process, as agentic models can exhibit unexpected behaviors early on. Validating with a small dataset confirms that the data elicits the desired behavior before significant resources are invested in collecting much larger volumes. Validation also provides an opportunity to test the training data guidelines and approach.
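A pilot like this can end in a simple go/no-go check. The sketch below assumes each pilot test case has already been judged as showing the desired behavior or not; the function name, the 80 percent pass rate, and the minimum of 20 cases are placeholder assumptions, not recommended values.

```python
def pilot_passes(results, min_pass_rate=0.8, min_examples=20):
    """Decide whether a small pilot justifies scaling up data collection.

    results: list of booleans, one per pilot test case
             (True means the model showed the desired behavior).
    """
    if len(results) < min_examples:
        return False  # too few cases to draw a conclusion
    return sum(results) / len(results) >= min_pass_rate
```

If the check fails, the cheaper fix is usually to revise the guidelines or the data format and rerun the pilot, rather than to collect more of the same data.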

Step 5: Scale your dataset 

Once your team has validated that your data strategy is effective and elicits the desired behavior, the next step is scaling up data collection. The goal here is to collect or create enough data to handle the wide variety of interactions and inputs the agent will encounter from diverse users.  

For instance, if your team is building an agent that helps coordinate travel, can the agent rebook flights only if the request is phrased in a very specific way, or can it handle natural language? Can it process flight information typed into the system? Can it extract details from an image of a receipt?  

Expanding the dataset to cover diverse scenarios is essential before launching the agent within your organization or to the public. 
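One simple way to see whether a dataset covers diverse scenarios is to count examples for every combination of input type and task, then list the combinations that are missing. The sketch below assumes each example is a dictionary with hypothetical "modality" and "intent" keys; the labels are made up for the travel scenario.

```python
from collections import Counter
from itertools import product


def coverage_gaps(examples, modalities, intents, min_per_cell=1):
    """Return (modality, intent) combinations with too few examples."""
    counts = Counter((e["modality"], e["intent"]) for e in examples)
    return [
        cell
        for cell in product(modalities, intents)
        if counts[cell] < min_per_cell
    ]


examples = [
    {"modality": "text", "intent": "rebook_flight"},
    {"modality": "image", "intent": "extract_receipt"},
]
gaps = coverage_gaps(
    examples,
    modalities=["text", "image"],
    intents=["rebook_flight", "extract_receipt"],
)
# gaps == [("text", "extract_receipt"), ("image", "rebook_flight")]
```

Each gap is a scenario the agent has never seen in training, which makes it a natural target for the next round of data collection.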

Step 6: Deploy and monitor your agentic AI 

Many leaders mistake the successful deployment of an agentic AI application for the only goal, but continuous monitoring and maintenance are just as important for successfully leveraging any generative AI solution.

Monitoring allows organizations to identify where issues or breaks occur within the AI ecosystem in real time. This is particularly essential after deployment because:  

  • Models can forget what they’ve learned almost as readily as they learn it 
  • Natural language evolves over time, so the way users interact with the agent can change 
  • New experiences and interactions might compel the model to adapt in ways that are ultimately detrimental to users or the organization 

Continuous monitoring provides a feedback loop to catch performance degradation or new unexpected behaviors after deployment. 
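A minimal version of such a feedback loop is a rolling success rate with an alert threshold. The sketch below assumes each interaction can be labeled as a success or a failure (by a user rating, a human reviewer, or an automated check); the class name, window size, and threshold are illustrative assumptions.

```python
from collections import deque


class RollingSuccessMonitor:
    """Track the success rate over the most recent interactions."""

    def __init__(self, window=100, alert_below=0.9):
        self.outcomes = deque(maxlen=window)  # oldest outcomes drop off
        self.alert_below = alert_below

    def success_rate(self):
        if not self.outcomes:
            return None  # nothing recorded yet
        return sum(self.outcomes) / len(self.outcomes)

    def record(self, success):
        """Record one outcome; return True if an alert should fire."""
        self.outcomes.append(bool(success))
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.success_rate() < self.alert_below
```

Waiting until the window is full before alerting avoids false alarms from the first handful of interactions after launch.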

Accuracy defines data quality: A case study 

An agentic AI system is often like a chain or network of models working together, so they all need to function correctly for the user to have a positive experience.  

Consider this real-world example, where researchers were building an agent to book flights. Trainers interacting with the model observed that it often hallucinated flight details like destinations, times, and seat numbers. Imagine being a passenger who expects to book a first-class flight home for the holidays at ten in the morning, only to end up on a red-eye flight to Antarctica in seat 35D.

This is the kind of inaccuracy that quickly erodes user trust. Researchers may initially hypothesize that fixing such an issue requires massive data expenditure. However, by carefully examining the model’s interactions from the start and the data it uses when calling external functions (like databases or websites), the root cause can be diagnosed early, when it is far easier to fix.

In one instance, the issue was a surprisingly small detail: the API schema the model used to call an external tool was missing the “airport ID” field. Airports have unique numerical IDs that are foundational to their operational data. This missing detail early in the process caused a ripple effect of bad outcomes later on, leading the user to see non-existent flight information. This highlights how crucial seemingly minor data points can be in complex agentic AI systems. 
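A gap like this can often be caught with a check that compares a tool's schema against the fields the task depends on. The sketch below is not the schema from the case study: it assumes a JSON Schema style layout with a "properties" object, and the field names (including "airport_id") are hypothetical stand-ins.

```python
def missing_required_fields(tool_schema, required_fields):
    """Return required fields that the tool schema does not declare."""
    declared = set(tool_schema.get("properties", {}))
    return set(required_fields) - declared


# Hypothetical schema for a flight search tool, missing the airport ID.
flight_search_schema = {
    "type": "object",
    "properties": {
        "destination": {"type": "string"},
        "departure_time": {"type": "string"},
    },
}

missing = missing_required_fields(
    flight_search_schema,
    required_fields={"airport_id", "departure_time"},
)
# missing == {"airport_id"}
```

Running a check like this before training or deployment surfaces the problem at the schema, long before it shows up as a hallucinated flight.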

Ultimately, the journey to agentic AI requires acknowledging that it is a tangible, achievable goal, often involving humans empowered by technology, rather than a magical solution. The first step is understanding where you are on this journey. Once that is clear, organizations can invest in and prescribe the correct data strategy tailored to their needs, ensuring they drive the right outcomes for their users within the right budget. Building this path requires using the right data to create an agentic AI solution that is effective and beneficial for your organization. 
