Abstract
This paper explores the architecture and functioning of Large Language Model (LLM) agents, focusing on their essential building blocks—prompts, data, tools/APIs, and memory. It examines how agentic AI systems operate in enterprise settings and highlights the scalability challenges they face, particularly the N x M problem. The paper introduces the Model Context Protocol (MCP) and Agent-to-Agent (A2A) protocols as solutions to these scalability issues, detailing their roles in standardizing communication and enabling efficient agent collaboration. Additionally, the paper addresses key implementation challenges, including tools/API orchestration, memory management, security, and evaluation, offering practical strategies to mitigate these hurdles. Through this analysis, the paper provides guidance for building robust, scalable, and enterprise-ready agentic AI systems.
What is Agentic AI
It is an AI system that autonomously makes decisions and takes actions to achieve a goal without being told exactly what to do at every step. Hence the ability to reason and act (a.k.a. as ReAct) makes agentic AI unique.
Most LLM-based applications follow a control path. In non-agentic applications, the developer (through code) defines the control flow, whereas in agentic applications, the LLM itself determines the control flow.
CoT vs RAG vs Agent
Let’s compare a few popular AI implementation methodologies with Agentic AI.
CoT or Chain of Thoughts is a way of making AI think step by step. Instead of jumping directly to the final answer, the AI breaks the problem down into smaller steps — like showing its reasoning process.
RAG or Retrieval Augmented Generation is like giving an AI a “cheat sheet” before it answers. It retrieves (fetches facts from the cheat sheet) and then generates (AI writing the answer).
The below table illustrates the differences based on 4 key capabilities (Reason, Plan, Retrieve, Proactive) –
Agent Building Blocks
Now that we have a clear understanding of agentic AI, let’s dive deeper into the building blocks—or the anatomy—of an agent. While an agent is powered by an LLM in the background, it consists of four key components: Prompt, Data, Tools/APIs, and Memory.
- Prompt – Any textual input that defines persona, intent, constraint and goals of the agent to steer its reasoning behavior
- Data – It refers to the information i.e., facts, contexts, signals and trends that the agent accesses, which may be structured (DBs), semi-structured (JSON, CSV, XML) or unstructured (text, documents, videos, audios)
- Tools/API – These are external services or functions the agent can invoke to perform actions, retrieve up-to-date information, or complete user workflows.
- Memory – It is persistent or session-specific storage that allows the agent to remember context, facts, and preferences across interactions.
Let’s map these building blocks to the 4 critical capabilities (Reason, Plan, Retrieve, Proactive) of an agent,
Agent in Action
Now let’s take a use case and see how an agentic application will function and what challenge it will face.
Use Case:
Consider a scenario where detailed data engineering requirements are documented in Confluence (Atlassian). An agent can read these documents, interpret the requirements, and automatically generate the necessary SQL queries and ETL jobs. It then publishes the code to GitHub and executes the jobs on data stored in Databricks, ultimately producing the final output—such as an aggregated table.
As we can see there are 3 main external entities that the agent needs to interact with – Atlassian, Github and Databricks. And each entity will have their own data and tool (one or more). The system prompt and the memory are specific to the LLM agent and hence, let’s keep it coupled with the agent itself.
The Challenge:
In the above example, the application needs to establish 6 unique connections (2 connections one for data and another for tool with each of the 3 entities). In practical scenario, it may have to establish more than 6 connections, depending on the no. of tools and data sources. Every new data source requires its own custom implementation, making truly connected systems difficult to scale. This leads to the classic N x M problem.
MCP (Model Context Protocol) addresses this challenge. MCP framework was introduced by Anthropic in November 2024. More on MCP can be found here.
Agent + Model Context Protocol (MCP)
What is MCP?
MCP (Model Context Protocol) is like a universal connector that lets LLM (like ChatGPT) talk to different tools, apps, or data sources. It standardizes how LLM applications communicate with data sources, tools and apps. The result is a simpler, more reliable way to give AI systems access to the data they need.
Let’s revisit our previous example and see how MCP fits into the context (refer picture below)
There are 3 key things that we need to understand in the MCP framework, 1) MCP Host, 2) MCP Client, 3) MCP Serve.
MCP Host
The MCP host is the platform or environment where the AI-powered application runs. It’s the user-facing workspace that interprets and initiates requests to MCP servers. Examples include AI assistants (such as Claude Desktop), chatbots, IDEs, or other tools that interact with users.
MCP Client
The MCP client is a component running inside the host application. It manages the connection to one or more MCP servers, handling session state and relaying requests from the host to the server and returning responses from server to host.
MCP Server
The MCP server is a modular and task-specific service that exposes capabilities to the AI (via the client) in a standardized way. These capabilities may include access to files, databases, APIs, tools, or prompts. It executes tasks (data queries, actions) and provides structured results to the client.
Agents + Agent-to-Agent Protocol (A2A)
A2A (Agent-to-Agent) protocol is a way for multiple AI agents to communicate, coordinate, and share information with each other in a structured and reliable manner. It defines the rules, formats, and standards for how agents talk to each other — like a common language or handshake so they can collaborate.
While the MCP lets one agent plug into tools, A2A lets multiple agents plug into each other.
As enterprises deploy multiple agents (sales, marketing, data engineering, customer support), they need a standardized way to exchange info instead of building one-off integrations.
Let’s take example of a data engineering use case and how 3 agents work/chain together accomplish the e2e workflow –
- A Data Retrieval Agent fetches raw data.
- A Data Transformation Agent cleans and aggregates it.
- A Visualization Agent creates charts.
To better illustrate the context, let’s integrate A2A into the MCP example we discussed earlier (refer picture below). As you can see A2A enables agent collaboration across frameworks and vendors (ChatGPT and Claude models can communicate with each other!)
Now we have understood the relevance of A2A protocol, let’s explore some of the key capabilities that work together to create a seamless experience where multiple AI agents can collaborate intelligently on your behalf without creating chaos or security risks.
Capability Discovery (Agent Cards)
Think of this like a business card exchange between AI agents. When agents meet for the first time, they share “cards” that describe:
- What they can do (skills and services)
- What data they can access
- How other agents can work with them
Example: A scheduling agent’s card might say “I can book meetings, check calendars, and send reminders” while a translation agent’s card says, “I can translate between 50 languages and understand context.”
Secure Collaboration
This ensures agents can work together safely, like having proper ID checks and permissions at a workplace:
- Agents verify each other’s identity before sharing information
- They establish secure communication channels
- They respect data privacy and access controls
Example: A financial agent won’t share your bank details with just any agent that asks – it first verifies the requesting agent has proper authorization and uses encrypted communication.
Task and State Management
This is like project coordination between team members:
- Agents can break down complex tasks and assign parts to different specialists
- They track progress and know what stage each task is in
- They can pause, resume, or hand off work to other agents
Example: Planning a trip might involve a research agent finding flights, a booking agent making reservations, and a scheduling agent adding events to your calendar – all while keeping track of what’s done and what’s pending.
UX Negotiation
This is how agents figure out the best way to interact with users and each other:
- Agents can propose different ways to present information or get input
- They adapt their communication style based on user preferences
- They coordinate to avoid overwhelming users with too many interactions
Example: If you’re getting help from both a shopping agent and a budget agent, they might negotiate to present you with one combined recommendation rather than bombarding you with separate messages.
Implementation Challenges and Solutions
We’ve examined the anatomy of agentic AI systems, analyzed each core component in detail, and explored how these applications operate through MCP and A2A frameworks. Now we’ll shift to practical implementation and address the common challenges that agent developers encounter, along with their solutions. We will focus on 4 key areas – (1) Tools/API, (2) Memory, (3) Security, (4) Evaluation
Tools/API Orchestration
A) Challenge: Picking the right tool
Solutions:
- Provide LLM right context – In the system prompt, the most accurate context possible should be given to the LLM for it to select the optimal tools needed to answer a question
- Describe tools effectively – Clear description of the tool functions should be provided in the MCP server to help the LLM decide which one to use
- Tool retrieval – This is like information retrieval (RAG). Tools description can be embedded and the LLM can search that tool database selecting tools that most closely match the requirements of the prompt. Another method for tool retrieval would be selecting x candidate tools, trying them and then deciding
- Tool omission – Agents can be given option to not use any tools as some questions may not need them
B) Challenge: Mapping LLM output to a tool
Solutions:
- Prompt structure – The more structure there us in the prompt itself, the easier it will be to map
- Schema matching – Output schemas should be compatible with the subsequent tool’s input schema. An example program that enables this approach is the LangChain JsonOutputParser
- Format matching – Output formats should be compatible with subsequent format inputs. An example program that enables this approach is LangChain StructureOutputParser
Memory Management
A) Challenge: Contextual memory (short-term active conversation)
Solutions:
- Memory Window Optimization – Keep the most important recent items easily accessible while storing older stuff more efficiently.
- Example – Instead of remembering every word from a 30-minute conversation, keep the key decisions and main topics while condensing the small talk.
- Context Summarization and Abstraction – This is like taking meeting notes; instead of writing down everything word-for-word, you capture the essential points in shorter form.
- Example – A long customer support chat gets condensed to “Customer had login issues, reset password, problem solved” instead of storing the entire back-and-forth conversation.
- Attention Mechanisms with Relevance Scoring – Like a good listener who focuses on what’s most important, the AI ranks information by relevance and pays more attention to high-priority details while filtering out distractions.
- Example – In a discussion about planning a vacation, it focuses heavily on dates and destinations while giving less weight to casual comments about the weather.
B) Challenge: Persistent knowledge (long-term knowledge across sessions)
Solutions:
- Vector Databases / Embedding Stores – Persist past interactions, documents, and facts as embeddings; retrieve them when contextually relevant.
- Knowledge Graphs for Structured Memory – Store relationships (people, events, entities) in graph form for reasoning and long-term consistency.
- Personalization Profiles – Maintain user preferences, goals, and history in structured metadata (like customer profiles), separate from raw text.
- Memory Governance & Forgetting Mechanisms – Implement policies for what to retain, what to forget (e.g., aging out old memory), and ensure compliance/security.
Security and Governance
A) Challenge: Unauthorized actions
Solutions:
- Role Based Access Control (RBAC): Implement a strict permission system where each agent has clearly defined roles and can only perform actions within their authorized scope
- Action Approval Workflows with Human-in-the-Loop: Create mandatory approval gates for sensitive or high-impact actions, requiring human confirmation before execution
- Secure API Gateway with Authentication & Rate Limiting: Use a centralized gateway that validates every agent request, enforces authentication, and prevents abuse through rate limiting
B) Challenge: Prompt Injection
Solutions:
- Validation with Prompt Firewalls: Implement robust filtering systems that detect and neutralize malicious prompts before they reach the agent’s core processing.
- System Message Protection with Instruction Hierarchy: Separate system instructions from user input using clear boundaries and establish immutable core directives that cannot be overridden.
- Output Monitoring and Response Validation: Continuously monitor the agent’s responses to detect when it may have been compromised and implement safeguards to prevent harmful outputs.
C) Challenge: Compliance with org policies
Solutions:
- Policy-as-Code Implementation: Encode organizational policies directly into the agent’s decision-making process as executable rules that are checked before every action.
- Monitoring with Audit Trails: Create comprehensive logging and monitoring systems that track all agent actions against policy requirements and generate compliance reports.
- Policy Training and Context Injection: Embed organizational policies directly into the agent’s knowledge base and ensure they’re consistently referenced during decision-making.
Evaluation
A) Challenge: Agent performance
Solutions:
- Simulation Environment: Track real-world performance with monitoring tools (latency, failure rates, hallucinations)
- Robust Prompt & Response Validation: Design structured prompts (with clear instructions, constraints, and examples) and use guardrails to validate model outputs (e.g., type checking, regex validation, schema enforcement).
B) Challenge: Edge case failures
Solutions:
- Test across edge cases: Validate against invalid inputs, missing data, ambiguous instructions
- Fallback & Recovery Mechanism: Always define fallback paths when the agent fails or is uncertain.
- Human-in-the-Loop: There will always be scenarios that can’t be caught by systems; don’t forget to keep human in the loop for check and balance!
Conclusion
Success with LLMs isn’t about making the most complex system—it’s about building the right one for your needs. Start simple with prompts, test and improve them, and only add multi-step agent workflows if simpler solutions don’t work.
When building agents, focus on key principles:
- Invest in your data infrastructure, organize structured, semi-structured and unstructured data
- Plan system integrations for agent-computer interface (ACI),
- L0: software/database/APIs
- L1: agents
- L2: interface (user)Plan system integrations for agent-computer interface (ACI),
- Identify the right use cases that will have material impact for your org (ROI)
- Keep the agentic design simple
- Make the process transparent by showing how the agent plans its steps
Frameworks are useful to get started, but as you move to production, it often helps to reduce complexity and work with basic components. Following these principles leads to agents that are powerful, reliable, easy to maintain, and trusted by users.