AI agents have the potential to transform site reliability engineering (SRE), but without the proper context, they struggle to be truly effective in real-world environments. In observability, this lack of awareness is a critical barrier. Model Context Protocol (MCP) is designed to help solve pieces of this problem and provide guidance and context that AI agents need to assist in operating the software that developers release.

MCP provides an open mode of communication, delivering observability data and context, while enabling direct tool interaction for frontier AI models. So how does MCP differ from a traditional application programming interface (API), and how can SRE agents leverage it for greater efficiency and accuracy?

Context vs function

MCP is an open, publicly available technical specification. The MCP servers that implement it are developed and maintained either by software providers themselves or by third parties. These servers enable an AI agent to natively access meaningful tools, memory, and state, resulting in more specific and responsible AI output.

AI models are often limited to the knowledge they learned during training, supplemented by simple tools like web search. This restricts them to simple, single-issue problems where data must be collected and shared explicitly by humans. To tackle more complex tasks, AI tools need to be able to leverage operational data from across the business and its various applications. Making this data accessible on demand is crucial for enabling AI solutions to solve domain-specific problems and operate with autonomy. Otherwise, AI use is likely to be relegated to providing general knowledge to employees.

MCP moves away from focusing solely on calling remote services and instead emphasises context-sharing. Context is the key to accurate, grounded, and functional AI agents. Deciding whether to integrate MCP with an AI-enabled use case begins with a fundamental question: does domain-specific context or functionality matter in this case?

MCP is made for systems that require AI agents to learn, reason, and cross-reference information: systems where understanding what just happened matters as much as deciding what to do next. Consider AI-assisted software development, for instance: coding agents need to understand business logic, architectural constraints, application performance, and adherence to service level objectives. With a plain API, an agent frequently has to write code and then attempt to execute it, because it has no native way to simply call the API. That is not to say that APIs don’t have their place; they are great for executing repeatable functionality in pre-built software.

APIs require application-specific code to make a request or transfer information. With MCP, engineers don’t have to write that integration code, because an AI agent can discover an MCP tool or resource and work out how to use it from its description. Just by using prompts, a user can connect multiple data sources and applications.
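To make the contrast with hand-written API clients more concrete, here is a minimal sketch of what a self-describing tool can look like. The field names (`name`, `description`, `inputSchema`) and the `tools/call` method follow the published MCP specification; the `query_metrics` tool, its parameters, and the `build_tool_call` helper are hypothetical and exist only for illustration.

```python
import json

# Hypothetical tool descriptor, shaped like one entry in an MCP `tools/list` result.
# The tool name and its parameters are assumptions made for this example.
QUERY_METRICS_TOOL = {
    "name": "query_metrics",
    "description": "Run a metrics query over a lookback window and return matching series.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Query expression"},
            "minutes": {"type": "integer", "description": "Lookback window in minutes"},
        },
        "required": ["query"],
    },
}


def build_tool_call(request_id, tool, arguments):
    """Build a JSON-RPC 2.0 `tools/call` request after checking required arguments."""
    required = tool["inputSchema"].get("required", [])
    missing = [key for key in required if key not in arguments]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool["name"], "arguments": arguments},
    }


if __name__ == "__main__":
    request = build_tool_call(1, QUERY_METRICS_TOOL, {"query": "error_rate", "minutes": 30})
    print(json.dumps(request, indent=2))
```

Because the descriptor carries its own description and input schema, an agent can decide when and how to call the tool without anyone writing a client for it first.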

Today, most agentic AI manages a single task at a time. For example, if you ask Siri to perform a task on your behalf or answer a question, it responds to that one request. Ideally, an assistant like Siri would not be limited to one type of task or one question at a time. Instead, it would perform multiple tasks or correlate information in response to a complex prompt that requires several reasoning steps. This is what MCP enables.

Similarly, at the infrastructure level, an engineer might run a big query, copy the output, and then ask AI to analyse the data. Often, there are challenges in forming the query, feeding data back into the model, or determining the best format for the model to search and retrieve the necessary information. MCP brings that level of ‘opinionation’ out of the box. Developers can leverage integrations built with MCP to serve as the connecting data source and simply ask the model to make the query and analyse the result on behalf of the user.
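As a rough illustration of the server side of that exchange, the sketch below dispatches the `params` of a `tools/call` request to a tool function and returns a result in the `content` / `isError` shape used by MCP tool results. Everything else is an assumption: the `query_error_rate` tool is hypothetical, the sample values are made up to stand in for a real telemetry backend, and reporting an unknown tool inside the result is a simplification chosen for this example.

```python
from statistics import mean

# Made-up samples standing in for a real telemetry backend (illustration only).
FAKE_ERROR_RATES = {
    "checkout": [0.01, 0.02, 0.09, 0.12],
    "search": [0.01, 0.01, 0.01, 0.02],
}


def text_result(text, is_error=False):
    """Wrap plain text in the content/isError shape used for tool results."""
    return {"content": [{"type": "text", "text": text}], "isError": is_error}


def query_error_rate(service):
    """Hypothetical tool: summarise recent error-rate samples for one service."""
    samples = FAKE_ERROR_RATES.get(service)
    if samples is None:
        return text_result(f"unknown service: {service}", is_error=True)
    summary = f"{service}: latest={samples[-1]:.2f}, mean={mean(samples):.2f}, samples={samples}"
    return text_result(summary)


# Registry mapping tool names to the functions that implement them.
TOOLS = {"query_error_rate": query_error_rate}


def handle_tool_call(params):
    """Dispatch the `params` of a `tools/call` request to the matching tool."""
    tool = TOOLS.get(params["name"])
    if tool is None:
        # Simplification for this sketch: report the problem in the result itself.
        return text_result(f"unknown tool: {params['name']}", is_error=True)
    return tool(**params.get("arguments", {}))


if __name__ == "__main__":
    result = handle_tool_call({"name": "query_error_rate", "arguments": {"service": "checkout"}})
    print(result["content"][0]["text"])
```

The point of the design is that the query and the formatting of its result live in the tool, so the engineer no longer copies output into the model by hand.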

MCP-powered SRE

As AI becomes more embedded in authoring and operating applications, companies are already building MCP-compatible AI workflows into their release and incident response processes. This means that context is shared across tools like integrated development environments (IDEs), AI assistants, and coding models. This boosts their ability to respond to complex, multi-turn questions and prompts.

AI-enabled SRE is one of the most clear-cut use cases for MCP after code authoring, with workflows that require situational awareness, multi-agent collaboration, and real-time decision-making on data from disparate systems. In this use case, AI agents require extensive context, in addition to tools, to provide guidance, explore theories, and enable smarter decisions. When augmented with MCPs, they are well equipped to:

Investigate incidents and assist with triage, taking into account previous investigations and incident responses.
Request alert correlation analysis, identify anomalies, and relate issues to the relevant parts of the system architecture.
Hand off tasks semantically, moving beyond simply summarising status.

Instead of providing generalist knowledge based on theoretical concepts, an MCP-enabled SRE agent can leverage knowledge grounded in telemetry and analysis of live environments.

This is far more transformative as users can ask questions or enter more complex prompts and receive meaningful responses from the AI.
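The alert correlation mentioned above can be sketched in a few lines. This is a deliberately simple illustration that groups alerts by time proximity only; the `Alert` type, its fields, and the five-minute default window are assumptions, and a real tool would likely also use service topology and labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """Hypothetical alert record; the field names are assumptions for this sketch."""
    name: str
    service: str
    timestamp: float  # seconds since the epoch


def correlate_alerts(alerts, window_seconds=300.0):
    """Group alerts that fire within `window_seconds` of the previous alert in a group."""
    groups = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        if groups and alert.timestamp - groups[-1][-1].timestamp <= window_seconds:
            groups[-1].append(alert)  # close in time: same candidate incident
        else:
            groups.append([alert])  # gap too large: start a new group
    return groups


if __name__ == "__main__":
    alerts = [
        Alert("HighLatency", "checkout", 1000.0),
        Alert("ErrorRateSpike", "payments", 1120.0),
        Alert("DiskFull", "logging", 9000.0),
    ]
    for group in correlate_alerts(alerts):
        print([f"{a.service}:{a.name}" for a in group])
```

Exposed as an MCP tool, a function like this would let an agent request the grouping and then reason about which part of the architecture the grouped alerts point to.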

How MCPs change the game for platform leaders

While AI code editing has come a long way, the context of what to fix, where to fix it, and how that fits into the organisation’s production systems has traditionally needed to be determined by a human. Now, with the introduction of MCP, agents are empowered to analyse that necessary context. As a result, AI SRE agents can more quickly investigate observability and system event data to form well-grounded hypotheses, or conduct a root cause analysis supported by real telemetry from the environment.

To those in charge of SRE, platform engineering, or intelligent operations, it’s important to realise that MCP is about enhancing APIs rather than replacing them. AI agents may act more intelligently and produce more accurate results because of MCP’s dynamic context layer. However, an agent’s action or output is only as good as the information and context it has access to. MCP helps ensure the agents are grounded in real-world data. This way, the agent can identify the most likely hypothesis and get to the root of the issue more quickly.

The more high-quality MCP tools you can provide your SRE agents, the more effective and accurate their work will be. MCP can bring together a wide range of data and information. A toolbox with all the right tools for the job allows for smarter analysis and decisions.

What to expect

Currently, you need a relatively good recipe for which MCP tools will augment the task you are using your AI agents for. In the future, we expect AI agents to get better at recognising which tools a specific task needs and when they require more or different data to complete a task reliably. For the time being, this curation is left almost entirely to the end user.

AIJ Thought Leader 1 November 2025

Model context protocols are the missing link for AI-driven site reliability engineering

By Rob Skillington, CTO and co-founder, Chronosphere