
AI agents are showing up in production environments everywhere. According to PwC’s 2025 AI Agent Survey, 79% of companies surveyed are already using AI agents. And as Docker’s 2026 State of Agentic AI report confirms, 94% of organizations view building agents as a strategic priority. But there is a problem.
Belitsoft, an international AI automation consulting and software development company with an office in North America, reports that companies are building agents but hesitate to put them into production because they cannot see what the agents are actually doing. According to a 2026 Monte Carlo poll of AI engineering leaders, 73% of companies won’t ship an AI agent without monitoring and alerting, and 63.4% say the lack of monitoring and observability is the biggest reason AI agents aren’t more widely used.
AgentOps is what you need here. Think of it as DevOps for AI agents: deployment, monitoring, governance, and optimization across every place agents run, whether public clouds, private data centers, edge devices, serverless platforms, or a mix of these. Docker’s State of Agentic AI research found that 79% of respondents run agents in more than one environment: 51% in public clouds, 40% on their own servers, and 32% on serverless platforms. Without a clear operating model, that mix is a fast path to sprawl and bills you didn’t expect.
Why AgentOps Matters (And Why You Can’t Ignore It)
The Visibility Problem
AI agents make decisions, use tools, and generate content in ways that conventional observability tools were never designed to handle. According to Monte Carlo’s study, 53% of businesses plan to significantly rework or completely rebuild their existing AI agent systems. More than half of today’s production agents will need to be redone because they lack real-time visibility.
When agents talk to each other, the problem gets worse. New Relic notes that agents depend on each other’s outputs across multiple MCP servers, shared context, and memory, which makes the resulting webs of dependencies hard to debug. One agent’s hallucination can corrupt another agent several steps downstream, and without specialized tooling it’s almost impossible to trace the root cause.
The Need for Good Governance
Visibility is only half the battle; you also need control. The EU AI Act and ISO 42001 already require audit trails for AI systems. IBM’s AgentOps framework for watsonx Orchestrate tracks decisions, detects errors, and enforces policy across the entire agent lifecycle. Monte Carlo’s survey found that companies need to monitor secure data handling (68% of respondents), set clear expectations for agent behavior (62.7%), and alert on failures (72.7%).
Gartner says that by 2028, businesses will run thousands of AI agents across their operations. Policy enforcement, centralized management, and value measurement will be essential. Without them, you get “AI sprawl” – dozens of separate initiatives across different teams, tools, and clouds with no centralized visibility. Gartner says that by 2028, this will affect 40% of Fortune 1000 companies.
The Multi-Environment Reality Check
Your Agents Are Already Everywhere
Agents do not live in one neat place. Docker’s research shows that 61% of organizations combine cloud-hosted and local models. 46% of organizations use between four and six models, while only 2% use just one. The multi-model, multi-cloud setup is already the norm.
The 2026 Edge AI Survey by ZEDEDA, which polled 600 IT and business leaders in the U.S. and Germany, found that 47% of companies have moved to hybrid cloud-edge architectures. But 41% say managing AI workloads across environments is hard. There is still a big gap between training a model in one place and running it reliably at the edge.
The Hyperscaler Landscape
Every big cloud provider has its own agent service. AWS Bedrock AgentCore is tightly coupled to AWS. Azure AI Foundry integrates deeply with Microsoft 365. Google Vertex AI Agent Builder is the third option. But none of them offer cross-cloud portability or native CI/CD for agents – versioning, rollback, or canary deployments. According to Xpander.ai, if you pick a hyperscaler, you have to accept cloud lock-in as a design constraint.
The Container Reality
Containers are the backbone of agent deployments. According to Docker’s State of Agentic AI research, 94% of companies employ containers in their production or agent development processes. 40% of teams that use Docker to build agents use Docker Compose as their orchestration layer. Containers give you a consistent runtime across environments, which is exactly what you need when you deploy agents that might run on AWS one day and on-prem the next.
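To make that consistency concrete, a minimal Docker Compose sketch might pair an agent container with an OpenTelemetry collector. The service names, image tags, and environment variable below are illustrative, not from the article:

```yaml
# Hypothetical docker-compose.yml for one agent plus a telemetry collector.
# Service names, image tags, and environment variables are invented for illustration.
services:
  agent:
    image: my-org/support-agent:1.4.2        # pin versions for reproducible deploys
    environment:
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318
    depends_on:
      - otel-collector
  otel-collector:
    image: otel/opentelemetry-collector:0.111.0
    ports:
      - "4318:4318"                          # OTLP/HTTP ingest
```

The same image that Compose runs locally can move to Kubernetes or a managed container service unchanged, which is exactly the portability argument above.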
The AgentOps Toolkit: What You Actually Need
Agent Management Platforms
Several platforms launched in 2025 and 2026 to handle multi-environment governance.
In March 2026, Kore.ai Agent Management Platform (AMP) was released as a single command center for overseeing, controlling, and managing AI agents across frameworks, clouds, and development environments. It can manage LangGraph, CrewAI, AutoGen (which is now in maintenance mode), Google ADK, AWS AgentCore, Microsoft Foundry, Salesforce Agentforce, and its own systems from a single control plane. It has an evaluation studio that lets you test how agents act before you send them out.
Domino Data Lab introduced its Winter Release in February 2026, which it calls the first fully governed end-to-end platform for making agentic AI systems work. The platform offers an agentic development lifecycle (ADLC) experience that includes the stages of building, evaluating, deploying, and monitoring, with full lineage, reproducibility, and governance. Domino’s universal tracing SDK works with any agentic orchestration framework.
In April 2026, Salesforce Agent Fabric grew to include a centralized control plane for managing and governing multi-vendor AI environments. The update added automated discovery tools, such as expanded Agent Scanners for Amazon Bedrock and Microsoft Foundry. It also added a Visual Authoring Canvas for mapping workflows and guided determinism through Agent Script for Agent Broker.
xpander.ai positions itself as a cloud-agnostic full-lifecycle agent platform. It can work on any Kubernetes cluster in a private VPC, AWS, Azure, or GCP. It also has native self-hosted and air-gapped deployment options. The platform supports no-code, low-code, and code-first build paths, dynamic non-linear orchestration, and full lifecycle management, which includes versioning, rollback, and canary deployments.
Tools for Observability
Agent sessions and LLM conversations are now treated as first-class telemetry signals across the industry.
Grafana Cloud AI Observability was released in April 2026 and is now in public preview. It connects agents to traces, tool calls, token usage, costs, and live evaluations in a single system view. It works with OpenTelemetry and automatically tracks agent versions. Grafana also released o11y-bench, a free and open-source framework for benchmarking agent performance.
Monte Carlo Agent Observability gives you a single view of four areas: context, performance, behavior, and outputs. It lets you check AI-generated fields directly against source data to find mistakes and hallucinations before they affect other systems.
Salesforce Agentforce 360 is a set of tools for monitoring agent performance, health, and optimization across the Salesforce ecosystem: it tracks usage, traces session flows, and monitors health.
In November 2025, Dynatrace and Amazon Bedrock AgentCore worked together to collect detailed telemetry and turn it into useful information through a live topology map and smart alerts.
With the Agents Service Map, New Relic Agentic AI Monitoring shows you every agent and tool call in a multi-agent collaboration and how they interact with each other. New Relic also released an MCP Server that lets assistants like GitHub Copilot, ChatGPT, Claude, and Cursor get observability data directly.
As part of Cisco’s AgenticOps vision, Splunk AI Agent Monitoring tracks cost and token consumption alongside quality and security measures – hallucinations, bias, drift, and accuracy – as well as performance metrics like latency and errors.
The Datadog MCP Server gives AI agents access to logs, metrics, and traces in real time while keeping the same security, governance, and audit controls in place.
IBM AgentOps for watsonx Orchestrate monitors agent activity in real time, tracking decisions, detecting errors, and enforcing policy at every stage of the process.
Agent Development Frameworks
The framework you choose affects how easily you can deploy across multiple environments.
LangGraph is based on LangChain. Independent studies from 2026 show that LangGraph has a low latency for LLM calls and uses an average of 1.2GB of memory. LangChain has more than 500 integrations, and its abstraction layers let you change providers without having to rewrite the code for your agents.
CrewAI lets you build prototypes quickly (in under three hours) and gives developers a good experience with role-based agents. Deloitte case studies report an 89% project success rate at roughly $0.12 per query.
Microsoft Agent Framework (MAF) is Microsoft’s forward-looking framework, consolidating capabilities from AutoGen and Semantic Kernel, both of which entered maintenance mode in late 2025. MAF is designed for enterprise-scale multi-agent coordination with deep integration into the Microsoft ecosystem.
For data-intensive tasks, LlamaIndex specializes in RAG and multi-source querying.
PydanticAI uses Pydantic models to validate and structure responses, ensuring outputs are type-safe. It supports multiple models, evaluations, tool approvals, and durable long-running workflows that can be paused and resumed.
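PydanticAI’s own API does this with Pydantic models; as a framework-neutral sketch of the same idea, the stdlib-only example below validates an LLM’s JSON output against an expected schema before downstream code touches it (the `TicketTriage` fields and the sample payload are invented for illustration):

```python
import json
from dataclasses import dataclass, fields

@dataclass
class TicketTriage:
    """Expected shape of the model's structured output (fields are hypothetical)."""
    category: str
    priority: int
    needs_human: bool

def parse_output(raw: str) -> TicketTriage:
    """Parse and type-check an LLM response; raise instead of passing bad data on."""
    data = json.loads(raw)
    for f in fields(TicketTriage):
        if f.name not in data:
            raise ValueError(f"missing field: {f.name}")
        if not isinstance(data[f.name], f.type):
            raise TypeError(f"{f.name} should be {f.type.__name__}")
    return TicketTriage(**data)

result = parse_output('{"category": "billing", "priority": 2, "needs_human": false}')
print(result.priority)  # 2
```

The point is the same one PydanticAI makes: reject malformed agent output at the boundary, before it propagates into other systems.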
Infrastructure and Deployment Tools
Nutanix Agentic AI (NAI) is a full-stack software solution announced in March 2026 that aims to reduce complexity while improving performance and security. It works with the NVIDIA Nemotron family of open-source models and NVIDIA AI Enterprise.
Red Hat AI Enterprise, which came out in February 2026, is a single platform based on Red Hat OpenShift that lets you deploy and manage models, agents, and apps in hybrid cloud environments. Version 3.3 adds support for models including Mistral-Large-3, Nemotron-Nano, Apertus-8B-Instruct, Ministral 3, and DeepSeek-V3.2.
The ZEDEDA Edge Intelligence Platform, which was announced at NVIDIA GTC 2026, is a single tool for making, deploying, securing, and running AI on a large scale in many different edge environments.
In February 2026, Vast Data released Polaris, a global control plane for AI data infrastructure that works with both cloud and datacenter deployments. It lets businesses control VAST clusters on any cloud as if they were one system, and it has a zero-trust framework for agentic AI.
Cloudera Agent Studio uses NVIDIA NIM and the NVIDIA Nemotron family of models to manage autonomous workflows through iterative multi-step planning and multi-agent collaboration.
LLMOps and Tools That Help
OpenLLMetry is built on OpenTelemetry and captures traces, prompts, completions, and token usage, so LLM calls show up in your existing logging and metrics pipelines.
Bifrost lets you connect to over 20 providers, such as OpenAI, Anthropic, AWS Bedrock, Google Vertex, and Azure, all through one API. It has caching, load balancing, and failover built in.
Promptfoo is a free, open-source tool that lets you run evaluations and red-teaming in CI/CD pipelines. OpenAI bought it in March 2026, and it is now part of OpenAI Frontier.
Composio is a service that links AI agents and LLMs to more than 250 other apps and services. It handles authentication and works with many different agentic frameworks.
A Practical Plan for Deploying Multi-Environment AgentOps
Based on what works today, here is a step-by-step plan.
Step 1: Understand that multi-environment is your default. Docker says 79% of businesses operate in more than one environment. Plan for a hybrid setup from the start. Choose frameworks (like LangGraph and CrewAI) and deployment platforms (containers and Kubernetes) that keep multiple vendors on the table.
Step 2: Containerize everything. With 94% of teams using containers for agent workloads, containerization is table stakes. Package agents as images, use Docker Compose locally, then move to Kubernetes in production.
Step 3: Set up tools for monitoring from the start. The three pillars of agent observability are system metrics (latency, error rate, tokens per task), quality metrics (task success rate, trajectory adherence), and cost tracking. Use OpenTelemetry-compatible instrumentation like OpenLLMetry to send telemetry to platforms like Grafana Cloud.
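As a minimal sketch of those three pillars, the hypothetical recorder below aggregates latency, token, cost, and success signals per run; in practice an OpenTelemetry exporter would ship these to a backend like Grafana Cloud (all names and figures here are invented):

```python
from dataclasses import dataclass

@dataclass
class AgentRun:
    """One completed agent task and its telemetry (illustrative fields)."""
    task: str
    latency_s: float
    tokens: int
    cost_usd: float
    success: bool

class Telemetry:
    """In-memory stand-in for a real exporter; aggregates the three pillars."""
    def __init__(self):
        self.runs = []

    def record(self, run: AgentRun) -> None:
        self.runs.append(run)

    def summary(self) -> dict:
        n = len(self.runs)
        return {
            "task_success_rate": sum(r.success for r in self.runs) / n,  # quality
            "max_latency_s": max(r.latency_s for r in self.runs),        # system
            "tokens_per_task": sum(r.tokens for r in self.runs) / n,     # system
            "total_cost_usd": sum(r.cost_usd for r in self.runs),        # cost
        }

t = Telemetry()
t.record(AgentRun("triage", 1.8, 950, 0.0021, True))
t.record(AgentRun("triage", 4.2, 3100, 0.0068, False))
print(t.summary()["task_success_rate"])  # 0.5
```

Whatever backend you choose, the key is that all three pillars are recorded per task from day one, not bolted on after the first incident.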
Step 4: Set up a centralized management plane. Kore.ai AMP, Domino, or Salesforce Agent Fabric gives you a single pane of glass. This is what keeps sprawl in check.
Step 5: Set guardrails per environment. Edge deployments (through ZEDEDA) must tolerate intermittent connectivity. On-premises deployments need data residency compliance. Cloud deployments need cost monitoring to prevent runaway spend. Encode these rules in your management plane.
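One way to encode such rules, sketched here with invented policy names and thresholds, is a per-environment policy table that the management plane checks before a deployment goes out:

```python
# Hypothetical per-environment guardrail table; keys and thresholds are invented.
POLICIES = {
    "edge":    {"require_local_model": True},
    "on_prem": {"data_residency": "EU"},
    "cloud":   {"daily_budget_usd": 200.0},
}

def validate_deployment(environment: str, spec: dict) -> list:
    """Return policy violations for a proposed deployment (empty list = OK)."""
    policy = POLICIES[environment]
    violations = []
    if "data_residency" in policy and spec.get("region") != policy["data_residency"]:
        violations.append(f"data must stay in {policy['data_residency']}")
    if "daily_budget_usd" in policy and spec.get("daily_budget_usd", 0) > policy["daily_budget_usd"]:
        violations.append("budget above cap")
    if policy.get("require_local_model") and not spec.get("local_model"):
        violations.append("edge agents must bundle a local model")
    return violations

print(validate_deployment("on_prem", {"region": "US"}))  # ['data must stay in EU']
```

A real management plane would evaluate richer policies, but the shape is the same: deployment specs are checked against environment-specific rules before anything ships.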
Step 6: Build continuous evaluation into the pipeline. Agents drift and degrade over time. Before deploying, check for drift, hallucination, and bias using Domino’s evaluation tools, Monte Carlo’s data validation, or Promptfoo in CI/CD.
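As one concrete option, a Promptfoo evaluation can run in CI from a YAML config; the prompt, provider, and test values below are invented for illustration:

```yaml
# Illustrative promptfooconfig.yaml; prompt, provider, and test values are made up.
prompts:
  - "Summarize this support ticket in one sentence: {{ticket}}"
providers:
  - openai:gpt-4o-mini
tests:
  - vars:
      ticket: "Customer cannot reset their password after the 2.3 upgrade."
    assert:
      - type: contains
        value: "password"          # cheap deterministic quality check
      - type: latency
        threshold: 3000            # fail the run if slower than 3 seconds
```

Running `promptfoo eval` against a config like this in CI turns quality or latency regressions into failing builds instead of production surprises.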
Step 7: Make costs visible. Autonomous agents can make hundreds of unnecessary API calls and run up serious cloud bills. Splunk’s AI Agent Monitoring tracks cost and token usage. Set spend limits and treat cost as a first-class metric before you deploy.
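A spend guardrail can be as simple as a hard cap checked before every outbound call; the sketch below (with an invented cap and per-call cost) stops an agent loop before it blows the budget:

```python
class BudgetExceeded(RuntimeError):
    """Raised when a call would push spend past the hard cap."""

class SpendGuard:
    """Hard spend cap checked before every outbound API call; figures are invented."""
    def __init__(self, cap_usd: float):
        self.cap_usd = cap_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        if self.spent_usd + cost_usd > self.cap_usd:
            raise BudgetExceeded(
                f"call would raise spend to ${self.spent_usd + cost_usd:.4f}, "
                f"cap is ${self.cap_usd:.2f}"
            )
        self.spent_usd += cost_usd

guard = SpendGuard(cap_usd=1.00)
calls_made = 0
for _ in range(400):               # an agent loop making repeated API calls
    try:
        guard.charge(0.0031)       # estimated cost per call
        calls_made += 1
    except BudgetExceeded:
        break                      # the guardrail stops the loop, not the bill
print(calls_made)                  # stops just under the $1.00 cap
```

In production the same check would live in a gateway such as Bifrost or in your management plane, so every agent shares one budget rather than each keeping its own.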
Step 8: Plan for failure recovery. Agents fail. Multi-agent systems fail in more interesting ways. Use recovery and checkpointing tools. With xpander.ai, you can run long stateful tasks, pause them for human review, and resume without losing state.
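The pattern can be sketched in a few lines: persist state after every step, and on restart resume from the last checkpoint rather than step zero. The file-based store below is a toy stand-in for the durable backends real platforms use:

```python
import json
import os
import tempfile

class CheckpointedTask:
    """Toy sketch of resumable agent state; real platforms persist this durably."""
    def __init__(self, path: str):
        self.path = path

    def save(self, state: dict) -> None:
        with open(self.path, "w") as f:
            json.dump(state, f)

    def resume(self) -> dict:
        """Pick up saved state if a checkpoint exists, else start fresh."""
        if os.path.exists(self.path):
            with open(self.path) as f:
                return json.load(f)
        return {"step": 0, "results": []}

fd, path = tempfile.mkstemp(suffix=".json")
os.close(fd)
os.remove(path)                      # start clean so resume() returns fresh state

task = CheckpointedTask(path)
state = task.resume()
while state["step"] < 5:             # the agent's multi-step job
    state["results"].append(f"step-{state['step']} done")
    state["step"] += 1
    task.save(state)                 # checkpoint after every completed step

restored = CheckpointedTask(path).resume()   # a crash-restart would land here
print(restored["step"])              # 5
os.remove(path)
```

The expensive part of an agent run is the LLM and tool calls already made; checkpointing means a mid-task crash costs you one step, not the whole run.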
Real-World Challenges and How to Tackle Them
The Sprawl Problem
Organizations can quickly end up with 50-200 agents, each with its own infrastructure and monitoring. Centralized governance via Kore.ai AMP or Domino is the only scalable answer. Gartner says that by 2028, the average Global Fortune 500 company will run more than 150,000 agents.
The Cost Problem
A cloud cost optimization agent could crawl your AWS and Azure accounts, make hundreds of unnecessary API calls, and raise your cloud bill. Build spend guardrails before deployment.
The Security Concern
According to Docker’s State of Agentic AI report, 76% of respondents worldwide are worried about vendor lock-in. One way forward is Datadog’s MCP Server approach, which gives agents controlled access to observability data under existing security rules. Cisco AI Defense works with Splunk to detect risks like data leaks and prompt injection.
The Vendor Lock-In Problem
According to Docker, the main reasons for combining cloud and local models are control (64%), data privacy (60%), and compliance (54%); cost ranks lower (41%). To reduce dependence on any one vendor, teams spread work across several models and clouds. Choose cloud-agnostic platforms, such as xpander.ai, and open standards, such as OpenTelemetry.
The Future of AgentOps
Standardization around MCP (Model Context Protocol) is gaining momentum. According to Docker’s research, 85% of teams are aware of MCP. Companies like Datadog and New Relic are building MCP servers to standardize data access.
Edge intelligence is gaining ground. According to ZEDEDA’s survey, 47% of businesses use hybrid cloud-edge architectures. Platforms from ZEDEDA and Nutanix are bringing agent management to the edge.
Agent-to-agent communication is becoming common. New Relic’s Agents Service Map addresses this. Docker reports that 33% of companies struggle with orchestration as multi-model environments spread.
Governance will be required. Gartner says that by 2028, 40% of Fortune 1000 companies will no longer be able to control AI agents that act outside of the rules.
Framework consolidation is underway. Microsoft’s consolidation of AutoGen and Semantic Kernel into the Microsoft Agent Framework points to a trend of fewer, more stable enterprise frameworks.
About the Author:
Dmitry Baraishuk is a Partner and Chief Innovation Officer at Belitsoft. Belitsoft is a software engineering company specializing in DevOps, AI integration, and enterprise application modernization. The company serves clients across healthcare, fintech, and enterprise SaaS in the US, UK, and Canada. Belitsoft publishes technology trend analyses to help business and technology leaders make informed decisions about their software investment strategy.


