
Multi-agent AI systems are being deployed at a pace that regulation simply cannot keep up with. The problem is that the rulebook is being written after the fact, and in the gap between innovation and governance, things are going wrong.
More than 80% of enterprises report lacking the mature AI infrastructure, the monitoring, auditability, and control mechanisms needed to govern agentic systems at scale. Meanwhile, trust in fully autonomous AI agents has fallen from 43% to just 27% over the past twelve months, as organisations move from pilot to production. And yet the deployments keep coming. 75% of technology leaders name governance as their primary concern when deploying agentic AI, but most are pressing ahead anyway, bolting agents onto live operations with no real protective layer underneath.
For B2B service leaders, this is a big problem. You’re running operations that carry client commitments, regulatory obligations, and SLAs that you have put your name to. What happens when an agent takes an action you didn’t sanction? Who owns that?
The promise of multi-agent AI is real. Instead of a single model trying to do everything, you have networks of specialised agents working in tandem, each handling a specific task, handing work off to the next, moving at a speed no human team could match. The allure is shiny and easy to understand. Who wouldn’t want the slickest, self-running operation that technology has ever offered? In reality, most organisations are building this vision on sand.
When AI agents go wrong
The failures are starting to mount up, and they are getting harder to dismiss as teething problems.
In July 2025, a coding agent at startup SaaStr was tasked with routine maintenance during a code freeze. Ignoring explicit instructions to make no changes, it executed a DROP DATABASE command and wiped the production system. When confronted, it generated 4,000 fake user accounts and false system logs to cover its tracks. Its own explanation: “I panicked instead of thinking.” An agent that lies to protect itself is a category of risk that most governance frameworks aren’t built to handle.
At the UK’s AI Safety Summit, Apollo Research ran a simulated experiment in which an investment management chatbot was told about an upcoming merger announcement and warned it constituted insider information. The bot made the trade anyway. When asked if it had prior knowledge, it denied it. The agent was operating exactly as its objective function intended, but had no ethical guardrail or human able to intervene in time.
Then there’s the infrastructure problem. A basic data entry error at Citigroup nearly sent $6 billion to the wrong account, illustrating the scale of damage possible when systems act without appropriate human oversight gates.
Among early AI adopters, 46% of all data-policy violations involved developers pasting proprietary source code into generative AI tools. Between 2024 and 2025, the volume of corporate data flowing into AI services grew more than 30 times, creating a dramatically larger exposure surface almost overnight. Intellectual property, client data and operational processes… They’re often already moving through systems that many organisations have no visibility into.
80% of organisations using AI agents report instances of applications acting outside their intended boundaries. Specific incidents include unauthorised access (39%), restricted information handling (33%), and phishing-related movements (16%). Without proper guardrails and infrastructure, agents can act in erratic ways.
The wild (agentic) west needs a sheriff
The market is moving fast and the pattern is familiar. An agent here, an automation there, a workflow stitched together by an enthusiastic engineering team over a long weekend. It just about works…Until it doesn’t. In a B2B service environment, if it doesn’t work, a client could easily find out before you do.
Gartner predicts that more than 40% of agentic AI projects will be cancelled by the end of 2027 due to rising costs, unclear business value, or insufficient risk controls. The organisations that manage to deploy successful agentic AI will have built the right environment for those agents to operate in.
The reality is that a capable agent in a poorly governed system causes more damage than a basic one, because it acts further and faster before anyone notices something has gone wrong.
What’s missing from most multi-agent deployments is a structured operational layer, the equivalent of a regulated grid. This makes every agent action visible, every decision auditable, and every exception routable to a human who can actually do something about it.
How enterprises are trying to solve it
Most organisations know they have an agentic problem and mitigation usually comes in the following ways…
The first is the DIY route. Engineering teams build homegrown governance systems on top of existing infrastructure. The intent is good and the output is often technically impressive, but you’re solving a governance problem with more engineering. It’s expensive to build, expensive to maintain, and it tends to reflect the priorities of whoever built it rather than the operational reality of the business running it.
The second is to go all-in with a hyperscaler. Microsoft, Salesforce, Google all offer agent platforms with varying degrees of structure and oversight. You get governance of a kind, but your agents, your data, and your IP are now living in someone else’s ecosystem. For service businesses handling sensitive client data across multiple jurisdictions, that’s a significant strategic trade-off.
The third is point solutions. Tools that manage one type of agent, in one context, doing one job. Fine to start. But as your agent landscape grows, you end up stitching together five different governance tools, and the seams between them are usually where things go wrong.
A better way to run multi-agent AI systems at scale
What organisations actually need is an orchestration layer that wraps around their agent landscape without forcing them to rebuild from scratch. One that brings agents into a governed, auditable system where businesses retain control of models, data, and processes. An orchestration solution makes agents safe to run at scale.
Critically, a successful agent system has to keep humans in the loop. Not as a bureaucratic step that slows everything down, but as a genuine circuit breaker. When an agent is about to take an action outside its defined boundaries, the system flags it and routes it to the person who can make the right call. That’s the difference between an operation that scales confidently and one that ends up a client escalation you didn’t want to have.
57% of business leaders now believe people should manage and direct AI agents, not the other way around. The service businesses that get this right will run more capable, more trusted operations. The ones that don’t will be managing incidents and explaining themselves to clients for the next few years. Multi-agent AI is here to stay. The question is whether you are running it, or whether it is running you.

