
The Emperor’s New Agents

By Sean Blanchfield, CEO and co-founder of Jentic and a member of Ireland's AI Advisory Council.

Why your stalled AI pilots are the most valuable thing in your transformation program

While Dario Amodei makes the case for impending AGI, the CTOs I speak with are blocked by rather more prosaic concerns: integration friction, security boundaries, reliability expectations, outdated authorization models, maintainability and governance. These issues will not be automatically solved by AI getting smarter. In practice, the opposite is true: probabilistic systems amplify every weakness in the infrastructure they touch.

Most enterprise leaders I meet have made significant investments in AI transformation. They’re fluent in MCP, A2A, multi-agent systems, the future we’ve all been reading about. But the reality of what has been deployed often lags far behind the hype.

Enterprise generative AI is already a $37 billion market, yet MIT research shows 95% of AI pilots never reach production. Most executives treat stalled pilots as embarrassing setbacks, but I think they’re diagnostic gold mines.

Pilot Purgatory Is an Infrastructure Problem

Sometimes I’m surprised by what people call “AI agents.” A chatbot is not an agent. A Python script that makes an LLM API call is not an agent. A Zapier automation with an LLM step is not an agent. If we’re going to have a new term, it had better mean something new. An agent has autonomy over its control flow and can use tools to interact with its environment. Short version: a chatbot is an LLM with a UI, a copilot is a chatbot with tools, and an agent is a copilot that doesn’t have or need a user.

If your “agent” doesn’t have these properties, you have automation with good PR. And while enterprises feel intense pressure to deploy AI, their vendors are slapping “agentic” on everything. This creates an illusion of progress that obscures a gathering storm of technical and governance debt.

When we filter for real agents (systems that actually have autonomy and tool use), the failures become instructive. There’s a pattern: they work in isolation, then hit integration walls. The AI capability is not the problem. The infrastructure is the problem. Pilots fail because the software they need to interact with was designed for humans, not machines. Every stalled pilot is telling you exactly where your legacy architecture has gaps, if you’re willing to listen.

The Critical Failure Points

My team analysed over 1,500 well-known APIs and scored them for AI-readiness. The results were striking. We evaluated APIs across multiple dimensions that determine whether an AI agent can reliably use them, and found systemic gaps across the board:

1. Documentation quality. Human developers bring institutional knowledge, support from colleagues, and time to experiment when working with an API. Agents have none of these. They rely solely on what is written in the API documentation. In a production setting, an agent needs to get it right first time (experimentation is slow, unreliable, and risks corrupting data). Documentation that was good enough for a human developer who could work around ambiguities becomes a hard blocker for an agent.

2. Structural consistency. Inconsistent naming conventions, unpredictable response formats and non-standard patterns across an API undermine the “principle of least surprise” that allows models to make correct inferences about expected behaviour. When a model has learned that one part of an API works a certain way, it will reasonably assume the rest follows suit. Inconsistency breaks that assumption, leading to silent failures.

3. Security and authentication information. API descriptions frequently omit or incompletely describe their authentication and security requirements. This information often lives outside the API specification entirely, buried in a website, a PDF, or a wiki page. If it’s not in the machine-readable specification, an agent can’t use it. This is one of the most common and most consequential gaps we found.

4. Error semantics. When an API returns a generic 500 error, a human developer can investigate. An agent cannot. Clear, machine-readable error responses with actionable detail are essential for agents to self-correct or fail gracefully. Most APIs don’t provide them.

5. Parameter and response complexity. Deeply nested parameter schemas, polymorphic responses, and undocumented optional fields create a combinatorial problem that overwhelms agent reasoning. APIs designed for developer convenience often create agent confusion. Add missing idempotency guarantees and unclear safety boundaries, and agents cannot safely retry failed operations or reason about side effects.

6. API organisation. Large APIs that present hundreds of endpoints as a flat list are difficult for agents to navigate. Logical grouping of operations (whether through tags, separate specifications, or clear resource hierarchies) is essential for agents to understand scope and find relevant capabilities. Without it, agents waste tokens and reasoning cycles on irrelevant operations, or miss relevant ones entirely.

These are some of the specific, measurable reasons pilots are stalling. And crucially, each one is fixable once you know where to look.
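Several of these gaps can be detected mechanically from an API’s machine-readable description. As a rough sketch only (assuming the OpenAPI document is already loaded as a Python dict; the `lint_for_agents` function and its checks are hypothetical illustrations, not our actual scoring methodology), an agent-readiness lint might look like:

```python
# Hypothetical agent-readiness lint over an OpenAPI spec loaded as a dict.
# The checks mirror the failure points above: missing machine-readable auth,
# missing descriptions, undocumented error responses, and a flat untagged
# API surface.

def lint_for_agents(spec: dict) -> list[str]:
    findings = []
    # Failure point 3: security info must be in the spec itself.
    if not spec.get("components", {}).get("securitySchemes"):
        findings.append("no machine-readable security schemes")
    for path, ops in spec.get("paths", {}).items():
        for method, op in ops.items():
            label = f"{method.upper()} {path}"
            # Failure point 1: agents rely solely on written documentation.
            if not op.get("description"):
                findings.append(f"{label}: missing description")
            # Failure point 4: agents need documented error semantics.
            responses = op.get("responses", {})
            if not any(code.startswith(("4", "5")) for code in responses):
                findings.append(f"{label}: no documented error responses")
            # Failure point 6: untagged operations make large APIs a flat list.
            if not op.get("tags"):
                findings.append(f"{label}: untagged (flat API surface)")
    return findings

# Toy spec: the POST operation is agent-ready, the GET operation is not.
spec = {
    "paths": {
        "/orders": {
            "post": {"description": "Create an order",
                     "responses": {"201": {}, "400": {}},
                     "tags": ["orders"]},
            "get": {"responses": {"200": {}}},
        }
    }
}
print(lint_for_agents(spec))
```

Even a crude pass like this turns a stalled pilot’s vague “the agent keeps failing” into a concrete backlog of infrastructure fixes.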

Bad Workflows Make It Worse

Meanwhile, in the rush to show progress, bad workflows are proliferating. A typical bad workflow is painstakingly developed by hand and embeds hidden Python or JavaScript code inside nodes. These are Trojan horses full of technical debt: invisible, fragile, unmaintainable, and untested. They exist outside proper software engineering practices (no code review, no tests, no static analysis, no IDE support).

Visual workflow tools can be excellent. But there’s a predictable failure mode: the “Diagram-as-Disguise” problem. A flowchart editor can look clean even when the underlying implementation is a disaster. Logic migrates into embedded scripts. Nodes become mini-programs. Error handling becomes ad hoc. At that point, you have all the downsides of software development with none of the advantages.

A good workflow should restrain itself to orchestration logic. This is why the Arazzo specification is deliberately not Turing-complete. There is a clear benefit to abstracting your workflow orchestration into a constrained language: it becomes easier for AI to generate, understand, analyse, modify and validate. Non-orchestration logic (data analysis, complex transformations, business rules evaluation) belongs in the platform layer, exposed as callable services. Dropping into embedded code unnecessarily in a workflow creates a worst-of-both-worlds situation: you’ve replaced managed software with unmanaged software masquerading as orchestration. You lose the flexibility of high-level workflows and you accumulate serious technical debt untempered by good software engineering management.
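To make the contrast concrete, here is a hedged sketch of what an orchestration-only workflow looks like in Arazzo. The API, operation IDs, file names, and field values are invented for illustration; only the overall shape follows the specification.

```yaml
arazzo: 1.0.0
info:
  title: Refund approval (illustrative)
  version: 0.1.0
sourceDescriptions:
  - name: paymentsApi
    url: ./payments.openapi.yaml   # hypothetical OpenAPI description
    type: openapi
workflows:
  - workflowId: approveRefund
    inputs:
      type: object
      properties:
        orderId:
          type: string
    steps:
      - stepId: lookupOrder
        operationId: getOrder          # invented operation in paymentsApi
        parameters:
          - name: orderId
            in: path
            value: $inputs.orderId
        successCriteria:
          - condition: $statusCode == 200
        outputs:
          amount: $response.body#/amount
      - stepId: issueRefund
        operationId: createRefund      # invented operation in paymentsApi
        parameters:
          - name: amount
            in: query
            value: $steps.lookupOrder.outputs.amount
        successCriteria:
          - condition: $statusCode == 201
```

Note what is absent: no embedded Python or JavaScript, no Turing-complete control flow. Every step is a declared API call with machine-checkable success criteria, which is exactly what makes such workflows easy to generate, analyse, and validate.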

Most workflow tools feel manageable at 5 workflows, maybe 20, maybe 100. But enterprise automation at AI scale implies thousands of use cases, constant change pressure, and regulatory drift. Without standard formats, reusable building blocks, governance, discoverability, change control, and testing environments, your workflow estate becomes one of your largest sources of new technical debt, holding back progress instead of enabling it, all under the banner of AI transformation.

From Diagnosis to Strategy

So how do you mine your failed pilots for actionable intelligence? Start by recognising what they’re telling you.

Every stalled pilot reveals specific infrastructure gaps: which APIs lack the structured information agents need, which security descriptions are missing, which workflows can’t be composed because there’s no machine-readable description of how systems interact. This is diagnostic gold.

AI is great at the design stage: ideation, information synthesis, planning, code generation and relentless iteration. This suggests a more mature posture: use AI to generate workflows, test, refine and maintain them in pre-production, then deploy validated deterministic artifacts in production. What can be deterministic should be deterministic.

A critical enabler is a realistic API sandbox (a safely-isolated digital twin of the production environment, where agents can iterate until they solve each goal). The agentic sandbox is more than a once-off design tool; it’s the core driver of AI transformation where AI generates workflow automation at a scale that would be humanly infeasible, while also continuously maintaining and extending the workflow catalog as business requirements and regulations evolve. That’s AI scale without AI chaos.

This dichotomy (maximizing AI at design time and minimizing it in production) delivers the best of both worlds. We don’t think software is going to be replaced by LLMs. But the workflows that LLMs build for us will replace a lot of software.

Sovereignty Matters

Here’s a question CTOs should be asking: who owns your business logic?

Workflows encode claims decisions, pricing logic, escalation rules, risk thresholds, booking logic, route optimization, and compliance checks. Your workflow catalog is effectively an executable representation of your company. This is core intellectual property.

Storing your business logic in proprietary formats creates strategic vulnerability. Your fate becomes tied to a technology vendor that might disappear, change course, or have misaligned goals. You may not be able to easily audit, govern, improve or export the workflows. And the vendor has strong motivation to train on your workflows so that their platform gains the capabilities that previously made you unique.

If you think of a business as a bank account and a bunch of business logic, then a company that moves all its business logic to a proprietary workflow orchestrator is in danger of reducing itself to a commoditised wrapper. The answer is to turn to open-source and open-standard solutions, which is why Arazzo (the broadly industry-backed open standard workflow specification from the OpenAPI Initiative) is so timely. It’s not just about interoperability; it’s about maintaining autonomy and control over the business logic IP that differentiates you.

The Path Forward

If your organisation claims “AI transformation” but the implementation looks like consultants building workflows with embedded scripts inside proprietary formats across disconnected tools, you’re not on the right path. You’re heading towards the worst of both worlds, accruing silent technical debt that will be crippling when scaled.

But for those stuck in pilot purgatory, here’s the reframe: you’re already identifying barriers that others haven’t yet discovered. Your failed pilots are a map of exactly what needs to be fixed. The organisations that learn to read that map (that treat infrastructure diagnosis as the first step rather than a detour) will be the ones that cross the production divide.

The promise of AI is to automate what couldn’t be automated before, at a scale previously impossible. To get there: your unique business processes are your core IP; workflows are an executable version of your business logic; maximize LLMs at design time, minimize them in production; and maintain sovereignty by keeping ownership of your workflows through open standards.

The enterprises that win won’t be the ones with the flashiest demos. They’ll be the ones that did the infrastructure work.
