
The Emperor’s New Agents

By Sean Blanchfield, CEO and co-founder of Jentic and a member of Ireland's AI Advisory Council.

Why your stalled AI pilots are the most valuable thing in your transformation program

While Dario Amodei makes the case for impending AGI, the CTOs I speak with are blocked by rather more prosaic concerns: integration friction, security boundaries, reliability expectations, outdated authorization models, maintainability and governance. These issues will not be automatically solved by AI getting smarter. In practice, the opposite is true: probabilistic systems amplify every weakness in the infrastructure they touch.

Most enterprise leaders I meet have made significant investments in AI transformation. They’re fluent in MCP, A2A, multi-agent systems, the future we’ve all been reading about. But the reality of what has been deployed often lags far behind the hype.

Enterprise generative AI is already a $37 billion market, yet MIT research shows 95% of AI pilots never reach production. Most executives treat stalled pilots as embarrassing setbacks, but I think they’re diagnostic gold mines.

Pilot Purgatory Is an Infrastructure Problem

Sometimes I’m surprised by what people call “AI agents.” A chatbot is not an agent. A Python script that makes an LLM API call is not an agent. A Zapier automation with an LLM step is not an agent. If we’re going to have a new term, it had better mean something new. An agent has autonomy over its control flow and can use tools to interact with its environment. Short version: a chatbot is an LLM with a UI, a copilot is a chatbot with tools, and an agent is a copilot that doesn’t have or need a user.

If your “agent” doesn’t have these properties, you have automation with good PR. And while enterprises feel intense pressure to deploy AI, their vendors are slapping “agentic” on everything. This creates an illusion of progress that obscures a gathering storm of technical and governance debt.

When we filter for real agents (systems that actually have autonomy and tool use), the failures become instructive. There’s a pattern: they work in isolation, then hit integration walls. The AI capability is not the problem. The infrastructure is the problem. Pilots fail because the software they need to interact with was designed for humans, not machines. Every stalled pilot is telling you exactly where your legacy architecture has gaps, if you’re willing to listen.

The Critical Failure Points

My team analysed over 1,500 well-known APIs and scored them for AI-readiness. The results were striking. We evaluated APIs across multiple dimensions that determine whether an AI agent can reliably use them, and found systemic gaps across the board:

1. Documentation quality. Human developers bring institutional knowledge, support from colleagues, and time to experiment when working with an API. Agents have none of these. They rely solely on what is written in the API documentation. In a production setting, an agent needs to get it right first time (experimentation is slow, unreliable, and risks corrupting data). Documentation that was good enough for a human developer who could work around ambiguities becomes a hard blocker for an agent.

2. Structural consistency. Inconsistent naming conventions, unpredictable response formats and non-standard patterns across an API undermine the “principle of least surprise” that allows models to make correct inferences about expected behaviour. When a model has learned that one part of an API works a certain way, it will reasonably assume the rest follows suit. Inconsistency breaks that assumption, leading to silent failures.

3. Security and authentication information. API descriptions frequently omit or incompletely describe their authentication and security requirements. This information often lives outside the API specification entirely, buried in a website, a PDF, or a wiki page. If it’s not in the machine-readable specification, an agent can’t use it. This is one of the most common and most consequential gaps we found.

4. Error semantics. When an API returns a generic 500 error, a human developer can investigate. An agent cannot. Clear, machine-readable error responses with actionable detail are essential for agents to self-correct or fail gracefully. Most APIs don’t provide them.

5. Parameter and response complexity. Deeply nested parameter schemas, polymorphic responses, and undocumented optional fields create a combinatorial problem that overwhelms agent reasoning. APIs designed for developer convenience often create agent confusion. Add missing idempotency guarantees and unclear safety boundaries, and agents cannot safely retry failed operations or reason about side effects.

6. API organisation. Large APIs that present hundreds of endpoints as a flat list are difficult for agents to navigate. Logical grouping of operations (whether through tags, separate specifications, or clear resource hierarchies) is essential for agents to understand scope and find relevant capabilities. Without it, agents waste tokens and reasoning cycles on irrelevant operations, or miss relevant ones entirely.

These are some of the specific, measurable reasons pilots are stalling. And crucially, each one is fixable once you know where to look.
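Several of these gaps can be detected mechanically from an API’s machine-readable description. As a rough sketch only (assuming the OpenAPI document is already loaded as a Python dict; the `lint_for_agents` function and its checks are hypothetical illustrations, not our actual scoring methodology), an agent-readiness lint might look like:

```python
# Hypothetical agent-readiness lint over an OpenAPI spec loaded as a dict.
# The checks mirror the failure points above: missing machine-readable auth,
# missing descriptions, undocumented error responses, and a flat untagged
# API surface.

def lint_for_agents(spec: dict) -> list[str]:
    findings = []
    # Failure point 3: security info must be in the spec itself.
    if not spec.get("components", {}).get("securitySchemes"):
        findings.append("no machine-readable security schemes")
    for path, ops in spec.get("paths", {}).items():
        for method, op in ops.items():
            label = f"{method.upper()} {path}"
            # Failure point 1: agents rely solely on written documentation.
            if not op.get("description"):
                findings.append(f"{label}: missing description")
            # Failure point 4: agents need documented error semantics.
            responses = op.get("responses", {})
            if not any(code.startswith(("4", "5")) for code in responses):
                findings.append(f"{label}: no documented error responses")
            # Failure point 6: untagged operations make large APIs a flat list.
            if not op.get("tags"):
                findings.append(f"{label}: untagged (flat API surface)")
    return findings

# Toy spec: the POST operation is agent-ready, the GET operation is not.
spec = {
    "paths": {
        "/orders": {
            "post": {"description": "Create an order",
                     "responses": {"201": {}, "400": {}},
                     "tags": ["orders"]},
            "get": {"responses": {"200": {}}},
        }
    }
}
print(lint_for_agents(spec))
```

Even a crude pass like this turns a stalled pilot’s vague “the agent keeps failing” into a concrete backlog of infrastructure fixes.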

Bad Workflows Make It Worse

Meanwhile, in the rush to show progress, bad workflows are proliferating. A typical bad workflow is painstakingly developed by hand and embeds hidden Python or JavaScript code inside nodes. These are Trojan horses full of technical debt: invisible, fragile, unmaintainable, and untested. They exist outside proper software engineering practices (no code review, no tests, no static analysis, no IDE support).

Visual workflow tools can be excellent. But there’s a predictable failure mode: the “Diagram-as-Disguise” problem. A flowchart editor can look clean even when the underlying implementation is a disaster. Logic migrates into embedded scripts. Nodes become mini-programs. Error handling becomes ad hoc. At that point, you have all the downsides of software development with none of the advantages.

A good workflow should restrain itself to orchestration logic. This is why the Arazzo specification is deliberately not Turing-complete. There is a clear benefit to abstracting your workflow orchestration into a constrained language: it becomes easier for AI to generate, understand, analyse, modify and validate. Non-orchestration logic (data analysis, complex transformations, business rules evaluation) belongs in the platform layer, exposed as callable services. Dropping into embedded code unnecessarily in a workflow creates a worst-of-both-worlds situation: you’ve replaced managed software with unmanaged software masquerading as orchestration. You lose the flexibility of high-level workflows and you accumulate serious technical debt untempered by good software engineering management.
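To make the contrast concrete, here is a hedged sketch of what an orchestration-only workflow looks like in Arazzo. The API, operation IDs, file names, and field values are invented for illustration; only the overall shape follows the specification.

```yaml
arazzo: 1.0.0
info:
  title: Refund approval (illustrative)
  version: 0.1.0
sourceDescriptions:
  - name: paymentsApi
    url: ./payments.openapi.yaml   # hypothetical OpenAPI description
    type: openapi
workflows:
  - workflowId: approveRefund
    inputs:
      type: object
      properties:
        orderId:
          type: string
    steps:
      - stepId: lookupOrder
        operationId: getOrder          # invented operation in paymentsApi
        parameters:
          - name: orderId
            in: path
            value: $inputs.orderId
        successCriteria:
          - condition: $statusCode == 200
        outputs:
          amount: $response.body#/amount
      - stepId: issueRefund
        operationId: createRefund      # invented operation in paymentsApi
        parameters:
          - name: amount
            in: query
            value: $steps.lookupOrder.outputs.amount
        successCriteria:
          - condition: $statusCode == 201
```

Note what is absent: no embedded Python or JavaScript, no Turing-complete control flow. Every step is a declared API call with machine-checkable success criteria, which is exactly what makes such workflows easy to generate, analyse, and validate.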

Most workflow tools feel manageable at 5 workflows, maybe 20, maybe 100. But enterprise automation at AI scale implies thousands of use cases, constant change pressure, and regulatory drift. Without standard formats, reusable building blocks, governance, discoverability, change control, and testing environments, your workflow estate becomes one of your largest sources of new technical debt, holding back progress instead of enabling it, all under the banner of AI transformation.

From Diagnosis to Strategy

So how do you mine your failed pilots for actionable intelligence? Start by recognising what they’re telling you.

Every stalled pilot reveals specific infrastructure gaps: which APIs lack the structured information agents need, which security descriptions are missing, which workflows can’t be composed because there’s no machine-readable description of how systems interact. This is diagnostic gold.

AI is great at the design stage: ideation, information synthesis, planning, code generation and relentless iteration. This suggests a more mature posture: use AI to generate workflows, test, refine and maintain them in pre-production, then deploy validated deterministic artifacts in production. What can be deterministic should be deterministic.

A critical enabler is a realistic API sandbox (a safely-isolated digital twin of the production environment, where agents can iterate until they solve each goal). The agentic sandbox is more than a once-off design tool; it’s the core driver of AI transformation where AI generates workflow automation at a scale that would be humanly infeasible, while also continuously maintaining and extending the workflow catalog as business requirements and regulations evolve. That’s AI scale without AI chaos.

This dichotomy (maximizing AI at design time and minimizing it in production) delivers the best of both worlds. We don’t think software is going to be replaced by LLMs. But the workflows that LLMs build for us will replace a lot of software.

Sovereignty Matters

Here’s a question CTOs should be asking: who owns your business logic?

Workflows encode claims decisions, pricing logic, escalation rules, risk thresholds, booking logic, route optimization, and compliance checks. Your workflow catalog is effectively an executable representation of your company. This is core intellectual property.

Storing your business logic in proprietary formats creates strategic vulnerability. Your fate becomes tied to a technology vendor that might disappear, change course, or have misaligned goals. You may not be able to easily audit, govern, improve or export the workflows. And the vendor has strong motivation to train on your workflows so that their platform gains the capabilities that previously made you unique.

If you think of a business as a bank account and a bunch of business logic, then a company that moves all its business logic to a proprietary workflow orchestrator is in danger of reducing itself to a commoditised wrapper. The answer is to turn to open-source and open-standard solutions, which is why Arazzo (the broadly industry-backed open standard workflow specification from the OpenAPI Initiative) is so timely. It’s not just about interoperability; it’s about maintaining autonomy and control over the business logic IP that differentiates you.

The Path Forward

If your organisation claims “AI transformation” but the implementation looks like consultants building workflows with embedded scripts inside proprietary formats across disconnected tools, you’re not on the right path. You’re heading towards the worst of both worlds, accruing silent technical debt that will be crippling when scaled.

But for those stuck in pilot purgatory, here’s the reframe: you’re already identifying barriers that others haven’t yet discovered. Your failed pilots are a map of exactly what needs to be fixed. The organisations that learn to read that map (that treat infrastructure diagnosis as the first step rather than a detour) will be the ones that cross the production divide.

The promise of AI is to automate what couldn’t be automated before, at a scale previously impossible. To get there: your unique business processes are your core IP; workflows are an executable version of your business logic; maximize LLMs at design time, minimize them in production; and maintain sovereignty by keeping ownership of your workflows through open standards.

The enterprises that win won’t be the ones with the flashiest demos. They’ll be the ones that did the infrastructure work.
