
In 2023, two New York lawyers used ChatGPT to help draft a legal brief, only to discover that the model had fabricated entire court cases. The judge in Mata v. Avianca wasn’t amused, and the lawyers were sanctioned.
The AI didn’t warn them. They didn’t double-check. It was a cautionary tale for every executive fantasizing about replacing high-stakes decisions with machine intelligence. If you can’t trust the AI to know when it’s wrong, you definitely can’t trust it to be correct.
Right now, AI is everywhere, except where it matters most.
From marketing copy to meeting notes, large language models have proven they can handle the tedious, the repetitive, and the safe. But give them a task with real-world stakes (deciding a loan, diagnosing a patient, approving a financial transaction) and the enthusiasm suddenly dries up. Enterprise leaders love talking about AI, but few are ready to actually trust it.
This isn’t a tech problem. It’s a trust problem.
And until that changes, AI will remain a glorified intern.
And yet, we’re flooded with headlines about how AI will revolutionize everything. Boards are asking CTOs for AI strategy decks. LinkedIn is a sea of “AI is the future” thought-leadership posts.
But in actual enterprise environments, AI is mostly doing one thing: low-stakes work. And that’s not going to change until one thing does: trust.
This article examines why companies are reluctant to hand over mission-critical work to AI, exploring the psychological, operational, and technical barriers keeping AI on a short leash. We also discuss how improving explainability, reliability, and failure handling could close this trust gap and unlock broader adoption.
The AI Job Ladder, Visualized
Here’s where things stand today:
| Intern AI | Executive AI |
| --- | --- |
| Writes meeting summaries | Writes quarterly financial reports |
| Answers FAQs via chatbot | Handles crisis PR responses |
| Suggests marketing copy | Launches and optimizes ad campaigns |
| Flags suspicious transactions | Approves wire transfers |
| Highlights resumes | Decides who gets hired |
The left column is real. The right? Still a fantasy.
Because until AI can explain itself, prove it’s consistent, and fail gracefully, no one’s giving it mission-critical responsibilities.
Common Current Enterprise AI Use Cases
In 2024, roughly 90% of enterprises report using AI in some capacity. However, these uses are mostly for low-risk, assistive tasks rather than core decision-making. Common AI use cases in enterprises today include:
- Text Summarization & Drafting: AI tools generate meeting notes, summarize long documents, or draft emails and reports for human review.
- Scheduling & Coordination: Assistants help schedule meetings, manage calendars, and handle routine coordination.
- Classification & Data Entry: AI systems tag incoming emails or support tickets, categorize expenses, or update records, reducing manual data work.
- Customer Service Chatbots: Many companies deploy chatbots to answer frequently asked questions or guide users through basic support queries.
- Analytics & Detection: AI helps flag anomalies (e.g. unusual transactions), forecast trends, or extract insights from big data to support human analysts.
These applications improve efficiency and are relatively low-stakes. If the AI makes an error, such as a slightly off summary or a wrong email tag, the consequences are minor or easily caught by a human. Enterprises primarily leverage AI to enhance customer experiences, streamline operations, and speed up analytics: one recent survey found AI is “commonly used for enhancing customer experiences (60%), improving operational efficiency (57%) and speeding up analytics processes (51%).” Such tasks resemble an “AI intern” – handling grunt work under supervision.
Stuck in Low-Stakes Mode: Barriers To High-Stakes AI
Given these concerns, enterprises largely quarantine AI in low-stakes roles. An AI writing an email draft is OK; an AI making a $10 million decision is not. Below is a comparison of typical low-risk AI use cases versus potential mission-critical counterparts, with the barriers that keep AI from advancing:
| Low-Stakes AI Use Case (Assistive Role) | Mission-Critical Use Case (Autonomous Role) | What’s Blocking Mission-Critical AI |
| --- | --- | --- |
| Summarizing documents or meeting notes – LLMs draft recaps for human review. | Generating final contracts or financial reports – AI fully authors important documents. | Accuracy & Accountability: Minor summary errors are tolerable, but in legal and financial documents mistakes have serious consequences. Lack of auditability and explainability makes leaders insist a human finalize mission-critical documents. |
| Scheduling meetings and email triage – assistants find open slots or sort low-priority emails. | Strategic decision-making – AI autonomously approves budgets, investments, or hiring decisions. | Judgment & Trust: High-level decisions require contextual understanding, nuance, and justification. AI’s logic is opaque, so executives won’t trust it solo on decisions where judgment and accountability are key (e.g. hiring, where AI bias is a huge worry). |
| Customer service chatbot for FAQs – answers routine questions with scripted info. | Handling critical customer issues – AI manages angry escalations or crisis responses without human help. | Brand Risk & Empathy: In high-stress cases, a wrong or tone-deaf answer can lose a customer or spark backlash. AI lacks true empathy and can go off-script (hallucinate). Companies fear an unreliable AI could turn a PR issue into a disaster. |
| Marketing content suggestions – AI suggests copy or images for campaigns; human marketers approve. | Autonomous campaign execution – AI runs an entire marketing campaign (targeting, copy, spend) on its own. | Control & Unpredictability: Creative decisions involve brand voice and risk management (avoiding offensive or off-brand content). Without guarantees on AI’s output quality and compliance, firms keep humans at the helm for final approval. |
| Predictive maintenance alerts – AI flags equipment likely to need repair for engineers to check. | Autonomous industrial control – AI directly controls plant equipment or an entire power grid to optimize performance. | Safety & Reliability: Tolerating some false alerts is fine, but giving AI full control could cause accidents if it fails. Such systems need rigorous validation, fail-safes, and regulatory approval. Most companies do not yet trust AI to handle failures gracefully in these environments. |
| Resume screening assistance – AI highlights promising candidates for HR. | Hiring and promotion decisions – AI decides who to hire or promote without human input. | Fairness & Transparency: Hiring is sensitive, and biased decisions can violate laws and spark lawsuits. AI recommendations here demand clear explanations. Lacking that, and given past bias examples, firms use AI only as a second opinion at best, never the final arbiter. |
In each of these comparisons, we see a common theme: AI is currently limited to advice and automation under human oversight, not final authority. The greater the impact of a decision (financial, legal, safety, or reputational), the higher the bar for trusting AI to handle it.
AI Is in the Building but Only in the Mailroom
Let’s start with the numbers. According to McKinsey, over 80% of companies now use AI in some form. But most of those use cases are extremely safe: text generation (63%), image creation (a third), and code snippets (around a quarter). These are jobs where failure is mildly annoying, not catastrophic.
Only 35% of organizations have managed to scale AI across multiple departments. And almost none have given it true decision-making power.
Why? Because when AI fails, it doesn’t stumble; it hallucinates.
Hallucinations Are Dealbreakers
One of the core problems with AI in high-stakes environments is that when it fails, it doesn’t fail like a human. It hallucinates, invents citations, and answers confidently when it shouldn’t.
That Avianca court case? It wasn’t an isolated incident. Large language models still struggle to distinguish fact from fiction. In high-stakes environments (finance, law, healthcare), that’s not a quirk. That’s a full-stop reason not to deploy.
A recent survey by Writer.com found that 61% of companies have experienced accuracy issues with their generative AI tools, and only 17% rated their in-house models as “excellent.”
In a world where enterprises are expected to mitigate risk, you can’t build mission-critical systems on a foundation that sometimes invents its own reality. That’s not good enough when someone’s health, freedom, or livelihood is on the line.
Transparency and Explainability Are Still Missing
Even if the model works, no one fully trusts it yet.
Imagine an AI that flags a transaction as fraudulent. Can it tell you why?
For most enterprise-grade systems, the answer is still no. Which is a problem, because enterprise users need justification—not vibes.
This has nothing to do with processing power or dataset size. It has everything to do with explainability. If a model tells you to approve a loan, deny a claim, or prescribe a treatment—you need to know why.
And right now? You don’t.
The Workday Global Survey found that only 62% of leaders are confident in their ability to implement AI responsibly—and only 52% of employees agree. That’s a 10-point trust gap inside the same company.
Only 11% of organizations have fully implemented responsible AI governance. Most still lack policies, audit systems, or even basic training programs. Which means: no one’s exactly sure where the AI ends—and the liability begins.
The Most Useful Thing AI Can Say: “I Don’t Know”
And this is where it gets interesting.
Some of the smartest AI deployments today aren’t the ones making confident decisions—they’re the ones that know when to shut up.
At Morgan Stanley, a GPT-powered wealth assistant helps advisors answer client questions. But according to Tearsheet, the model is trained to say “I don’t know” when unsure. It’s better to skip an answer than give a wrong one.
This idea—graceful failure—is emerging as one of the most important design principles in enterprise AI. Trust isn’t built through perfect performance. It’s built through predictable limits.
We don’t trust pilots because they never make mistakes. We trust them because we know what happens when something goes wrong.
We need AI systems that work the same way.
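In code, graceful failure can be as simple as an explicit abstention path. The sketch below is illustrative only: the `Answer` type, the 0.75 floor, and the routing message are assumptions for the example, not any vendor’s implementation. It simply shows the design principle of refusing to answer below a confidence threshold.

```python
# A minimal sketch of "graceful failure": abstain when confidence is low.
# The Answer type and the 0.75 threshold are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Answer:
    text: str
    confidence: float  # 0.0-1.0, e.g. retrieval similarity or a calibrated score

CONFIDENCE_FLOOR = 0.75  # tune per use case; higher stakes -> higher floor

def respond(question: str, candidate: Answer) -> str:
    """Return the model's answer only when it clears the confidence floor."""
    if candidate.confidence >= CONFIDENCE_FLOOR:
        return candidate.text
    # Predictable limit: a wrong answer costs more than no answer.
    return "I don't know. Routing this question to a human advisor."

print(respond("What is the expense ratio of Fund X?", Answer("0.45%", 0.62)))
```

The output isn’t impressive, and that’s the point: the system’s limits are predictable, which is what builds trust.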
The Trust Gap Won’t Close With Code Alone
Enterprises aren’t just waiting for better models. They’re waiting for better guardrails.
Responsible AI practices, audit trails, explainability, fallback mechanisms—these are the features that will matter more than token count or multimodality. And right now, they’re missing.
Meanwhile, only 21% of companies have AI policies in place, and few provide proper AI training. No wonder employees are wary.
Companies like Accenture and AWS are now pushing governance and transparency as the next wave of value creation. But implementation is slow. And for good reason: Trust is a culture, not a checklist.
Industry Perspectives on AI Trust
Industry leaders and analysts widely recognize the trust gap as the key hurdle in enterprise AI adoption. A Deloitte report on generative AI in 2024 found that issues like governance, training, and trust are among the biggest challenges to scaling AI, with the majority of organizations acknowledging they need at least another year or more to address these adoption barriers. In other words, companies realize that technology is only part of the equation – building trust infrastructure takes time.
Executives emphasize a “people-first” approach. Workday’s CIO, for example, advocates focusing on access and experimentation to gradually build confidence in AI, noting that small wins can demonstrate value and reliability, thereby expanding AI’s remit over time. Enterprise strategists often repeat the mantra that AI should be a copilot to employees. As one Microsoft leader put it, “It’s called Copilot, not Autopilot… The human in the loop is key, and you always go in and check to make sure things are accurate.” This reflects a cautious optimism: use AI to boost productivity, but keep oversight until trust is earned.
Analysts also point out that rushing AI without proper controls can backfire. A survey of business leaders by Stibo Systems found 32% admitted they have rushed AI adoption, and 58% acknowledged a lack of AI ethics training in their organizations. Those figures suggest many organizations know they must slow down and put guardrails in place to avoid an eventual trust crisis. Earning end-users’ trust, whether employees or customers, is paramount. As the World Economic Forum noted in 2024, only about 30% of people globally embrace AI today, while 35% actively reject it, highlighting that public trust lags behind tech advancements. Enterprise leaders are thus treading carefully: they want to harness AI’s benefits but not outrun the trust of their workforce, customers, or regulators.
The Enterprise Trust Gap
As AI expands into more roles, many organizations are still hesitant to trust it with high-stakes decisions. A Workday survey found 93% of employees and leaders have concerns about formal AI implementation. Why the caution? It’s not the tech, it’s the trust.
1. The Black Box Problem
AI often can’t explain itself. Leaders hesitate to act on outputs they can’t understand, especially in critical domains. In legal eDiscovery, for example, adoption has lagged because “lack of transparency is one of the reasons that technology-assisted review hasn’t caught on.”
If AI flags a transaction or suggests a treatment, the “why” matters. Without it, humans stay in the loop, just to be safe.
2. Unpredictable Failure Modes
Generative AI can “hallucinate”—confidently produce false answers without warning. These rare but bizarre edge cases erode trust.
As one expert put it:
“We simply don’t know what we don’t know about how AI can act sometimes.”
In regulated sectors, even low-risk errors are unacceptable.
3. Bias & Ethics
AI can reflect (or amplify) bias from its training data. Enterprises fear legal or PR fallout. Remember the COMPAS algorithm used by courts to score recidivism risk? A widely cited 2016 analysis found its risk scores were biased against Black defendants.
In hiring, finance, and justice—where fairness is paramount—this is a dealbreaker.
4. Safety & Liability Risks
In healthcare, finance, or transport, AI mistakes can cause real harm. From prompt injections to erratic outputs, leaders worry AI might be manipulated—or just wrong.
74% of IT leaders cite security and compliance risks as major barriers to AI adoption. If an AI trader glitches or an autonomous vehicle fails, the stakes are high.
5. Loss of Control & Accountability
If AI makes a bad call—who’s liable? Many organizations prefer AI as a copilot, not an autopilot.
GitHub’s own Copilot Trust Center emphasizes that humans stay in charge:
“It’s not intended to generate suggestions without oversight.”
The message is clear: keep a human in the loop.
6. Cultural & Psychological Resistance
Beyond risk, there’s fear. AI is still new in most workplaces. People worry about job loss, decision-making power, or simply not understanding it.
As Workday’s Daniel Pell put it:
“The biggest barrier to AI adoption today is trust.”
That trust, he adds, must be earned through clear communication, cultural buy-in, and showing how AI helps—not replaces—people.
Closing the AI Trust Gap: What It’ll Take
1. Explainable AI (XAI)
Trust begins with understanding. If stakeholders get clear, human-readable reasons for AI decisions, they’re more likely to accept them. For instance, an AI loan model might reveal that income, credit history, and debt ratio were top decision factors.
Techniques like feature attribution, natural language explanations, and surrogate models help bridge this gap. Some industries are starting to mandate transparency tools like “model cards,” and McKinsey has argued that explainability plays a key role in building AI trust.
“Explainable AI enables humans to enter the decision-making process, increasing trust and accountability.” – IBM
By making AI’s reasoning visible, companies can align it with human expectations—and reduce fear of the unknown.
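As a concrete illustration, one common XAI pattern is a surrogate model: train a shallow, readable model to imitate the opaque one and inspect what it learned. The sketch below uses scikit-learn on synthetic “loan” data; the feature names and data are invented for the example and stand in for whatever the real model consumes.

```python
# A hedged sketch of one XAI technique: fit an interpretable surrogate
# (a shallow decision tree) to mimic an opaque "loan model" and surface
# which features drive its decisions. Features and data are made up.

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
features = ["income", "credit_history_years", "debt_ratio"]
X = rng.normal(size=(1000, 3))
y = (X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

# The "black box" the business actually uses.
black_box = GradientBoostingClassifier().fit(X, y)

# Train a transparent surrogate on the black box's own predictions,
# then print human-readable decision rules.
surrogate = DecisionTreeClassifier(max_depth=3).fit(X, black_box.predict(X))
print(export_text(surrogate, feature_names=features))
```

The surrogate is an approximation, not the model itself, but it gives reviewers something they can read, challenge, and audit.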
2. Rigorous Testing & Validation
Traditional software gets QA’d. AI should too. That means stress tests, edge-case audits, and historical simulations. Banks, for example, test AI against years of past decisions to catch blind spots.
Frameworks like the NIST AI Risk Management Framework are emerging to guide certification, bias reduction, and security checks.
“Earning trust in AI is much like earning trust in humans – it requires a track record.”
Transparency in testing can make regulators and executives more confident in AI systems.
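A minimal version of the “test against past decisions” idea might look like the replay harness below. The case records and `toy_model` are placeholders invented for illustration; the point is measuring agreement with historical human decisions and surfacing disagreements before granting any autonomy.

```python
# A sketch of "replay testing": run the model over historical cases and
# measure agreement with the decisions humans actually made, flagging
# disagreements for review. Case data and toy_model are placeholders.

from typing import Callable

historical_cases = [
    {"id": "txn-001", "features": {"amount": 120.0}, "human_decision": "approve"},
    {"id": "txn-002", "features": {"amount": 98000.0}, "human_decision": "escalate"},
    {"id": "txn-003", "features": {"amount": 9500.0}, "human_decision": "escalate"},
]

def replay(model_decide: Callable[[dict], str], cases: list) -> dict:
    disagreements = [c for c in cases if model_decide(c["features"]) != c["human_decision"]]
    return {
        "agreement_rate": 1 - len(disagreements) / len(cases),
        "to_review": [c["id"] for c in disagreements],
    }

# Toy stand-in for the real model.
def toy_model(features: dict) -> str:
    return "escalate" if features["amount"] > 10_000 else "approve"

print(replay(toy_model, historical_cases))
# {'agreement_rate': 0.666..., 'to_review': ['txn-003']}
```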
3. Robust Model Design
AI needs to be less overconfident—and more aware of its limits. New approaches like adversarial training, ensemble models, and safe optimization help build that humility.
A standout technique: retrieval-augmented generation (RAG). Instead of guessing, RAG models cite trusted sources (e.g., “According to Document X, the answer is Y”), reducing hallucinations and grounding responses in real evidence.
Think of RAG as turning AI from a guesser into a junior analyst who cites their work.
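Stripped to its core, RAG is retrieve-then-answer-with-citation. The sketch below uses a toy keyword-overlap retriever and a stubbed-out model call, both assumptions made for illustration; production systems use embedding search and a real LLM endpoint, but the grounding contract is the same.

```python
# A stripped-down sketch of retrieval-augmented generation (RAG): retrieve
# the most relevant vetted document, then force the answer to cite it.
# The keyword-overlap retriever and the knowledge base are illustrative.

KNOWLEDGE_BASE = {
    "policy-7": "Wire transfers above $50,000 require two human approvals.",
    "faq-12": "Password resets are self-service via the identity portal.",
}

def retrieve(question: str):
    """Pick the document sharing the most words with the question."""
    def overlap(doc: str) -> int:
        return len(set(question.lower().split()) & set(doc.lower().split()))
    doc_id = max(KNOWLEDGE_BASE, key=lambda k: overlap(KNOWLEDGE_BASE[k]))
    return doc_id, KNOWLEDGE_BASE[doc_id]

def answer(question: str) -> str:
    doc_id, passage = retrieve(question)
    # In a real system, an LLM would be prompted with the passage and told
    # to answer only from it; here we just return the grounded citation.
    return f"According to [{doc_id}]: {passage}"

print(answer("What approvals do large wire transfers need?"))
```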
4. Governance & Human-in-the-Loop
Enterprises are putting rules around AI—deciding what it can and cannot do alone. Human-in-the-loop setups allow for oversight, especially early on. Some firms run AI in parallel with humans to compare results, gradually increasing autonomy if the AI performs well.
“Trust but verify” isn’t just a Cold War relic; it’s how most companies are approaching AI today.
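In practice, human-in-the-loop often comes down to a policy gate: the AI acts alone only on low-impact, high-confidence cases, everything else is queued for a person, and every call is logged. The thresholds and audit format below are illustrative assumptions, not a standard.

```python
# A hedged sketch of a human-in-the-loop policy gate. Thresholds and the
# audit-log format are illustrative choices, not a prescribed standard.

import json, time

AUTONOMY_CONF = 0.95      # minimum confidence for autonomous action
AUTONOMY_LIMIT = 1_000.0  # maximum dollar impact the AI may act on alone

def route(decision: str, confidence: float, impact_usd: float) -> str:
    if confidence >= AUTONOMY_CONF and impact_usd <= AUTONOMY_LIMIT:
        outcome = "auto_executed"
    else:
        outcome = "queued_for_human"
    # Audit trail: every call is logged, whoever ends up deciding.
    print(json.dumps({"ts": time.time(), "decision": decision,
                      "confidence": confidence, "impact_usd": impact_usd,
                      "outcome": outcome}))
    return outcome

route("refund_customer", confidence=0.98, impact_usd=80.0)    # AI acts alone
route("approve_wire", confidence=0.97, impact_usd=250_000.0)  # human decides
```

Running in parallel with humans (shadow mode) and gradually raising the autonomy limits is how several firms are easing AI into bigger roles.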
5. Cultural Change & Training
Trust isn’t only technical—it’s emotional. Companies that adopt AI successfully often train staff, clarify AI’s purpose, and highlight benefits like reduced grunt work.
They also appoint internal AI champions and set realistic expectations. Transparency around limitations is key.
“People trust tools they understand and control. Education turns fear into fluency.”
Real-World Examples Underscoring the Hesitation
It’s instructive to look at cases where AI was given a high-stakes role–or nearly so–and what went wrong. These examples illustrate why enterprises remain cautious:
- IBM Watson for Oncology – Trust on Trial: IBM’s Watson was once heralded as a revolution in cancer treatment, using AI to recommend personalized therapies. In practice, it became a “PR disaster”. Doctors found Watson’s suggestions either obvious (when they matched the doctor’s own judgment) or unconvincing (when they didn’t). Crucially, Watson often couldn’t explain its reasoning in a way that made sense to oncologists. As a result, “doctors simply didn’t trust it”. One partner hospital, MD Anderson Cancer Center, ultimately dropped the program after it failed to improve outcomes and often disagreed with experts. This high-profile failure showed that even one of the world’s most powerful AI systems faltered without physician trust and clear explainability. The lesson for enterprises: if your experts can’t understand or agree with the AI, they won’t use it, especially when lives are at stake.
- ChatGPT’s Legal Hallucinations – Reliability in Question: In a now-famous 2023 incident, two New York lawyers used ChatGPT to help write a legal brief – and ended up submitting fictitious case citations that the AI had invented. The court was not amused: the judge sanctioned the attorneys for a misleading filing. The lawyers explained that they “failed to believe that a piece of technology could be making up cases out of whole cloth”. This incident (Mata v. Avianca) became a cautionary tale: even educated professionals can be duped by AI’s confident falsehoods. For enterprises, it rang alarm bells about deploying AI in any unsupervised capacity. If a respected AI tool can hallucinate legal precedents, what might it do in other high-stakes tasks if left unchecked? The case underscored the need for rigorous verification of AI outputs and kept AI firmly in the “assistive draft” role rather than an authoritative one in legal work.
- Financial Services and “Spotty” AI Advice: In the financial industry, companies are testing AI cautiously. Morgan Stanley, for instance, deployed a GPT-4 based assistant to help financial advisors answer client questions by drawing on the firm’s knowledge base. Despite using a top-tier model, early users described the tool as “spotty on accuracy,” with the AI often confessing “I’m unable to answer your question” when unsure. In fact, this reluctance to answer may have been by design – a form of graceful failure to avoid hallucinations. Morgan Stanley fine-tuned the AI on vetted internal data and narrowed its scope to reduce errors. The rollout shows both the promise and the trepidation: wealth management is beginning to leverage AI for research summarization, but the AI is carefully constrained, and any hint of incorrect advice is met with quick adjustments. No bank will let an AI trade stocks or make client decisions autonomously until it’s proven rock-solid. Even with heavy investment in AI, trust is at the very heart of financial advice relationships, and that trust must be earned gradually.
- Autonomous Vehicles – Cautious Progress: Outside the boardroom, the autonomous driving sector provides a parallel example of tempered expectations. Self-driving car AIs have logged millions of miles, yet fully driverless deployments remain limited. Occasional accidents involving AI-driven cars garner massive attention and scrutiny. Engineers emphasize “graceful degradation” – ensuring a car safely hands control back to a human or comes to a safe stop if the AI encounters a scenario it can’t handle. The slow progress toward true “Level 5” autonomy highlights how high reliability requirements (driving must be nearly flawless) and public trust issues delay mission-critical AI. Many enterprises watch this space and draw lessons about introducing AI into safety-critical operations: the technology might be mostly ready, but society’s trust and regulatory green lights lag behind.
These examples reinforce a key point: when AI falters in a high-stakes setting, the setback isn’t just a one-off error – it can significantly set back trust and adoption. Enterprises, therefore, proceed very carefully, often using these stories to internally justify why AI isn’t ready for prime time in mission-critical roles.
Boring Is the New Breakthrough
The next frontier in AI isn’t more novelty—it’s more reliability.
That’s why AWS and Accenture are pushing responsible AI frameworks, and why PwC and others are investing in internal audit tools, interpretability research, and safe deployment protocols.
It’s also why more companies are turning to retrieval-augmented generation (RAG), human-in-the-loop workflows, and fallback systems.
We’ve had enough magic tricks. Now we need seatbelts.
The irony is that enterprises aren’t rejecting AI. They’re embracing it, just not recklessly. They’re using it to write emails, sort tickets, summarize reports. They’re experimenting. Piloting. Watching closely.
The gap between AI’s impressive capabilities and the willingness of organizations to use it in mission-critical tasks boils down to trust. Can the AI be understood? Will it do the right thing 99.999% of the time? And what happens the moment it fails? Until those questions have reassuring answers, enterprises will keep AI on a tight leash.
Closing this trust gap will require continued progress on explainability, reliability, and human-centric design. As AI systems become more transparent and are proven through rigorous testing, and as organizations develop the cultures and processes to integrate them, that leash can be extended safely. We are likely to see AI move up from intern to “co-pilot,” and eventually, in specific, well-guarded scenarios, into the pilot’s seat for certain tasks. The journey to mission-critical AI is cautious by necessity: trust is earned, not given. Until then, it’s still the intern. Smart. Helpful. And nowhere near the C-suite.