AI Leadership & Perspective

AI’s real challenge isn’t capability – it’s confidence

CEO of Global App Testing, an enterprise technology evaluation company that applies human-centric rigour through a community of over 120,000 people in more than 190 countries

In aviation, getting planes into the air was only the beginning. The real breakthrough came when the industry built systems that made flying safe. Testing, regulation and safety standards convinced people to trust the technology with their lives. 

Artificial intelligence is approaching a similar moment.

At Mobile World Congress this year, almost everything was powered by AI – from smartphones to robotics and connected devices. Yet despite the excitement, adoption remains relatively low. Estimates suggest around 16% of people globally use generative AI tools today, but that number is expected to grow rapidly: forecasts show the generative-AI market expanding by over 30% annually, from roughly $39 billion now to over $300 billion by 2033.

In other words, the technology is still early, but the adoption curve is steep. That gap matters because the stakes are rising quickly. In the future, people will rely on AI not just for writing emails or summarising documents, but for far more critical functions, from healthcare decisions to infrastructure management and even systems that support aviation safety.

That level of reliance depends on one thing above all. Trust.

Behind the excitement, a challenge is emerging. As organisations rush to embed AI into roadmaps and product strategies, many are starting with the technology rather than the problem they should be fixing. Product teams are being pushed to add AI before they have a clear view of how it will genuinely improve the user experience.

The result is an AI strategy trap: impressive demonstrations that struggle to deliver meaningful value once products reach real users.

When AI meets real people

The gap between AI hype and reality becomes clear the moment systems move from controlled environments into the hands of real users.

Inside development environments, AI models are typically tested against structured prompts and predictable inputs. But once deployed at scale, those systems begin interacting with millions of people who communicate in very different ways.

Language alone introduces enormous complexity. Slang, humour, sarcasm, cultural nuance and regional dialects all shape how users interact with AI, and how AI responds in turn.

Most AI systems today are trained on a small fraction of the world’s linguistic diversity. Research suggests that AI language technologies effectively support just 2–3% of the world’s languages, yet many systems are deployed globally, already interacting with users far beyond the environments they were trained for. These mismatches quickly affect user confidence. Around 1 in 2 people globally say they trust AI systems, highlighting how fragile that trust is.

The risk is rarely a dramatic failure. More often, it’s something slower and harder to detect: small inconsistencies that gradually erode trust. When responses feel unnatural, irrelevant or out of context, users begin to lose confidence in the system. This can affect everything from ethical questions like bias through to core commercial metrics like user engagement and perceived agent success.

At global scale, those moments multiply quickly.

The final 1% will decide who wins

As AI development matures, the hardest problems are no longer about building powerful models. Increasingly, they are about ensuring those systems behave reliably and safely once they reach real users.

This is the final 1% of AI development: the stage where systems must be refined, evaluated and validated for real-world use.

In theory, an AI model may perform impressively in testing environments. In practice, that same model must interact with millions of users across different languages, cultures and contexts. Small differences in phrasing, tone or expectations can change how users experience the system.

This is where a new evaluation layer is beginning to emerge across the industry – focused not just on model performance, but on how AI behaves when real people interact with it across different markets and contexts.

The companies that succeed in the next phase of AI adoption will be those that invest in this evaluation layer. Organisations racing to deploy AI capabilities without this final stage of validation risk launching systems that work technically but fail to build trust with users. As AI adoption accelerates, trust will ultimately determine which products people continue to rely on.

Just as aviation only became viable once the industry built systems to ensure safety, AI’s next phase will depend on how reliably intelligent systems behave once they reach the real world.

The winners will be the companies that treat real-world evaluation not as an afterthought, but as a core part of building trustworthy AI.

Author

I am Erika Balla, a technology journalist and content specialist with over 5 years of experience covering advancements in AI, software development, and digital innovation. With a foundation in graphic design and a strong focus on research-driven writing, I create accurate, accessible, and engaging articles that break down complex technical concepts and highlight their real-world impact.
