AI model creators spent the past several years chasing scale — a race that produced systems capable of writing software, generating media, and reasoning through increasingly sophisticated tasks. But as enterprises moved AI agents into production, another reality surfaced beneath the excitement: intelligence can become extraordinarily expensive.

Autonomous agents do not simply answer questions. They plan, retry, call tools, and reason across multi-step workflows, and every one of those actions consumes tokens. What looks impressive on a benchmark can become punishing economics once deployed across thousands or millions of enterprise tasks.

Ant Group, the Chinese technology giant behind Alipay and the Ant Ling models, believes that growing tension may become AI’s defining challenge. While much of the industry still competes on frontier rankings, the company expects the next phase of AI competition to center on the cost of thinking.

“When we talk about the ‘token bill’ challenge, what we are fundamentally discussing is the Marginal Cost of Intelligence,” Zhou Jun, vice president of Ant Group and head of the Ant Ling Foundation Model team, tells AI Journal. “There is a prevailing misconception that brute-forcing problems, simply piling on more parameters and throwing massive amounts of raw computing power at a task, is the ultimate solution. We see things differently.”

For Zhou, the reckoning is already arriving. He claims that architectural efficiency will be the single largest strategic lever for enterprises over the next three to five years. “The era of reaping easy dividends just by scaling up parameter sizes is rapidly fading.”

The Compute Arms Race Is Running Into Economics

The AI industry still rewards scale as frontier labs continue to expand parameter counts and compute budgets, treating size as a proxy for progress. But Zhou argues enterprises have inherited a damaging mindset from that culture. He notes that the true bottlenecks ahead will be data utilization efficiency and inference density. That distinction also shapes the central thesis of Ant Ling, the company’s foundation-model initiative.

“When you ask about sustainable competitive advantage, the answer isn’t a simplistic, single-dimensional race to be just ‘cheaper’ or theoretically ‘smarter,’” Zhou says. “Instead, we believe the ultimate metric the industry must focus on is what we call ‘cognitive density per unit token.’”

The economics are concrete. A model that reaches the same answer with dramatically fewer computational steps changes the enterprise equation entirely. “A highly optimized model capable of executing complex, multi-step reasoning within 100 tokens offers a vastly superior long-term ROI compared to a bloated, clunky model that burns through 1,000 tokens to reach the exact same conclusion. It is about precision and the density of thought, not just the sheer volume of output,” Zhou says.
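To make the 100-versus-1,000-token comparison tangible, here is a minimal back-of-the-envelope sketch. The per-token price and the task volume are hypothetical placeholders chosen for round numbers, not figures from Ant Group or any provider; only the two token counts come from Zhou's example.

```python
# Illustrative only: the price and task volume below are hypothetical,
# not figures from Ant Group or any provider.
PRICE_PER_MILLION_TOKENS_USD = 1.00  # assumed flat price per million tokens
TASKS = 1_000_000                    # assumed number of enterprise tasks


def fleet_cost_usd(tokens_per_task: int, tasks: int, price_per_million: float) -> float:
    """Total spend for `tasks` tasks that each consume `tokens_per_task` tokens."""
    total_tokens = tokens_per_task * tasks
    return total_tokens * price_per_million / 1_000_000


# Token counts taken from the quote: 100 tokens versus 1,000 tokens per task.
dense_cost = fleet_cost_usd(100, TASKS, PRICE_PER_MILLION_TOKENS_USD)
verbose_cost = fleet_cost_usd(1_000, TASKS, PRICE_PER_MILLION_TOKENS_USD)

print(f"100-token model:   ${dense_cost:,.2f}")
print(f"1,000-token model: ${verbose_cost:,.2f}")
print(f"cost ratio: {verbose_cost / dense_cost:.0f}x")
```

Under a flat per-token price the bill scales linearly with tokens per task, so the tenfold gap holds at any price or volume; real pricing that distinguishes input from output tokens would change the absolute numbers but not the basic shape of the argument.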

Planning loops, tool calls, and repeated inference generate operational costs that scale faster than organizations anticipate. Ant Ling positions itself directly against that dynamic. “The real breakthrough lies in teaching models to be ‘stingy’ with compute while maintaining peak performance,” Zhou says. “By ensuring that every ounce of computational power is spent exactly where it matters most, enterprises can break free from endlessly escalating costs.”

Need of the Hour Is More Intelligence Per Dollar

Ant Ling has structured its Ling model family around what it calls a ‘Value Bet’. Its latest releases — including Ling-2.6-flash and the broader Ring reasoning family — prioritize token efficiency, inference speed, and agent workflows over raw parameter count. The architecture uses a sparse Mixture-of-Experts design, activating only a fraction of total parameters during inference while preserving reasoning capability and throughput. According to Ant Ling and referenced evaluations, Ling-2.6-flash reduced inference costs substantially relative to competing systems while maintaining high speed and lower latency.
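For readers unfamiliar with the idea, the sketch below shows generic top-k routing, the common way a sparse Mixture-of-Experts layer runs only a few experts per input. It is a toy illustration in plain Python, not Ant Ling's implementation: the function names, the four scalar "experts", the gate scores, and the choice of k = 2 are all assumptions made for the example.

```python
import math


def top_k_route(gate_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    top = sorted(range(len(gate_scores)), key=lambda i: gate_scores[i], reverse=True)[:k]
    # Softmax over only the selected scores, so the k weights sum to 1.
    peak = max(gate_scores[i] for i in top)
    exps = [math.exp(gate_scores[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def sparse_moe(x, experts, gate_scores, k=2):
    """Evaluate only the k routed experts and blend their outputs."""
    return sum(weight * experts[i](x) for i, weight in top_k_route(gate_scores, k))


# Hypothetical toy experts: each one just scales its input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
# Hypothetical gate scores; in a real model a learned router produces these.
scores = [0.1, 2.0, 0.3, 1.0]

routed = top_k_route(scores, k=2)
output = sparse_moe(1.0, experts, scores, k=2)
print("experts used:", [i for i, _ in routed], "of", len(experts))
print("output:", round(output, 3))
```

Only two of the four experts are evaluated for this input, which is the sense in which a sparse design activates a fraction of its total parameters per token.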

Zhou does not dismiss benchmark rankings, but believes enterprises increasingly misread what those scores actually measure. “Achieving high scores on static leaderboards is simply the price of admission today,” he says. “Those benchmarks do a great job of validating a model’s foundational knowledge base, but they often mask the unpredictable, messy complexity of real-world enterprise environments.”

Ant Ling asks buyers to evaluate three hidden metrics: multi-step reliability, tolerance for dirty data, and the slope of the inference cost curve. “Real-world data is rarely clean,” Zhou says. “We place a much higher premium on a model’s ‘anti-interference’ capabilities — how well it extracts truth and reasoning from dirty data under pressure — than its ability to answer trivia based on pristine, perfectly formatted Wikipedia articles.” Peak performance, he argues, matters less than whether a model scales intelligently. “Ultimately, we believe the definitive standard for evaluating an agentic model should not be a theoretical IQ score. It should be the comprehensive cost of successfully completing a task on the first attempt.”
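One rough way to reason about the cost of a successfully completed task is to divide the cost of a single attempt by the probability that an attempt succeeds. The sketch below assumes failed attempts are simply retried and that attempts succeed independently, which is a simplification rather than a metric Ant Ling has published; the token counts and success rates are made-up numbers.

```python
def expected_tokens_per_success(tokens_per_attempt: float, success_rate: float) -> float:
    """Expected tokens spent per completed task, assuming independent retries.

    With success probability p per attempt, the expected number of attempts
    until the first success is 1 / p (geometric distribution).
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return tokens_per_attempt / success_rate


# Hypothetical models: one cheaper per attempt, one more reliable per attempt.
cheap_but_flaky = expected_tokens_per_success(400, 0.5)
pricier_but_reliable = expected_tokens_per_success(600, 0.9)

print(f"cheap but flaky:      {cheap_but_flaky:.1f} tokens per completed task")
print(f"pricier but reliable: {pricier_but_reliable:.1f} tokens per completed task")
```

With these illustrative numbers, the model that costs more per attempt ends up cheaper per completed task, which is why first-attempt reliability belongs in the cost comparison alongside the per-token price.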

What ‘Elephant Alpha’ Revealed About Developers

Ant Ling’s most revealing experiment happened outside a research lab. Before officially launching Ling-2.6-flash, Ant Group quietly deployed the model on OpenRouter under the codename Elephant Alpha. The experience produced what the company describes as a “deeply counter-intuitive realization.”

“Developers aren’t actually looking for an ‘omniscient’ AI that tries to be everything to everyone,” Zhou says. “Instead, they crave an AI with a strong ‘sense of boundaries.’” The gap between internal evaluation and real-world judgment turned out to be significant. “In a lab setting, internal evaluations naturally index heavily on testing a model’s ‘ceiling’. The blind test exposed just how rigorously developers judge a model’s ‘floor.’”

Reliability and responsiveness, the team found, mattered more than dazzling demonstrations. “If a developer asks for a simple, functional code completion and the model decides to generate a winding, philosophical preamble before outputting the actual syntax, it gets abandoned instantly,” Zhou says. “They want raw utility, not a conversation.”

The experience triggered an internal sprint around what Ant now calls agile execution, and cemented a forward-looking philosophy Zhou describes as “Build in Public, Testing in Stealth Mode.” Through its research arm, InclusionAI, Ant Group releases models, technical reports, and training recipes publicly, framing openness as a strategic accelerator rather than ideology. Zhou is direct about the motivation.

“Trying to solve AGI in a corporate silo is fundamentally problematic,” he says. “Building open-weight models is not ‘charity merely for PR’; it is a highly calculated, strategic accelerator.”

He argues the economics of AI are shifting in ways many observers have missed. “As foundational models inevitably move toward becoming open, inclusive infrastructure, the true commercial value is simply shifting higher up the technology stack. The foundational model becomes the ‘kernel,’” Zhou says. “And the business value lies in how you integrate and distribute it.”

Architecture Is the Skeleton, Iteration Makes It Alive

Ant Ling’s model family combines a hybrid linear Mixture-of-Experts design, Multi-head Latent Attention, and trillion-parameter reasoning systems built for agentic workflows. But Zhou is disarmingly candid about how long any architectural edge actually lasts.

“Pure architectural moats are incredibly short-lived,” Zhou says. “While our hybrid linear MoE and MLA designs give us an edge today, we are clearly seeing best practices diffuse across the industry extremely quickly.”

Long-term differentiation, he argues, comes from accumulated operational expertise, the kind rarely visible in research papers. “The true barrier to entry lies in massive amounts of ‘Tacit Knowledge,’” Zhou says, pointing to GPU orchestration, multimodal data preparation, and large-scale routing efficiency as examples. “Architecture is only the skeleton, only iterations will make it alive.” His ambition, ultimately, is organizational rather than technical. “What we are building is ‘State-of-the-Formula’: a proven, relentless organizational rhythm for innovation that endures long after any single architecture fades.”

If Zhou is right, the next AI race will be won not by those who build the most powerful models, but by those who build the ‘most economical thinkers’.

Author

Victor Dey

Victor Dey is a tech analyst and writer who covers AI, data science, startups, and cybersecurity. A former AI editor at VentureBeat, his work also appears in New York Observer, Fast Company, Entrepreneur Magazine, HackerNoon, and more. Victor has mentored student founders at accelerator programs at leading universities including the University of Oxford and the University of Southern California, and holds a Master's degree in data science and analytics.

Victor Dey 27 May 2026

Ant Ling Says AI’s Most Dangerous Emerging Problem Is the Cost of Thinking

Ant Ling's AI chief Zhou Jun says token costs are becoming the silent killer of enterprise AI deployment, and the industry's next battle will center on the economics of reasoning.
