
Why the Post-Transformer Era Is Already Here — and What It Means for Enterprise AI

By Zuzanna Stamirowska, CEO, Pathway

The LLM chatbot undoubtedly changed the world, but just over three years since ChatGPT’s launch, something is amiss. Beneath the mounting frustration around GenAI ROI lies a bigger story that most aren’t naming directly: a growing body of influential AI researchers, developers, and executives is thinking beyond the transformer architecture that has underpinned the LLM revolution to date.

From building ‘World Models’ to pursuing biologically inspired transformer alternatives, a new generation of labs is pioneering new paradigms for LLM architecture. The shift towards the post-transformer LLM is underway, with serious implications for the future of AI and enterprise technology.

One frustration too many 

This moment has been a long time coming. Enterprises have invested heavily in LLMs, but many are yet to see major returns. This was evidenced most strikingly by the now-infamous MIT study from last year, which concluded that 95% of generative AI pilots fail to deliver measurable ROI. For enterprises that have bet heavily on LLMs to transform operations, that number isn’t just a data point — it’s a diagnosis.

This won’t surprise anyone with direct experience overseeing an LLM rollout in an enterprise. Today’s LLMs approach every task in the same blank state, regardless of past experience, like Groundhog Day. This has remained true despite efforts to fine-tune models with first-party data to capture business-specific logic and context. The result is that LLMs have proved of limited usefulness in enterprises to date, scaled down to niche use cases and managed with bolted-on fixes that address symptoms rather than the underlying problem. It hardly amounts to transformative change.

This state of affairs can be confidently attributed to the transformer architecture, which generates outputs by statistically predicting language patterns. Transformer-based LLMs excel at pattern recognition, but the mathematics of the architecture gives them no native concept of memory or time. That brittleness – the inability to learn continuously or reason across long horizons – isn’t a bug to be patched. It’s baked into an architecture that has hit its limits.

Rather than address this, the efforts of AI companies to improve LLMs have focused almost entirely on scale. Hopes have been pinned on ever-increasing amounts of training data and token usage in ‘reasoning’ scenarios as the route to more intelligent models. The growing consensus across academia and the AI developer ecosystem is that this isn’t working. NYU Computer Science Chair Martin Farach-Colton and professor Julian Togelius recently stressed the limits of the transformer: catastrophic collapse when models are pushed beyond their planning horizons, and the fundamental limits of current ‘memory’ workarounds, which don’t influence actual model weights. In a similar vein, Llion Jones (a co-author of the seminal “Attention Is All You Need” paper that introduced the transformer) has said he’s “absolutely sick” of incremental transformer improvements and today spends his time developing new neural network architectures at Sakana AI. Calling out incremental improvements is right: major labs have spent years and billions attempting to push transformer-based models further, with diminishing results.

For LLMs to realize their promise, we need mechanisms for models to learn continuously, reason for longer without hallucinating, and offer far greater observability. And all of this needs to be done more efficiently if the industry is to avoid an AI energy crunch. The conclusion is unavoidable: incremental fixes to the transformer aren’t the answer. The architecture itself needs replacing.

Towards generalization over time 

In emerging frontier AI models, alternatives to the transformer are taking shape. In October, we unveiled the Dragon Hatchling (BDH) architecture, the first model that learns continuously. The first enterprise deployments are already showing something we haven’t seen before: models learning from experience without retraining, adapting in real time, and beginning to generalize. That’s a fundamental break, not an incremental one.

This is possible through a completely different approach to LLM architecture, one that takes inspiration from the human brain. In the brain, memory is formed by billions of neurons connecting to one another through synapses. The Hebbian learning principle, “neurons that fire together, wire together”, describes the process by which the brain learns and forms memories.

By replicating how neurons and synapses function in the brain, BDH activates only the artificial neurons relevant to each reasoning step as new tokens enter the model — a complete break from drawing on static, pre-trained weights. Critically, synapses strengthen or weaken as the model works, paving the way to better long-horizon reasoning and, eventually, to genuine generalization over time rather than interpolation between training examples.
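The two mechanisms described above — sparse, per-step activation and synapses that change as the model works — can be illustrated with a toy sketch. This is not Pathway’s actual BDH implementation; the network size, learning rate, and top-k sparsity below are illustrative assumptions, and the update rule is simply the classic Hebbian outer-product rule.

```python
import numpy as np

rng = np.random.default_rng(0)

n_neurons = 8
# Hypothetical synaptic weight matrix: w[i, j] is the strength of the
# connection from neuron i to neuron j.
w = rng.normal(scale=0.1, size=(n_neurons, n_neurons))

def step(x, w, lr=0.01, k=3):
    """One processing step: sparse activation plus a Hebbian update."""
    drive = x @ w
    # Sparse activation: only the k most strongly driven neurons fire.
    active = np.zeros_like(drive)
    top = np.argsort(drive)[-k:]
    active[top] = drive[top]
    # Hebbian update: strengthen synapses between co-firing neurons
    # ("neurons that fire together, wire together").
    w = w + lr * np.outer(x, active)
    return active, w

x = rng.random(n_neurons)
for _ in range(5):
    x, w = step(x, w)
```

Note that, unlike a transformer’s frozen weights, `w` is different after every step: the model’s “experience” is written directly into its connections as it processes input.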

Enterprise dividends 

LLMs that can replicate advanced cognitive functions can naturally be applied to far more tasks in an enterprise environment, since the limits of context windows and fine-tuning are less of a concern. But the benefits stretch beyond this. It has, for example, been notoriously difficult to decipher the ‘thinking process’ behind an LLM’s output. When a model behaves like a population of connected, cooperating neurons, it’s possible to observe which neurons are activating and which synapses are changing as information is processed. Today’s transformer interpretability tools offer MRI-level insight: you can see that something happened, but you can’t understand why. That isn’t good enough in heavily regulated industries with deterministic workflows. What’s needed there is ‘CCTV inside the brain’ of a model: full visibility into its decision logic. BDH finally offers that level of visibility.
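To make the ‘CCTV inside the brain’ idea concrete, a neuron-level model can emit an audit entry at every step recording which neurons fired and how much each synapse changed. Again, this is a hypothetical illustration rather than BDH’s real instrumentation; the network size, sparsity, and log fields are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 6
w = rng.normal(scale=0.1, size=(n, n))
audit_log = []  # one human-readable entry per processing step

def observed_step(x, w, lr=0.01, k=2):
    """Process one input, Hebbian-update the weights, and log what happened."""
    drive = x @ w
    top = np.argsort(drive)[-k:]          # the k neurons that fire this step
    active = np.zeros_like(drive)
    active[top] = drive[top]
    dw = lr * np.outer(x, active)         # Hebbian weight change
    audit_log.append({
        "fired_neurons": sorted(int(i) for i in top),
        "largest_synapse_change": float(np.abs(dw).max()),
    })
    return active, w + dw

x = rng.random(n)
for _ in range(3):
    x, w = observed_step(x, w)
```

Because every decision step leaves a structured record, an auditor can replay exactly which units participated in an output and how the model changed as a result — the kind of trace that transformer attention maps don’t provide.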

Alternative architectures are also less compute-hungry and energy-intensive than transformers. When models can learn continuously, they build genuine domain expertise over time, eliminating the costly cycles of retraining and fine-tuning that make enterprise AI so operationally expensive. Scaling models no longer has to come with the trade-off of much higher compute and energy costs. And because BDH builds knowledge from experience rather than requiring massive pre-loaded datasets, enterprises can now stand up AI for use cases where training data was previously too scarce to support a transformer-based deployment. It’s a cost-effective and sustainable way to run LLMs in real time with embedded context, and it opens up entirely new use cases.

The welcome new era 

The transformer was never built with the enterprise in mind. Transformer-based LLMs are approaching a ceiling no matter how far training data is scaled up. That’s why the move to post-transformer architectures isn’t a question of if, but when. We’re not optimizing yesterday’s technology; we’re building what comes after it. The unrealized potential of AI gets unlocked when models can actually remember, learn, and reason, and continuous learning puts LLMs on the expedited path to AGI. For enterprises still waiting for AI to deliver on its promise, the architecture that makes it possible already exists.

 
