
AI-powered analytics platforms have transformed how digital asset markets are read, promising to turn on-chain data into actionable intelligence at superhuman speed. The speed is real. The interpretation often is not.
Murtuza Merchant is blockchain intelligence lead at Yellow, where he has built analytics frameworks reaching over a million users in their first 90 days. A long-time regulatory and market-structure analyst for Benzinga, Cointelegraph, and Decrypt, he sits at the intersection of AI-driven analytics, on-chain market structure, and the regulatory frameworks now reshaping the industry.
In this conversation, he explains why on-chain data is not the truth layer the market treats it as, what AI confidently gets wrong, and what would have to change for the next generation of intelligence tools to deliver on their promise.
AI-powered analytics platforms have exploded over the past 18 months, all promising to turn on-chain data into actionable intelligence at superhuman speed. From your work covering this space, where are these tools genuinely outperforming human analysts, and where are they confidently wrong?
The speed advantage is real. Crypto markets generate terabytes of data daily across dozens of blockchains, hundreds of exchanges, and thousands of liquidity pools. No human analyst can process even one of those categories comprehensively in a working day. AI handles all of them at once and surfaces anomalies in near real time. Anyone still working manually across these feeds is operating with a serious handicap.
Where these tools are confidently wrong is the interpretive layer. Pattern detection is not pattern understanding. A platform can tell you that a cluster of wallets just moved a large amount of Bitcoin to an exchange and that this pattern has historically preceded a price decline 70% of the time. What it cannot tell you is whether this specific instance falls in the 70 or the 30. That depends on context the model does not have. Is this an exchange consolidating cold storage wallets for routine security upgrades, which now accounts for 30 to 40% of all whale alerts? Is it the on-chain settlement leg of an OTC deal negotiated days earlier? Is there a regulatory announcement expected this week that changes the calculus entirely?
There was a documented case in 2025 where an AI-driven trading system lost 80% of its capital in a single week because the volatility regime shifted abruptly. The system kept applying old patterns with full confidence, oblivious to the fact that the underlying dynamics had changed. That is not a flaw in any specific platform. It is a structural limitation. Crypto markets change rules constantly through protocol upgrades, regulatory shifts, and macro shocks. A model trained on one regime will produce confident wrong answers in another, and it has no internal mechanism for recognising the transition has happened.
Crypto treats on-chain data as a kind of ultimate truth layer: transparent, verifiable, immutable. When AI is trained on it, the model inherits that assumption of objectivity. What does the data actually miss?
It misses intent, and intent is the only thing that actually matters when you are reading a market.
Every transaction on a blockchain is factually accurate. The transfer happened, the amount is real, the addresses are verifiable. But transparency of transactions is not transparency of purpose. When 5,000 Bitcoin moves between wallets, the on-chain record tells you the amount, the timestamp, and the addresses. It does not tell you why. Was a fund unwinding a position? Was an exchange restructuring its internal wallet architecture? Was it the on-chain leg of a private OTC deal? Each scenario produces an identical footprint. The data is the same. The meaning is completely different in every case.
Earlier this year, what the entire market had been reading as aggressive Bitcoin whale accumulation turned out to be exchange wallet consolidation. Major exchanges periodically move funds from thousands of smaller deposit addresses into fewer large cold storage wallets. On every standard analytics dashboard, that looked indistinguishable from a massive institutional buyer loading up. Traders who acted on the signal got burned.
Beyond intent, on-chain data misses the entire off-chain universe that actually shapes market outcomes. Stablecoin reserve composition. Banking relationships of issuers. Derivatives positioning, which in Bitcoin and Ethereum is often several multiples larger than spot volume. Regulatory enforcement actions being prepared behind closed doors. Institutional positioning through OTC desks that never touches a public blockchain until final settlement.
When an AI model is trained exclusively on on-chain data, it inherits the assumption that what is visible on the blockchain represents the full picture. It does not. The blockchain shows you what happened. The who, the why, and the what-comes-next all live somewhere else.
Can you give a specific example, a market event, an incident, a moment, where AI-driven on-chain analytics gave a confident answer that turned out to be wrong? What did the model see, and what did it miss?
The late November and early December 2025 episode is the cleanest example because it played out in public.
Crypto Twitter exploded with warnings that $7.5 billion in whale inflows had hit Binance over a 30-day period. That was the highest level since March 2025, when similar inflows preceded a 30% crash. Every on-chain dashboard flagged it as bearish. AI-powered platforms generated confident bearish readings. Retail traders panicked and sold.
The actual picture was the opposite. The accumulation indicators simultaneously printed at one of the highest readings since the 2024 peak. Whales were not distributing. They were aggressively accumulating. The exchange inflows were a mix of OTC settlement, treasury rebalancing, and liquidity positioning for derivatives trading. The directional intent was the exact opposite of what the raw inflow number suggested.
Every contextual signal confirmed it. Bitcoin had already corrected 15% from its December peak. Sentiment had collapsed into extreme fear. Mid-tier whale wallets holding 100 to 1,000 Bitcoin increased their holdings by 0.47% in two weeks. Retail wallets under 0.1 Bitcoin were capitulating. This was a textbook contrarian setup of whales accumulating during fear while retail panicked.
The models saw a historical pattern match. Large Binance inflows followed by decline. What they missed was every contextual factor that made this instance structurally different from the historical comparison. The cycle phase was different. The sentiment regime was different. The wallet-tier behaviour was different. None of that registered in models trained to react to flow numbers in isolation.
Social media narratives, influencer calls, paid promotions dressed as analysis: that noise is now part of the AI training diet because models scrape it. How does AI perform at separating manufactured hype from real signal, and what is the failure mode when it gets it wrong?
AI is reasonably good at catching the crude stuff and dangerously bad at catching the sophisticated stuff. And the sophisticated stuff causes most of the damage.
The crude layer is coordinated bot activity. If 10,000 accounts post the same phrase about a token within a 30-minute window, decent models flag it as astroturfing. Basic spam detection has improved. But adversaries have evolved faster than the filters.
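To make the crude layer concrete, here is a minimal sketch of that kind of check: it flags any phrase posted by a large number of distinct accounts inside a 30-minute window. The function names, the `(timestamp, account_id, text)` input shape, and the simple normalisation are assumptions made for illustration, not a description of any real platform's filter. Because it relies on near-exact phrase matching, it would miss exactly the varied, AI-generated content discussed next.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so trivial variations still match."""
    return " ".join(text.lower().split())


def flag_coordinated_phrases(posts, window=timedelta(minutes=30), min_accounts=10_000):
    """Return phrases posted by at least `min_accounts` distinct accounts
    within any span of length `window`.

    `posts` is an iterable of (timestamp, account_id, text) tuples
    (a hypothetical input format chosen for this sketch).
    """
    by_phrase = defaultdict(list)
    for ts, account, text in posts:
        by_phrase[normalise(text)].append((ts, account))

    flagged = set()
    for phrase, events in by_phrase.items():
        events.sort(key=lambda e: e[0])
        recent = deque()           # events currently inside the window
        counts = defaultdict(int)  # account -> posts inside the window
        for ts, account in events:
            recent.append((ts, account))
            counts[account] += 1
            # Drop events that have fallen out of the sliding window.
            while ts - recent[0][0] > window:
                _, old_account = recent.popleft()
                counts[old_account] -= 1
                if counts[old_account] == 0:
                    del counts[old_account]
            if len(counts) >= min_accounts:
                flagged.add(phrase)
                break
    return flagged
```

Counting distinct accounts rather than raw posts keeps a single noisy account from triggering the flag on its own.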
Scam operators now use AI themselves to generate thousands of unique, natural-sounding social media profiles posting varied, human-looking content. Deepfake-related financial fraud increased 340% in 2025 and 2026, with crypto scams the largest category. AI-enabled scam operations are roughly 500% more profitable than traditional schemes. Deepfake videos of exchange CEOs and public figures endorsing fake platforms have become so realistic that visual inspection alone cannot distinguish them from genuine content.
The failure mode is circular. An AI sentiment model scrapes social media and concludes that a token has overwhelmingly positive engagement. That bullish reading gets surfaced in dashboards, which attracts more retail buyers, which generates more genuinely positive sentiment, which reinforces the model’s original assessment. By the time the developers dump their supply, the AI has been confidently bullish the entire way up because it was measuring noise volume rather than signal quality.
A meme coin earlier this month surged over 6,000% in under a month, from roughly 25 cents to nearly $28. On-chain volume looked organic. Social sentiment looked organic. It turned out to be a coordinated pump and dump with extreme wallet concentration. By the time the manipulation was exposed, the price had collapsed 95%. Over 62% of meme coins launched in 2025 were flagged as potential rug pulls within 30 days, which tells you how systemic the problem has become.
The deeper structural issue is that AI sentiment analysis treats all sources as roughly equivalent inputs. A genuine institutional research note and a paid influencer post both register as positive mentions. The model does not weigh credibility, track record, or financial disclosure. It counts signal volume. In a market where the noise is deliberately engineered to look like a signal, counting volume is exactly the wrong methodology.
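The difference between counting volume and weighing credibility can be shown with a small sketch. Everything here is hypothetical: the source categories, the weights, and the `Mention` type are placeholders chosen for illustration, and real weights would need calibration against track record and disclosure data.

```python
from dataclasses import dataclass

# Hypothetical credibility weights, for illustration only.
SOURCE_WEIGHTS = {
    "institutional_research": 1.0,
    "independent_analyst": 0.6,
    "anonymous_account": 0.2,
    "paid_promotion": 0.0,
}


@dataclass
class Mention:
    source_type: str
    sentiment: float  # -1.0 (bearish) to 1.0 (bullish)


def volume_score(mentions):
    """Naive approach: every mention counts equally."""
    if not mentions:
        return 0.0
    return sum(m.sentiment for m in mentions) / len(mentions)


def weighted_score(mentions, weights=SOURCE_WEIGHTS, default_weight=0.1):
    """Average sentiment weighted by an assumed per-source credibility."""
    total = sum(weights.get(m.source_type, default_weight) for m in mentions)
    if total == 0:
        return 0.0
    weighted = sum(
        weights.get(m.source_type, default_weight) * m.sentiment for m in mentions
    )
    return weighted / total


# Nine paid promotions drown out one cautious research note in the naive score.
mentions = [Mention("paid_promotion", 1.0)] * 9 + [Mention("institutional_research", -0.5)]
naive = volume_score(mentions)       # dominated by the promotions
weighted = weighted_score(mentions)  # reflects only the credible source
```

In this toy case the naive score is strongly positive while the weighted score is negative, which is the gap the paragraph above describes.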
At Yellow you have built intelligence frameworks that reached over a million users in 90 days. When you design these systems, where do you deliberately choose not to automate, what stays human, and why?
Three things stay human, and I am very deliberate about why.
The first is the why layer. When the system surfaces a pattern, something like a stablecoin burn spike coinciding with rising derivatives open interest and an unusual wallet clustering anomaly, somebody needs to determine what it actually means. Is it a fund unwinding? A protocol migrating liquidity across chains? A market maker repositioning before an options expiry? The AI identifies the pattern. The human identifies the meaning. Those are fundamentally different cognitive tasks, and the second one requires understanding market microstructure and regulatory context that does not exist in any dataset. Automating it produces faster nonsense.
The second is editorial judgment. Deciding what matters, what is noise, and how to frame a finding for a specific audience. An institutional investor needs a different framing than a protocol team, and both need something different from a regulator. AI can help with speed and structure. But the judgment about what to foreground requires understanding who the reader is and what decision they are trying to make. Automating that produces content that is technically accurate but practically useless.
The third is the uncertainty call. This is the one most people skip. When the system produces a low-confidence signal, or when multiple inputs contradict each other, or when the market is operating in a regime our models have not seen before, somebody needs to say plainly that we do not know. AI systems are architecturally terrible at expressing uncertainty. They are optimised to produce an answer. A responsible operation has to be willing to say the data is ambiguous and here is what we would need to see before forming a view.
The reason the framework reached the scale it did in such a short window is not because it was the fastest. It is because readers came to understand that when we said something, we meant it, and when we did not know, we said so. In a market saturated with confident wrong answers, the willingness to admit uncertainty is the most undervalued competitive advantage.
The idea of autonomous AI agents in DeFi, holding wallets, providing liquidity, executing trades without humans in the loop, is moving from research papers into production. If AI inherits the blind spots you are describing, what happens when the human override is gone?
Every limitation I have described becomes an irreversible error.
In the current model, AI surfaces patterns and humans interpret them. The worst case when the AI is wrong is that somebody publishes a bad take or makes a suboptimal trade. Recoverable. The autonomous agent model removes that buffer. We are heading toward a world where agents transact on programmable payment rails, where protocols are being built for high-frequency microtransaction settlement between autonomous systems, and where non-human identities in financial services already outnumber human employees 96 to 1.
Imagine the scenarios I have been describing playing out at machine speed without human intervention. An autonomous trading agent sees what looks like whale accumulation and buys aggressively. It was an exchange wallet consolidation. An AI liquidity provider deploys capital into a pool that looks healthy on-chain but is associated with a token under active regulatory enforcement. An agentic system interprets a stablecoin mint as bullish demand when it is actually a treasury operation with no market intent. A liquidity bot enters a pool just before the underlying protocol gets exploited.
In each case the agent acts on the data as presented, with real capital, and the settlement is final. There is no compliance officer pulling the trade back. No risk manager overriding the position. No phone call to verify whether the signal makes sense. The transaction is done, on-chain, irreversible.
The infrastructure that would give autonomous agents the contextual judgment to avoid these errors does not exist yet. The identity layer is missing. The credentialing system is missing. The contextual intelligence layer is missing. Yet production deployments are happening now. The gap between the speed of deployment and the maturity of the safeguards is one of the most underappreciated risks in the entire digital asset ecosystem.
Regulators in the UK, EU, and US are now actively buying on-chain analytics tools such as Chainalysis, TRM Labs, and Elliptic. Should we be worried that the same interpretation traps you have described are being baked into enforcement decisions?
Yes, and the stakes are materially different from a trader getting a signal wrong.
These tools have genuine value. They have helped recover or freeze tens of billions of dollars in illicit crypto and supported investigations across dozens of blockchains. The 2026 figures show $158 billion in incoming value to illicit wallets in 2025, up sharply from $64.5 billion the year before. The technology is essential for tracking fund flows and mapping criminal networks.
But there is a gap between fund tracing and intent determination, and that gap matters enormously in enforcement. When the US Treasury sanctioned UK-registered crypto exchanges earlier this year for facilitating IRGC-linked transactions, the case was built partly on on-chain flow analysis. Most Iran-linked crypto flows actually originate from ordinary retail users trying to preserve savings as the rial weakens. The on-chain footprints of a sanctioned entity moving illicit funds and an ordinary citizen trying to survive a currency crisis can look remarkably similar. Distinguishing them requires human judgment that the analytics tools cannot fully provide.
The false positive problem in enforcement is qualitatively different from the trading equivalent. A trader who acts on a bad signal loses money. A regulator who acts on a bad signal can freeze legitimate assets, shut down a compliant business, or prosecute someone whose on-chain activity was operationally routine. MiCA now requires reporting of transactions above 15,000 euros across the EU. The volume of alerts that surveillance regimes will generate is enormous. Without a robust interpretive layer, regulators risk either drowning in noise and failing to act on genuine threats, or over-responding to false positives and damaging legitimate participants.
The constructive answer is not to abandon these tools. They are essential. But regulators need to invest equally in the human interpretive infrastructure that makes automated surveillance meaningful.
Crypto markets run 24/7, are globally fragmented, and adversaries adapt faster than retraining cycles. Compared to traditional finance, what is the one structural feature that makes crypto specifically harder for AI to interpret reliably?
Pseudonymity at scale. That is the single feature that separates crypto from every other financial market and that no amount of AI sophistication can fully overcome.
In traditional finance you know that a 13F filing came from a specific named institution. You can contextualise their moves based on known strategy, regulatory obligations, historical behaviour, and public statements. The identity layer is baked into the infrastructure. Every market participant above a certain threshold is required to disclose who they are.
In crypto, you see that a wallet moved $50 million of Ethereum and you often have no idea whether that is a hedge fund, an exchange hot wallet, a protocol treasury, a market maker, a money launderer, or an individual. Entity labelling has improved considerably with platforms tracking hundreds of millions of wallet labels through AI classification. But a significant percentage of large wallets remain unidentified, because the total number of active addresses across major blockchains continues to expand faster than labelling can keep up.
Without knowing who, the why becomes exponentially harder to determine. The same on-chain pattern can mean completely different things depending on who executed it. An exchange moving 5,000 Bitcoin between internal wallets is operationally meaningless. A hedge fund moving 5,000 Bitcoin to an exchange is potentially a major sell signal. A protocol treasury moving 5,000 Bitcoin as part of a planned diversification is strategically important but not bearish. The on-chain footprint is identical. The meaning is completely different. Most AI models, in most implementations, do not know which scenario they are looking at.
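The dependence of meaning on identity can be expressed as a tiny lookup. This is only a sketch: the entity keys and the `interpret_transfer` function are hypothetical names, and the readings simply restate the three scenarios above. The point is the fallback branch, which refuses to assign a direction when the entity is unknown.

```python
from typing import Optional

# Readings restate the three scenarios in the text; keys are hypothetical labels.
READINGS = {
    "exchange_internal": "operationally meaningless",
    "hedge_fund_to_exchange": "potentially a major sell signal",
    "protocol_treasury_diversification": "strategically important but not bearish",
}


def interpret_transfer(amount_btc: float, entity: Optional[str]) -> str:
    """Attach a reading to a transfer only when the entity behind it is known."""
    reading = READINGS.get(entity) if entity else None
    if reading is None:
        return f"{amount_btc:,.0f} BTC moved; entity unknown, meaning undetermined"
    return f"{amount_btc:,.0f} BTC moved; {reading}"
```

Calling `interpret_transfer(5000, None)` yields the "undetermined" message rather than a bullish or bearish label, while the same amount with a known entity yields one of the three readings.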
Traditional finance solved the identity problem decades ago through mandatory disclosure. Crypto has chosen, by design, not to solve it the same way. That commitment to pseudonymity creates an interpretation gap that no model can fully close, because the missing information is not on the blockchain and was never intended to be.
If you were building the next-generation AI intelligence platform for digital assets from scratch today, what is the one thing you would do differently from every existing tool on the market?
I would build the uncertainty layer first and the signal layer second.
Every analytics product on the market today is designed to maximise the appearance of insight. Clean dashboards, confident scores, directional arrows, simple buy or sell classifications. The market wants certainty, so that is what gets built. But certainty in a structurally uncertain environment is not a feature. It is a liability. It gives users false confidence, which leads to losses, which erodes trust, which kills the product over time.
If I were building from scratch, I would invert the design philosophy. Before telling the user what the data shows, the platform would tell them what the data cannot show. Every signal would come with an explicit confidence interval. Every pattern match would disclose how many times the historical comparison held versus how many times it failed. The system would flag explicitly when it is operating in a market regime its training data has not encountered before, rather than silently applying old patterns to new conditions.
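One way to disclose how often a historical comparison held, together with an interval around that rate, is a Wilson score interval on the hit rate. The sketch below is an illustrative assumption about how such a disclosure could be computed, not a description of any existing product; `describe_pattern` and its wording are hypothetical.

```python
import math


def wilson_interval(hits: int, trials: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = hits / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return centre - half, centre + half


def describe_pattern(hits: int, trials: int) -> str:
    """Disclose how often the pattern held, how often it failed, and the interval."""
    low, high = wilson_interval(hits, trials)
    return (
        f"Pattern held {hits} of {trials} times (failed {trials - hits}); "
        f"95% interval for the hit rate: {low:.0%} to {high:.0%}"
    )
```

A pattern that held 7 times out of 10 and one that held 70 times out of 100 both show a 70% hit rate, but the interval for the first is far wider (roughly 40% to 89%, against roughly 60% to 78%), which is the information a bare "70%" conceals.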
I would also integrate a context deficit indicator. When the platform surfaces an on-chain signal but lacks the off-chain context needed to interpret it, the tool would say so explicitly. The output would not be "bullish accumulation alert." It would be something closer to "large wallet inflow detected; confidence is moderate; we cannot determine whether this is accumulation, exchange rebalancing, or OTC settlement without additional context; here is what you would need to check."
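A context deficit indicator of this kind could be modelled as a signal object that carries its own list of missing context. The `Signal` class, its field names, and the example items under `missing_context` are hypothetical, sketched here only to show the shape of such an output.

```python
from dataclasses import dataclass, field


@dataclass
class Signal:
    observation: str
    confidence: str  # e.g. "low", "moderate", "high"
    candidate_explanations: list = field(default_factory=list)
    missing_context: list = field(default_factory=list)

    @property
    def context_deficit(self) -> bool:
        """True when interpretation depends on context the platform lacks."""
        return bool(self.missing_context)

    def render(self) -> str:
        parts = [f"{self.observation}. Confidence: {self.confidence}."]
        if self.context_deficit:
            parts.append(
                "Undetermined between: " + ", ".join(self.candidate_explanations) + "."
            )
            parts.append("To check: " + "; ".join(self.missing_context) + ".")
        return " ".join(parts)


# Illustrative values only; the items to check are assumptions for this sketch.
signal = Signal(
    observation="Large wallet inflow detected",
    confidence="moderate",
    candidate_explanations=["accumulation", "exchange rebalancing", "OTC settlement"],
    missing_context=["entity label for the sending wallets", "derivatives positioning"],
)
message = signal.render()
```

The design choice is that the deficit travels with the signal itself, so a dashboard cannot display the observation without also displaying what is unknown.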
The uncomfortable truth is that designing for uncertainty is harder to sell than designing for confidence. Nobody wants to launch a dashboard that says we do not know. But that is the platform that builds durable trust. The market eventually figures out which tools were honest about their limits and which were not. In a market this fast-moving, trust is the only competitive advantage that does not depreciate.
Looking 12 to 18 months out, what is the development you are watching most closely, and what would have to be true for you to say AI in this space has actually matured?
The development I am watching most closely is the regulatory response to autonomous AI agents operating on financial infrastructure. Specifically, how the FCA in the UK, MiCA enforcement in Europe, and whatever framework emerges from the GENIUS Act in the US will handle the question of accountability when an AI agent causes financial harm on-chain.
We have a regulatory gap that is about to be tested. Stablecoins are being legislated. Exchanges are being licensed. Reporting thresholds are being lowered. But the question of what happens when an autonomous AI agent, acting without direct human instruction, executes a transaction that violates a sanctions regime, destabilises a liquidity pool, or front-runs a governance vote has not been answered by any major jurisdiction. Who is liable when an agent acts on bad data? The developer who built it? The protocol that hosted it? The user who deployed it? The answer to that question will shape the entire trajectory of agentic finance.
For me to say AI in crypto intelligence has actually matured, three things would need to be true simultaneously.
First, leading platforms would need to integrate on-chain, derivatives, regulatory, and off-chain context data into a single analytical layer rather than treating them as separate products. Today they are siloed, and every tool is giving users a partial picture while presenting it as the whole.
Second, platforms would need to communicate uncertainty as a first-class feature rather than hiding it behind confident scores. The day a major analytics dashboard says we do not have enough context to interpret this signal, rather than generating a bullish or bearish reading anyway, is the day I will believe the industry has turned a corner.
Third, there would need to be a meaningful improvement in entity attribution. Not just wallet labelling, but understanding the operational behaviour of the entities behind the wallets. Knowing that a wallet belongs to an exchange is useful. Knowing that the exchange is currently restructuring its cold storage architecture, which means the next three weeks of outflow data will be misleading, is the level of contextual intelligence that actually prevents bad decisions.
If those three things converge in the next 12 to 18 months, I will say the field has matured. If they do not, what we will have is faster, more confident wrongness operating at greater scale than ever before. And the consequences of that, particularly as autonomous agents start handling real capital on-chain, will be considerably more serious than anything we have seen so far.
