
Having worked in data lineage for nearly two decades, I am still struck by how casual some organisations can be in their approach to data management. Beneath the surface of some financial institutions lies a sprawl of disorganised data, accumulated over years of ‘out of sight, out of mind’ habits.
Too often, data lineage is treated as the financial industry’s deus ex machina, appearing just in time to rescue compliance efforts as deadlines loom. To a degree, this works. Fine‑grained data lineage can untangle even the most complex data challenges with impressive speed, allowing organisations to comply with increasingly stringent data regulations.
But the landscape is shifting, as AI introduces a new and unavoidable inflection point. The finance industry, like every other sector, now finds itself standing at a fork in the road. The potential for faster decisions, deeper market insights, and greater operational efficiency is undeniable. Yet these advances depend on something far less glamorous: clean, structured, and trustworthy upstream data.
This is where many institutions falter. In financial services, 90% of AI model failures trace back not to the algorithm, but to upstream data changes, and so the promise of AI cannot be separated from the quality of the data that feeds it. Without order, predictability, and proper governance, even the most advanced tools will produce unreliable outcomes.
There are now two diverging paths ahead of financial institutions. One will lead organisations to charge ahead in pursuit of early advantage, only to find themselves tripped up by fragile and inconsistent data foundations. The other path will demand patience, requiring firms to cut through the undergrowth of their data estates before scaling AI ambitions.
In the end, it is this second, less‑travelled, and more deliberate path that will make all the difference. Institutions that choose it will earn trust in the systems they build and the decisions they make.
Hallucinations and why they happen
Open the LLM of your choice and ask it a simple question: “How many ‘s’ are in the word strawberry?” ChatGPT, my willing participant, responded with its usual reassuring certainty: “There are 3 ‘s’ letters in ‘strawberry.’” There is, of course, only one. This viral experiment captures what is commonly referred to as an AI hallucination.
It is worth noting that this particular quirk is not the result of poor data, but rather a byproduct of how language models process text through tokenisation. Still, it illustrates a broader point: when an AI system is poorly positioned to answer a question reliably, whether because of how it processes information or because of the limits of the data available to it, it will often respond with unwavering confidence anyway.
Now, imagine applying for a business loan. It is not hard to envision a near‑future in which the outcome of that application rests, at least in part, on the judgment of a machine that cannot even reliably tell you how many ‘b’s are in “business.”
If a bank’s data becomes corrupted at any point along its lineage, is misrouted, or is incorrectly attributed, AI systems will make decisions based on those flawed inputs. As things stand, these systems are not equipped to consistently recognise when something does not add up, or to pause rather than proceed on uncertain ground.
This is not just a ‘what if,’ though. In financial services, 78% of firms now use AI for analysis, yet hallucination rates on financial‑domain tasks still sit in the 15–25% range when data is shallow or poorly governed. According to the FT, 27% of business users rate hallucinations as their top AI concern, ahead even of job‑loss anxiety.
These small issues compound. What begins as a minor discrepancy can quickly cascade into a chain of downstream decisions that only reinforce and amplify the original error.
The promise of AI is real, and the potential gains are worth pursuing. But before the industry can fully embrace it, there is a responsibility to confront the limitations that remain. Until these systems can better distinguish between what is reliable and what is not, these tools can’t fully be trusted to make intelligent decisions.
Winners and losers
It stands to reason that the shape of the AI landscape, and its victors, will be determined largely by those who get their house in order and their data lineage optimised.
There is real potential for early adopters of lineage‑guided AI integration to emerge as case studies for the rest of the industry. People are wary of AI, and consumers will need reasons to trust it with something as precious as their finances. Governing bodies, too, will need proof that organisations are ready to move to semi‑ or even fully automated AI‑driven decision‑making.
The FCA has made clear that firms remain accountable for outcomes even when decisions are delegated to AI, while Consumer Duty requirements demand that good customer outcomes be demonstrated, not assumed. The EU AI Act, emerging US state‑level frameworks, and tightening expectations from the Bank of England all point in the same direction: organisations will increasingly need to prove that their systems are fit for purpose and have processes in place to keep them in good shape year‑round, not just when compliance windows roll around.
In this respect, data lineage can kill two birds with one stone. For regulators, it provides an auditable trail that demonstrates where data originated, how it was transformed, and why it was used in a given decision. For customers, it can provide the clarity and reassurance needed to shift perceptions of AI from something opaque to something vetted and trustworthy.
There is a harder‑edged commercial argument here too. Institutions with mature lineage capabilities can onboard new AI use cases faster, with fewer false starts. They can isolate and correct upstream errors before they cascade. And they can do so with leaner, more purposeful data estates, reducing not only risk but also the computational and storage costs that bloat AI operating expenses. In this sense, data lineage supports sustainability goals as well as cost‑efficiency.
This is what the future looks like for those prudent enough to embed lineage‑led reasoning into their AI adoption. Not all will follow this path, and the cost of skipping it is steep: Gartner predicts that through 2026, organisations will abandon 60% of AI projects unsupported by AI‑ready data. Further down the line, as regulation tightens and market expectations rise, retrofitting lineage into sprawling, undocumented data estates will only become more complex and more costly.
On the other hand, there is ample opportunity for a number of winners to emerge from this transition, and institutions that begin now, methodically and deliberately, will be positioned to lead.



