Future of AI

Mind Over Model – Getting LLMs Right in a Regulated Industry

By Yifan Xia, AI Product Director, and Don McElligott, VP of Compliance Supervision, Global Relay

Large Language Models (LLMs) have now become widely adopted across industries – and financial services is no exception. From personalizing customer experiences to enhancing compliance and surveillance, these technologies offer new levels of efficiency and insight. But alongside this potential comes a unique set of risks and responsibilities, particularly in a sector governed by complex regulation and high levels of public scrutiny.

LLMs show strong potential to enhance compliance and business applications far beyond customer service and document processing. Yet doing so safely, responsibly, and at scale requires a deep understanding of the regulatory landscape, a clear grasp of the technology, and concerted investment in both tools and talent.

Regulation and Risk

The regulatory landscape for AI in financial services is evolving rapidly, with different jurisdictions adopting different approaches. In the U.S., guidance such as SR 11-7 addresses model risk, while the EU’s AI Act introduces risk-based tiers that classify and govern AI applications. Rather than seeking zero risk, regulators expect firms to apply appropriate controls that scale with the potential impact of failure.

Regulatory expectations will differ significantly depending on use case. AI-generated outputs aimed at retail investors – such as marketing materials or disclosures – attract far greater scrutiny than internal systems for risk detection or trade surveillance due to the potential for investor or consumer harm should things go wrong. Nonetheless, financial institutions that undertake robust due diligence and work with responsible vendors will be best positioned to harness the capabilities of LLMs while staying compliant.

The most forward-thinking financial institutions are those integrating AI compliance into unified governance frameworks, treating LLMs not as experimental tools, but as critical systems subject to the same standards as traditional technologies. This approach allows firms to innovate, but in a way that balances risk and reward with accountability.

Beyond the Obvious

While much attention has focused on customer service chatbots and document automation, some of the most transformative applications of LLMs are emerging in less visible areas. Communications surveillance is one such example. LLMs can detect subtle patterns of misconduct that rule-based systems often miss – such as shifts in tone or euphemistic language – offering a more adaptive, accurate, and context-aware approach to managing conduct risk.

This deeper contextual awareness is helped by training LLMs on datasets that extend beyond financial data alone, meaning these models can assess inputs against a richer, cross-domain data pool and provide deeper analysis than is possible with older or smaller models.

We’ve also seen substantial progress in the ability of LLMs to interpret compliance rules and transcribe and analyze complex communications, whether via voice or text, working accurately across multiple languages and making sense of trading jargon that might be used in an attempt to hide misconduct. This creates new opportunities for more nuanced, scalable surveillance technologies that can keep pace with evolving regulatory requirements and communication styles.
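To make this concrete, here is a minimal sketch of how an LLM might be asked to flag the tone shifts and euphemisms a keyword lexicon would miss. It assumes the `openai` Python SDK (v1+) with an API key in the environment; the model name, label scheme, and prompt are illustrative placeholders, not a description of any production surveillance system.

```python
# Minimal sketch: asking an LLM to flag conduct risk in a message.
# Assumes the `openai` Python SDK (v1+) and OPENAI_API_KEY set in the
# environment; model name and label taxonomy are placeholders.
from openai import OpenAI

client = OpenAI()

SURVEILLANCE_PROMPT = """You are a communications surveillance assistant.
Classify the message below for conduct risk, considering subtle signals a
keyword lexicon would miss: shifts in tone, euphemisms, coded language.
Respond with a label (none | low | high) and a one-sentence rationale.

Message: {message}"""

def classify_message(message: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; use your firm's approved model
        temperature=0,        # deterministic output aids auditability
        messages=[{"role": "user",
                   "content": SURVEILLANCE_PROMPT.format(message=message)}],
    )
    return response.choices[0].message.content

print(classify_message("Let's take this offline - usual arrangement applies."))
```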

Accuracy, Explainability, and Auditability

To operate within financial regulations, LLMs must be accurate, explainable, and auditable. This means ensuring that every model decision – especially those with compliance implications – can be traced, justified, and inspected.

One effective method is to embed “audit events” – logging, model version control, Chain-of-Thought (CoT) explanations, and classification results – directly into the workflow and UI. This gives users clear oversight of how the LLM works with their data.
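As an illustration, an audit event might be captured as a structured record appended to an immutable log. The field names and JSON-lines format below are assumptions for this sketch, not a prescribed schema.

```python
# Minimal sketch of an "audit event" record appended to a JSON-lines log.
# Field names are illustrative assumptions, not a prescribed schema.
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

@dataclass
class AuditEvent:
    timestamp: str        # when the decision was made (UTC, ISO 8601)
    model_version: str    # pins the exact model used, for reproducibility
    prompt_hash: str      # fingerprint of the full prompt sent to the model
    classification: str   # the model's decision, e.g. "escalate"
    cot_explanation: str  # the chain-of-thought rationale shown to reviewers

def log_decision(model_version: str, prompt: str, classification: str,
                 explanation: str, path: str = "audit_log.jsonl") -> None:
    event = AuditEvent(
        timestamp=datetime.now(timezone.utc).isoformat(),
        model_version=model_version,
        prompt_hash=hashlib.sha256(prompt.encode()).hexdigest(),
        classification=classification,
        cot_explanation=explanation,
    )
    with open(path, "a") as f:  # append-only keeps the trail inspectable
        f.write(json.dumps(asdict(event)) + "\n")
```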

Transparency around how models are trained, tested, and validated is essential. But even the most responsible vendor solutions need to be matched with strong in-house governance. Ongoing monitoring, benchmarking, and performance testing are crucial to maintaining model reliability over time.
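One simple form of such testing is a regression benchmark against a labeled “golden set” of communications, run before any model or prompt change ships. In the sketch below, `classify` is a hypothetical model-calling function, and the data and accuracy threshold are illustrative assumptions.

```python
# Minimal sketch of a regression benchmark against a labeled "golden set".
# `classify` is a hypothetical model-calling function; the data and the
# accuracy threshold are illustrative assumptions.
GOLDEN_SET = [  # (message, expected_label) pairs curated by compliance
    ("Keep this between us until the announcement.", "escalate"),
    ("Lunch at noon?", "none"),
]

def passes_benchmark(classify, threshold: float = 0.95) -> bool:
    correct = sum(classify(msg) == expected for msg, expected in GOLDEN_SET)
    accuracy = correct / len(GOLDEN_SET)
    print(f"golden-set accuracy: {accuracy:.2%}")
    return accuracy >= threshold  # gate releases on reliability
```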

Human-AI Collaboration

Bias and hallucination – where models produce misleading or factually incorrect outputs – remain two of AI’s most significant challenges. Bias and fairness testing is essential to establish whether bias exists in the system, including racial or ethnic, gender, or religious bias. It is vital to remember that bias can be introduced unintentionally not only in the datasets originally used to train LLMs, but also in other components such as surveillance prompts and risk definitions. Monitoring for performance drift and bias metrics is a continuous process that must be maintained in production and throughout model use.
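As a concrete example of one such metric, consider the gap in flag rates across groups, sometimes called the demographic parity difference. The sketch below uses toy data and illustrative group labels; real fairness testing would draw on multiple metrics and statistically meaningful samples.

```python
# Minimal sketch of one bias metric: the gap in surveillance flag rates
# across groups (demographic parity difference). Groups and data are toy
# illustrations; real testing needs multiple metrics and larger samples.
from collections import defaultdict

def flag_rate_gap(records: list[dict]) -> float:
    """records: [{"group": "A", "flagged": True}, ...]"""
    counts = defaultdict(lambda: [0, 0])  # group -> [flagged, total]
    for r in records:
        counts[r["group"]][0] += int(r["flagged"])
        counts[r["group"]][1] += 1
    rates = [flagged / total for flagged, total in counts.values()]
    return max(rates) - min(rates)  # 0.0 means parity across groups

sample = [{"group": "A", "flagged": True}, {"group": "A", "flagged": False},
          {"group": "B", "flagged": False}, {"group": "B", "flagged": False}]
print(f"flag-rate gap: {flag_rate_gap(sample):.2f}")  # 0.50 on this toy data
```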

Prompt engineering also plays a critical role. The structure and clarity of prompts directly influence the quality and reliability of responses. Financial institutions should involve both AI experts and compliance professionals in defining those prompts, ensuring the model is aligned with regulatory intent and risk thresholds.
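For illustration, a surveillance prompt might be structured so that compliance owns the risk definitions while the AI team owns the template, as in the sketch below. The risk names and definitions shown are hypothetical placeholders.

```python
# Minimal sketch of a surveillance prompt co-owned by compliance and AI
# teams: compliance maintains the risk definitions, engineering the
# template. All names and definitions here are hypothetical placeholders.
RISK_DEFINITIONS = {
    "market_manipulation": "Coordinating trades or sharing non-public order flow.",
    "off_channel": "Moving a business conversation to an unmonitored channel.",
}

PROMPT_TEMPLATE = """Review the communication for ONLY the risks defined below.
Do not speculate beyond the definitions.

Risk definitions:
{definitions}

Communication:
{message}

Answer with the matching risk name, or "none", then a short rationale."""

def build_prompt(message: str) -> str:
    definitions = "\n".join(f"- {name}: {desc}"
                            for name, desc in RISK_DEFINITIONS.items())
    return PROMPT_TEMPLATE.format(definitions=definitions, message=message)
```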

Ultimately, human oversight remains essential. LLMs should augment, not replace, expert judgment – particularly in high-stakes environments where contextual nuance and ethical considerations come into play, and where compliance outcomes may carry substantial legal, financial, and reputational impacts.

Scaling Responsibly

Successfully deploying LLMs at scale requires careful consideration of infrastructure. Institutions must weigh the trade-offs between on-premises setups – offering greater control and data security at a higher upfront cost – and cloud-based models that are faster to deploy and more cost-effective in the short term, but introduce concerns around vendor lock-in and data sovereignty.

Just as critical is building the right talent base. Beyond AI engineers and data scientists, firms will increasingly need compliance-savvy technologists and business users who understand how to structure prompts and interpret AI outputs responsibly.

Ultimately, LLMs should not be treated as blunt tools, but as a new system of record – governed, audited, and aligned with the high standards expected in finance. With responsible technology adoption, robust governance, and human oversight, financial institutions can lead the way in shaping a safe, effective, and future-ready approach to AI.
