
The shift in the AI landscape – from the fallout of DeepSeek to new US regulations – has left leaders unsure where to begin. Around two-thirds of executives say generative AI adoption has led to tension and division. Moments of disruption require clarity. Ensuring the safety and oversight of AI models is now more urgent than ever.
AI models are powerful but unpredictable
A recent complaint against ChatGPT for falsely claiming a man had killed his children shows that generative AI still hallucinates, producing false or misleading content. Models also replicate biases from the internet data they’re trained on, introducing unforeseen results. And with GPT-4’s reported 1.8 trillion parameters, its outputs are virtually untraceable: a true black box.
If you don’t control your models, they control you. AI should support business priorities, not supplant them, and executives shouldn’t blindly chase every possible AI use case. Real value comes from embedding AI into decision loops as an operational tool for repeatable, data-rich tasks, not from one-off static analysis.
Start by designing AI around desired business outcomes and the decisions that drive them. Build it into processes led by people to foster trust and empower action. Then, connect individual solutions into a system that transforms the whole process end to end.
Organisation-wide AI governance
Strong safeguards are essential as these tools become more deeply embedded in decision-making. Organisations need clear AI policies and guidelines that create a framework for responsible use. This should include:
- Identifying which tasks require human expertise, creativity, or nuanced judgement and should only be augmented by AI, and which are repetitive and rule-based enough to be fully automated
- Building human oversight into systems, or layering it on top when in-system oversight isn’t possible (see the sketch after this list)
- Conducting regular audits and ongoing evaluation to ensure ethical and operational compliance
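To make the oversight point concrete, here is a minimal sketch of how a human-in-the-loop gate might sit inside an AI-assisted workflow: results are auto-applied only for tasks a human has marked as automatable and only above a human-set confidence floor, and everything else goes to a reviewer. The task names, threshold, and `route_decision` helper are illustrative assumptions, not a prescribed design.
```python
from dataclasses import dataclass

# Illustrative, human-set policy: which task types may be fully automated,
# and the minimum model confidence required before auto-applying a result.
AUTOMATABLE_TASKS = {"invoice_matching", "ticket_triage"}  # repetitive, rule-based
CONFIDENCE_FLOOR = 0.90                                    # assumed threshold

@dataclass
class ModelOutput:
    task: str          # e.g. "ticket_triage"
    decision: str      # the model's proposed action
    confidence: float  # model-reported confidence, 0 to 1

def route_decision(output: ModelOutput) -> str:
    """Auto-apply only when the task is automatable and the model is confident;
    otherwise route the proposal to a human reviewer."""
    if output.task in AUTOMATABLE_TASKS and output.confidence >= CONFIDENCE_FLOOR:
        return "auto_apply"
    return "human_review"

# A nuanced-judgement task is always reviewed, however confident the model is.
print(route_decision(ModelOutput("credit_appeal", "reject", 0.97)))            # human_review
print(route_decision(ModelOutput("ticket_triage", "route_to_billing", 0.95)))  # auto_apply
```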
Prioritising AI literacy strengthens this framework, helping users understand how AI works, its limitations, and its reasoning. Only then can they judge when to rely on AI and when to defer to human judgement.
Achieving accuracy and safety with RAG and guardrails
Fine-tuning large language models on domain-specific data can reduce hallucinations, but it is costly and inflexible, and fine-tuned models can still fabricate responses to unfamiliar inputs. A more effective approach combines Retrieval-Augmented Generation (RAG) with moderation guardrails.
RAG improves the accuracy of outputs by retrieving relevant, real-time information from external sources and using it to generate domain-specific responses. This reduces hallucinations and supports more informed, contextually relevant outputs than pre-training alone can provide. The ability to cite sources also increases transparency and user trust.
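As a rough sketch of the retrieval step, the example below ranks a small document store against the user’s question and packs the top passages into the prompt so the answer can cite them. The keyword-overlap scoring and the `generate` stub are placeholders for a real retriever and language model, not any particular product’s API.
```python
# Minimal RAG sketch: retrieve the most relevant passages, then ground the
# model's answer in them. The scoring and generate() stub are illustrative.
DOCUMENTS = [
    "Refunds are processed within 14 days of receiving the returned item.",
    "Premium support is available to enterprise customers on weekdays.",
    "Data is retained for 30 days after account closure, then deleted.",
]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by simple keyword overlap with the query (toy retriever)."""
    q_terms = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(q_terms & set(d.lower().split())), reverse=True)
    return ranked[:k]

def generate(prompt: str) -> str:
    """Placeholder for a call to a language model (an assumption, not a real API)."""
    return f"[model answer grounded in a prompt of {len(prompt)} characters]"

def answer(query: str) -> str:
    context = retrieve(query, DOCUMENTS)
    prompt = (
        "Answer using only the sources below and cite them.\n"
        + "\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(context))
        + f"\n\nQuestion: {query}"
    )
    return generate(prompt)

print(answer("How long do refunds take?"))
```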
Moderation guardrails then apply safety and policy checks to the content the RAG pipeline produces, scanning both the retrieved data and the generated responses for misinformation and policy violations.
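Building on the sketch above, a moderation guardrail can sit on both sides of the pipeline: screening retrieved passages before they reach the prompt and checking the generated answer before it reaches the user. The blocked-term list and citation requirement below are stand-in policies; a real deployment would use the organisation’s own moderation rules or a dedicated moderation model.
```python
import re

# Illustrative policy: terms that must never appear, and a requirement that
# generated answers cite at least one retrieved source such as "[1]".
BLOCKED_TERMS = {"guaranteed returns", "medical diagnosis"}  # assumed policy list
CITATION_PATTERN = re.compile(r"\[\d+\]")

def passes_guardrails(retrieved: list[str], generated: str) -> tuple[bool, str]:
    """Check retrieved passages and the generated response against the policy."""
    for text in retrieved + [generated]:
        lowered = text.lower()
        for term in BLOCKED_TERMS:
            if term in lowered:
                return False, f"blocked term found: {term!r}"
    if not CITATION_PATTERN.search(generated):
        return False, "answer does not cite any retrieved source"
    return True, "ok"

ok, reason = passes_guardrails(
    ["Refunds are processed within 14 days."],
    "Refunds take up to 14 days [1].",
)
print(ok, reason)  # True ok
```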
The need for explainability
Humans must be able to evaluate how and why models reach their conclusions, especially in high-stakes areas, to ensure fairness, protect privacy, and understand cause and effect.
Just as autopilot assists but doesn’t replace a pilot, AI should support humans: automating routine tasks, surfacing insights, and accelerating decisions.
For pre-trained models, post-hoc interpretability methods are often the only option. These explain individual predictions by:
- Analysing how changes to the input affect the model’s output
- Perturbing the input and fitting a simple surrogate model to mimic the black box’s behaviour locally (sketched below)
- Propagating gradients backwards to identify the most influential input features
But these methods are computationally expensive and often inconsistent: it’s difficult to define “local” in complex data, and each prediction is explained in isolation.
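To illustrate the second approach, the sketch below explains one prediction of a toy non-linear function by perturbing the input around the point of interest and fitting a linear surrogate to the responses; the surrogate’s coefficients are the local feature weights. The toy function, neighbourhood size, and sample count are all assumptions, and changing them is exactly where the instability described above creeps in.
```python
import numpy as np

rng = np.random.default_rng(0)

def black_box(x: np.ndarray) -> np.ndarray:
    """Toy non-linear 'black box' standing in for a complex model (an assumption)."""
    return np.sin(x[:, 0]) + x[:, 1] ** 2 - 0.5 * x[:, 0] * x[:, 1]

# Point whose prediction we want to explain.
x0 = np.array([0.8, -0.3])

# 1) Perturb the input in a small neighbourhood around x0.
samples = x0 + rng.normal(scale=0.1, size=(500, 2))
preds = black_box(samples)

# 2) Fit a simple linear surrogate to mimic the black box locally.
design = np.column_stack([samples - x0, np.ones(len(samples))])  # centred features + bias
coefs, *_ = np.linalg.lstsq(design, preds, rcond=None)

# 3) The coefficients are the local explanation: per-feature influence near x0.
print("local feature weights:", coefs[:2])
print("local intercept (approx. prediction at x0):", coefs[2])
```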
Instead, self-explaining models should be designed with interpretability built into their architecture. These models meet three key criteria:
- Explicitness – showing clearly how a decision was made
- Faithfulness – accurately reflecting internal reasoning
- Stability – ensuring similar inputs yield similar explanations
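One simple family that meets all three criteria is an additive model, where each feature contributes through an explicit weighted term: the decision is the sum of the contributions (explicitness), the explanation is the model’s actual computation (faithfulness), and nearby inputs produce nearby contributions (stability). The features, weights, and threshold in this churn-risk sketch are illustrative assumptions.
```python
# Sketch of a self-explaining additive model: the explanation *is* the computation,
# so it is faithful by construction. Weights and threshold are illustrative.
WEIGHTS = {
    "months_since_last_purchase": 0.3,
    "support_tickets_open": 0.5,
    "contract_expiring_soon": 1.0,
}
THRESHOLD = 1.5  # assumed cut-off for flagging churn risk

def predict_with_explanation(features: dict[str, float]) -> tuple[bool, dict[str, float]]:
    """Return the decision plus the exact per-feature contributions behind it."""
    contributions = {name: WEIGHTS[name] * value for name, value in features.items()}
    at_risk = sum(contributions.values()) >= THRESHOLD
    return at_risk, contributions

at_risk, why = predict_with_explanation(
    {"months_since_last_purchase": 2, "support_tickets_open": 1, "contract_expiring_soon": 1}
)
print(at_risk, why)  # True, with each feature's contribution listed explicitly
```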
Even in high-volume or time-sensitive use cases where AI operates autonomously, such as fraud detection, interpretability matters. Autonomy must remain within clearly defined, human-set boundaries – supported by strong safeguards and regular audits. Self-explaining models allow easy validation, even when complexity is high.
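A hedged sketch of what human-set boundaries might look like in such a flow: the system blocks a transaction on its own only inside limits a person has defined, escalates everything else, and appends every decision and its explanation to an audit log. The limits, fields, and log format are assumptions for illustration.
```python
from datetime import datetime, timezone

# Human-set boundaries for autonomous action (illustrative values).
MAX_AUTO_BLOCK_AMOUNT = 500.0   # above this, a human must decide
MIN_SCORE_TO_AUTO_BLOCK = 0.95  # below this confidence, escalate

AUDIT_LOG: list[dict] = []      # in practice, durable storage reviewed in regular audits

def handle_transaction(amount: float, fraud_score: float, explanation: dict) -> str:
    """Block autonomously only inside human-set limits; otherwise escalate or allow."""
    if fraud_score >= MIN_SCORE_TO_AUTO_BLOCK and amount <= MAX_AUTO_BLOCK_AMOUNT:
        action = "auto_block"
    elif fraud_score >= 0.5:
        action = "escalate_to_analyst"
    else:
        action = "allow"
    AUDIT_LOG.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "amount": amount,
        "score": fraud_score,
        "action": action,
        "explanation": explanation,  # from a self-explaining model, as above
    })
    return action

print(handle_transaction(120.0, 0.98, {"new_merchant": 1.5, "foreign_location": 1.0}))  # auto_block
print(handle_transaction(4200.0, 0.98, {"amount_over_usual": 2.4}))                     # escalate_to_analyst
```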
Making AI work – safely and strategically
As AI becomes integrated into operations, control is non-negotiable. Without clear oversight, regular audits, and strong governance, we risk building systems we can no longer steer or validate.
This means aligning technical capabilities with business goals and ensuring human accountability. It requires robust governance frameworks that keep AI effective while remaining resilient as regulation evolves.
If we want AI to serve our goals, we need to stay firmly in the driving seat.