
Generative AI (Gen AI) has astonished the industry with its rapid development and subsequent adoption. The possibilities Gen AI offers in terms of productivity, automation and personalisation are seemingly endless. And now, organisations are exploring “agentic AI” – or AI that integrates with existing systems to make decisions and perform actions autonomously, without humans, further increasing the pace – and raising the stakes.
We can’t afford to be complacent with such a complex, fast-moving technology prone to inaccuracy, including “hallucinations,” as well as bias, privacy concerns and other potential harms. To mitigate Gen AI risks and focus on delivering optimal AI-powered apps and experiences, organisations must prioritise rigorous testing, including expert-led red teaming to uncover vulnerabilities and combat threats, broad data sourcing, and testing with a global, diverse community.
The quality and reliability of Gen AI outputs depend on the quantity and diversity of the data that fuels the large language models (LLMs) on which Gen AI services are based. Organisations need to ensure the data is from trusted sources and built on very broad, diverse datasets to address the risks. This can be achieved by testing real-world scenarios with a large, global community of independent testers who can detect unexpected bugs and glitches, otherwise known as crowdsourced testing or “crowdtesting,” from a variety of perspectives.
Adversarial testing takes on AI
Relying on AI to execute tasks independently, without human oversight, can result in serious consequences if mistakes are made. This is especially true in high-risk sectors like healthcare – faulty AI agents could endanger users and have broader societal implications. Training LLMs using internet-sourced data is unrealistic, since it’s possible that misinformation or conflicting content could lead to the AI reaching its own conclusions and interpretations. Organisations need a proactive robust testing strategy that will mitigate risk and improve the quality, reliability and success rates of AI agents.
This leads us to red teaming, which is an approach that’s more commonly associated with cybersecurity where it’s used to identify information security vulnerabilities. Red teaming involves a team of experts who execute a series of tests to find cracks in security defences that hackers might be able to exploit. It’s a systematic adversarial technique designed to find points of failure. However, it’s not limited to security and has been adopted for generative AI because it also has weaknesses that can be difficult to identify through regular testing methods.
Domain experts make the difference
Specialists are brought onto red teaming projects for their deeper knowledge of specific subjects. That means sourcing testers qualified in law, history, sociology, ethics, physics or computer science – practically any subject a domain-specific generative AI model might produce. For example, ChatGPT can talk about many subjects, so red teaming based on demographic characteristics might make sense. On the other hand, a banking app would require a mixture of financial services experts and generalists with demographic diversity as part of its red teaming solution.
We can speak from experience. Applause was approached by a global technology leader that wanted to fortify its chatbot against adversarial prompts designed to elicit harmful responses. Recognising the need for specialised expertise, we assembled a red team of experts with deep knowledge in chemical and biological materials. The team generated extensive datasets, encompassing both offensive prompts and safe, appropriate responses to act as a point of comparison. These datasets were then used to train the chatbot to identify potentially dangerous usage patterns and respond responsibly.
Raising the bar for agentic AI quality
Organisations are investing heavily in agentic AI, but shiny new applications and features mean little if the LLM is still hallucinating. That’s why they need to invest in building reliable models before they start to think about adding more features to their product roadmaps. Organisations would prefer to spot potential AI safety and ethical issues during the build phase, and certainly before they reach the customer.
Red teaming is a proactive approach that goes beyond basic testing, simulating real-world attacks to reveal hidden biases and vulnerabilities. It enables organisations to implement guardrails, protecting users from harmful content, while demonstrating a commitment to safety and security. With a comprehensive red teaming approach, organisations can feel confident in the reliability and safety of their agentic AI systems.