Artificial intelligence (AI) and machine learning (ML) power mission-critical elements of countless businesses each day. Whether focused on improving customers’ experiences or achieving impressive ROI, the benefits to organisations of all sizes are clear. But just as the complexity of AI use cases increases, so too do the expectations of end users. To succeed, every business – from tech start-ups at the cutting edge of autonomous vehicles to financial institutions modernising traditional IVRs – needs a strategy to train, test, and validate its AI- and ML-powered experiences. And since these systems will ultimately be used by real people, perfecting them likewise requires a human touch.
The Essential AI Quality Cycle
The aggressive pursuit of AI adoption is unsurprising – 56 percent of organisations have already introduced AI into at least one business function, and 85 percent of executives in capital-intensive businesses say they won’t achieve their growth objectives without scaling AI.
But what these organisations often don’t realise is that, in contrast to traditional software development, considerably more time must be spent training, testing, and validating these products than coding them. That difference is critical; because AI is inherently non-deterministic, it requires a different approach to quality improvement than the one we’ve used for conventional software. Fortunately, the approach is simple to understand – it just requires real-world solutions for collecting training and testing data and for validating the end-to-end experience.
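To make that contrast concrete, here is a minimal sketch in Python. The parse_amount function and the model object with a predict() method are hypothetical stand-ins: deterministic code can be verified with a single exact assertion, while a non-deterministic ML component has to be validated statistically, as a rate measured over many held-out examples.

    def parse_amount(text: str) -> float:
        """Deterministic code: the same input always yields the same output."""
        return float(text.replace("£", "").replace(",", ""))

    def validate_ml_component(model, held_out_examples, threshold=0.95) -> None:
        """ML component: quality is a rate over many examples, not one assertion."""
        correct = sum(
            1 for text, expected in held_out_examples
            if model.predict(text) == expected
        )
        accuracy = correct / len(held_out_examples)
        assert accuracy >= threshold, f"accuracy {accuracy:.1%} is below target"

    assert parse_amount("£1,250.00") == 1250.0  # exact check; passes every time
    # validate_ml_component(model, examples)    # statistical check; re-run as the system evolves

The second check is never “done” the way the first is – it must be re-run every time new training data is introduced, which is why the quality cycle dominates the development effort.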
You Reap What You Sow
AI applications learn by example – we “train” them by providing huge volumes of data so they can extract insights not explicitly apparent in any individual artefact. AI applications often require many thousands of examples to return correct results under real-world usage, and the “unreasonable effectiveness of data” means the success of the end product is often dictated by the volume of this training data. Many companies try to source this data through in-house or friends-and-family communities, but even the largest enterprises struggle to do so – in fact, over 96 percent of companies report such challenges.
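The effect of sheer volume is easy to demonstrate on a toy task. In the sketch below, scikit-learn’s bundled digits dataset and a logistic regression model are stand-ins for any real collected dataset and production model; the same model is trained on progressively larger slices of the data, and accuracy on a fixed held-back slice climbs with volume.

    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression

    # Stand-in for a collected dataset: 1,797 labelled images of handwritten digits.
    X, y = load_digits(return_X_y=True)
    X_train, y_train = X[:1400], y[:1400]
    X_test, y_test = X[1400:], y[1400:]  # fixed slice held back for measurement

    for n in (50, 200, 800, 1400):
        model = LogisticRegression(max_iter=5000).fit(X_train[:n], y_train[:n])
        print(f"{n:>5} training examples -> {model.score(X_test, y_test):.1%} accuracy")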
Even companies that do have an in-house team capable of sourcing the necessary volume of data run into another AI training pitfall – volume may be vital, but quality is key. Even the best-designed, most thoroughly trained systems only perform as well as the quality and diversity of their training data allow. Imagine a healthcare company developing a voice-enabled hospital room monitor. Training such a device with audio from hundreds of developers in a quiet office won’t make it work once deployed in the real world; that requires audio of real patients in real situations – speakers with various accents, myriad physical impairments and emotional states, recorded against a range of background noises.
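One practical way to catch that gap early is to audit the collected data’s coverage before training. The sketch below assumes hypothetical metadata fields (accent, environment) attached to each recording; simply counting them surfaces a “quiet office only” skew before it becomes a deployed product’s failure.

    from collections import Counter

    # Hypothetical metadata attached to each collected recording.
    samples = [
        {"accent": "scottish", "environment": "hospital_ward"},
        {"accent": "us_south", "environment": "quiet_office"},
        {"accent": "indian", "environment": "hospital_ward"},
        # ... thousands more recordings
    ]

    for field in ("accent", "environment"):
        counts = Counter(s[field] for s in samples)
        total = sum(counts.values())
        print(f"{field}:")
        for value, n in counts.most_common():
            print(f"  {value:<15} {n / total:>4.0%}")  # flag under-represented groups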
This is why canned datasets from even the most sophisticated data brokers, as well as synthetically generated datasets, still fall flat – AI-powered products can only learn effectively from large volumes of high-quality AND accurate data tailored to their specific use cases. It is also why you so frequently see highly publicised accounts of significant, often embarrassing bias in such products – although training data should be broadly representative, it rarely is in practice.
Another fundamental and often overlooked aspect of data collection is the need for testing data. While no substitute for measuring the system’s ultimate performance in the real world, reserving a portion of the collected data purely for testing gives developers an objective way to rapidly measure the component-level performance of the system as it evolves and as new training data is introduced.
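In practice that reservation is a one-time split, made before any training begins, so the test portion stays untouched as new training data arrives. A minimal sketch with scikit-learn, using a synthetic dataset as a placeholder for the collected data:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    # Synthetic placeholder for the collected, labelled dataset.
    X, y = make_classification(n_samples=1000, random_state=0)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y,
        test_size=0.2,    # reserve 20% strictly for measurement
        random_state=42,  # fixed seed keeps the benchmark stable across runs
        stratify=y,       # preserve class balance in both portions
    )
    # model.fit(X_train, y_train)   # train only on the training portion
    # model.score(X_test, y_test)   # objective, repeatable quality measure

Because the held-out portion never changes, each new round of training data can be scored against the same benchmark, turning “is it getting better?” into a repeatable measurement rather than a guess.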
Can You Hear Me Now?
How often have you called your bank, only for a virtual agent to ask how it can help and then fail to understand your answer? This ubiquitous, frustrating experience reflects a gap in training data, but most businesses’ lack of awareness of the issue also reflects a lack of validation of the end-user experience – ultimately the most important determinant of these initiatives’ success. It might seem reductive, but the only way to know how your users feel about your product is… to test it with your users.
That’s why successful organisations embrace the crowd – they recognise the unreasonable effectiveness of their customers’ feedback and experience. User behaviour changes rapidly; a decade ago no one knew what a smart speaker was, whereas now over a third of US households actively use one. Companies therefore need to iterate regularly to ensure their products remain in sync with their users.
AI is one of the clearest modern differentiators for businesses seeking to stay competitive. Building web and mobile apps is widely understood and a core capability of most companies today, but the same can’t be said for building with AI. What separates legacy companies from those currently leading the market is their recognition of the importance of the end-user experience – and, in pursuit of that experience, of the need for real-world training, testing, and validation.