Data scientists working to deliver on the promise of AI increasingly face the challenge of ensuring that AI can stand up to scrutiny, judgment and doubt from businesses, the general public and regulators alike.
Scepticism around the practical and responsible use of artificial intelligence has been fuelled by an abundance of reports of AI incidents, as well as bias and privacy issues. The general lack of transparency and explainability observed in AI solutions has not helped build trust in AI. The solution lies in making trust a focal point of the entire end-to-end workflow of any AI project. Trust can be thought of as resting on three pillars: Performance, Operations, and Ethics.
When we think about the performance of a model, trust requires a keen focus on potential points of failure – from data preprocessing to model building and deployment – that could negatively impact and potentially derail decision-making within an organisation.
At the heart of this holistic approach to implementing Trusted AI is understanding how to maximise model performance. This means asking: “How well can the model use historic data to make predictions on new data?” Model accuracy is the most commonly discussed dimension of performance, but the ability to trust AI predictions also depends on data quality, preprocessing techniques, an understanding of model errors and different accuracy metrics, and the speed with which the model can make predictions.
We’ve identified key steps that can be taken to maximise AI performance.
Track data origins: It’s critical to identify and understand the different data sources that are being combined for use by the AI project. This can help identify problems like incompatible data and poor data collection methodologies before they cause real-world failures.
Institute rigorous data cleaning: You can draw meaningful insights about data by computing summary statistics on each feature, calculating each feature’s correlation with the target and with other features, and, where necessary, modifying the data. Before any modelling is done, be sure to apply techniques such as imputing missing values, dropping duplicate rows, and removing ‘leaky’ features that reveal information not available at the time of prediction.
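As a rough illustration of the cleaning steps named above, the following minimal sketch uses only the Python standard library. The function `clean_rows`, the column names and the sample rows are all hypothetical, and the sketch assumes every remaining column is numeric, with missing values stored as `None`.

```python
from statistics import median


def clean_rows(rows, leaky_features=()):
    """Drop duplicate rows, remove leaky columns, then impute missing values."""
    # 1. Drop exact duplicates while preserving the original order.
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)

    # 2. Remove features that would not be known at prediction time.
    cleaned = [
        {k: v for k, v in row.items() if k not in leaky_features}
        for row in unique
    ]

    # 3. Replace missing values (None) with the median of the observed values.
    columns = {k for row in cleaned for k in row}
    for col in columns:
        observed = [row[col] for row in cleaned if row.get(col) is not None]
        fill = median(observed)
        for row in cleaned:
            if row.get(col) is None:
                row[col] = fill
    return cleaned


# Hypothetical data: "refund_issued" is only known after the outcome,
# so it is treated as a leaky feature here.
rows = [
    {"age": 34, "income": 52000, "refund_issued": 0},
    {"age": 34, "income": 52000, "refund_issued": 0},  # duplicate row
    {"age": None, "income": 61000, "refund_issued": 1},
    {"age": 50, "income": None, "refund_issued": 0},
]
clean = clean_rows(rows, leaky_features=("refund_issued",))
```

Median imputation is just one possible choice. Whether it is appropriate depends on why the values are missing in the first place.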
Build a repeatable pipeline for modelling: Data preprocessing techniques used during model training must be built into the same repeatable pipeline that is used for the model’s predictions – each time the model receives new data, it has to perform all necessary data cleaning again. This helps ensure that the model does not break as soon as it’s deployed into the real world.
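One way to picture such a pipeline is sketched below in plain Python. The classes `MedianImputer`, `NearestCentroidClassifier` and `Pipeline` are hypothetical stand-ins written for this example, and the data is made up. The point is that the imputation values are learned once during training and then reapplied, unchanged, whenever new data arrives.

```python
from statistics import mean, median


class MedianImputer:
    """Learns per-column medians at training time and reuses them later."""

    def fit(self, rows):
        n_cols = len(rows[0])
        self.medians_ = [
            median(r[i] for r in rows if r[i] is not None)
            for i in range(n_cols)
        ]
        return self

    def transform(self, rows):
        return [
            [self.medians_[i] if v is None else v for i, v in enumerate(r)]
            for r in rows
        ]


class NearestCentroidClassifier:
    """Toy model: predicts the class whose feature centroid is closest."""

    def fit(self, rows, labels):
        self.centroids_ = {}
        for label in set(labels):
            members = [r for r, y in zip(rows, labels) if y == label]
            self.centroids_[label] = [mean(col) for col in zip(*members)]
        return self

    def predict(self, rows):
        def sq_dist(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))

        return [
            min(self.centroids_, key=lambda c: sq_dist(r, self.centroids_[c]))
            for r in rows
        ]


class Pipeline:
    """Runs the same preprocessing steps for both training and prediction."""

    def __init__(self, preprocessors, model):
        self.preprocessors = preprocessors
        self.model = model

    def fit(self, rows, labels):
        for step in self.preprocessors:
            rows = step.fit(rows).transform(rows)
        self.model.fit(rows, labels)
        return self

    def predict(self, rows):
        # New data goes through exactly the cleaning learned during training.
        for step in self.preprocessors:
            rows = step.transform(rows)
        return self.model.predict(rows)


# Made-up training data with one missing value.
train_X = [[1.0, 5.0], [2.0, None], [1.5, 6.0], [8.0, 5.0], [9.0, 7.0], [8.5, 6.0]]
train_y = [0, 0, 0, 1, 1, 1]

pipeline = Pipeline([MedianImputer()], NearestCentroidClassifier()).fit(train_X, train_y)
predictions = pipeline.predict([[1.2, None], [8.8, 6.5]])
```

Because the imputer lives inside the pipeline, a new row with a missing value is handled at prediction time instead of causing a failure. Libraries such as scikit-learn offer a `Pipeline` class built on the same idea.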
Understand accuracy metrics: Accuracy is the most commonly analysed component of performance, but there are many different ways to measure it. Be sure to use out-of-sample testing and cross-validation when evaluating your model. It’s also vital to use an error metric well-suited to the problem at hand – for instance, Log Loss and Root Mean Square Error (RMSE) are common defaults for binary classification and regression problems, respectively. There are, however, cases where less common accuracy metrics may be required.
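To make these two metrics and the idea of cross-validation concrete, here is a small standard-library sketch. The helper names (`log_loss`, `rmse`, `k_fold_splits`) and the numbers are illustrative assumptions. The "model" being evaluated is only a baseline that predicts the mean of the training targets.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood for binary labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error for regression targets."""
    squared = [(a - b) ** 2 for a, b in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def k_fold_splits(n_rows, k):
    """Yield (train_indices, test_indices) so each row is held out exactly once.

    Rows are assigned to folds in order; real data would usually be shuffled first.
    """
    indices = list(range(n_rows))
    for fold in range(k):
        test = indices[fold::k]
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


# Cross-validate a baseline that always predicts the training mean.
targets = [3.0, 5.0, 2.0, 4.0, 6.0, 1.0]
fold_scores = []
for train_idx, test_idx in k_fold_splits(len(targets), k=3):
    baseline = sum(targets[i] for i in train_idx) / len(train_idx)
    held_out_targets = [targets[i] for i in test_idx]
    fold_scores.append(rmse(held_out_targets, [baseline] * len(test_idx)))

cv_rmse = sum(fold_scores) / len(fold_scores)
```

Each fold's score is computed only on rows the baseline never saw. That is what makes the estimate out-of-sample.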
Evaluate errors: Digging deeper into model accuracy with insights like the Confusion Matrix can help you to understand what kinds of errors your model is most likely to make, such as ‘false positives’ vs. ‘false negatives’. Lift charts and Receiver Operating Characteristic (ROC) curves can help your understanding as well. For example, by using a lift chart you could observe how close your predictions are to actuals at different probability levels.
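The sketch below shows how the four cells of a confusion matrix can be counted for a binary classifier. The function name, the threshold parameter and the sample labels and probabilities are made up for illustration.

```python
def confusion_matrix(y_true, p_pred, threshold=0.5):
    """Count true/false positives and negatives at a given probability threshold."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, prob in zip(y_true, p_pred):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and actual == 1:
            counts["tp"] += 1
        elif predicted == 1 and actual == 0:
            counts["fp"] += 1
        elif predicted == 0 and actual == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


# Made-up labels and predicted probabilities.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
p_pred = [0.9, 0.4, 0.35, 0.8, 0.1, 0.7, 0.6, 0.2]

at_default = confusion_matrix(y_true, p_pred)               # threshold 0.5
at_lower = confusion_matrix(y_true, p_pred, threshold=0.3)  # more positives predicted
```

On this toy data, lowering the threshold from 0.5 to 0.3 removes the single false negative but adds a false positive. Which of the two errors costs more depends on the use case.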
Balance accuracy and speed: Every model, regardless of its function, has some limit on the speed of prediction – whether that’s three milliseconds, three seconds, or three weeks. Often the most accurate models, such as complex blenders and deep neural networks, are also the slowest. Optimising purely for accuracy can lead to model failures along other dimensions, such as cost, explainability and, most relevant to performance, speed. Before selecting a model, you have to find the right tradeoff between accuracy and speed.
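One simple way to frame this tradeoff is to measure prediction latency and then pick the most accurate model that fits a latency budget. The sketch below assumes that framing. The function names, the candidate models and their accuracy and latency figures are illustrative only, not measurements of any real system.

```python
import time
from statistics import median


def median_latency_ms(predict, rows, repeats=5):
    """Median wall-clock time, in milliseconds, for one call to predict(rows)."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict(rows)
        timings.append((time.perf_counter() - start) * 1000)
    return median(timings)


def most_accurate_within_budget(candidates, budget_ms):
    """Return the highest-accuracy candidate whose latency fits the budget, or None."""
    eligible = [c for c in candidates if c["latency_ms"] <= budget_ms]
    return max(eligible, key=lambda c: c["accuracy"], default=None)


# Illustrative numbers only, not measurements of any real model.
candidates = [
    {"name": "blender", "accuracy": 0.93, "latency_ms": 850.0},
    {"name": "gradient_boosting", "accuracy": 0.91, "latency_ms": 40.0},
    {"name": "linear", "accuracy": 0.86, "latency_ms": 3.0},
]
choice = most_accurate_within_budget(candidates, budget_ms=100.0)

# Timing a stand-in prediction function on 1,000 dummy rows.
latency = median_latency_ms(lambda rows: [sum(r) for r in rows], [[1.0, 2.0]] * 1000)
```

With a 100 ms budget, the blender is excluded despite its higher accuracy, and the next most accurate candidate is chosen. In practice the budget would come from the business requirement the model serves.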
We’ve established that performance is the first pillar of successfully implementing trustworthy enterprise AI, but it does not exist in isolation. Just as important are Operations (‘How reliable is the system that the model is deployed on?’) and Ethics (‘Does the model align with the ethics and values of the organisation?’).
Trusted AI needs all three pillars to be securely in place to thrive, brought together as Machine Learning Operations (MLOps). Get it right and models deployed in the real world can be continuously monitored to ensure there is no degradation and that they continue to deliver business value and have a positive impact on the organisation.