Digital Transformation

Proactive Measures for Mitigating Bias in Digital Services

From personal assistants to facial recognition, artificial intelligence (AI) has integrated deeply into personal and professional lives. Over 18 million people in the UK have used generative AI services such as ChatGPT for editing and proofreading content, or Midjourney for generating visuals, demonstrating how AI can boost speed and efficiency by reducing manual tasks.

However, AI systems can exhibit biased behaviour towards end-users. Uber Eats and Google’s Gemini have recently discovered how much the use of AI can jeopardise the legitimacy and reputation of their online services. It shouldn’t be forgotten, however, that humans are also susceptible to bias. This is evident in facial recognition, where own-group bias (OGB), the tendency to recognise members of one’s own ethnic group more accurately than others, is a well-documented phenomenon.

The challenge is evident. Online services have become essential to the economy, with a significant increase in usage—60% more than pre-pandemic levels. AI offers businesses a cost-effective and efficient solution for managing growing customer volumes. Yet, despite all the advantages, it is crucial to acknowledge the biases AI presents. Businesses bear the responsibility of implementing safeguards to not only uphold their reputation but also protect the wider economy.  

To combat bias effectively, a strategy must focus on four key elements – identifying and measuring bias, recognising hidden variables and hasty conclusions, devising rigorous training methods, and tailoring solutions to the specific use case. 

Core element 1: Detecting and evaluating bias 

The battle against bias starts by implementing robust processes for its measurement. AI biases frequently lurk within extensive datasets, becoming apparent only after untangling several correlated variables. 

It is therefore crucial for companies using AI to establish good practices such as measuring bias with confidence intervals, using datasets of appropriate size and variety, and employing appropriate statistical tools operated by competent people.
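To make the confidence-interval point concrete, here is a minimal sketch of how a per-group error rate can be reported with a 95% interval rather than as a bare number. The Wilson score interval is one common choice; the group names and counts below are purely hypothetical.

```python
import math

def wilson_interval(errors: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score confidence interval for an observed error rate (95% by default)."""
    if trials == 0:
        return (0.0, 1.0)
    p = errors / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    margin = (z / denom) * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2))
    return (max(0.0, centre - margin), min(1.0, centre + margin))

# Hypothetical false-rejection counts per demographic group.
groups = {"group_a": (12, 4000), "group_b": (30, 5000)}
for name, (errors, trials) in groups.items():
    lo, hi = wilson_interval(errors, trials)
    print(f"{name}: rate={errors / trials:.4f}, 95% CI=({lo:.4f}, {hi:.4f})")
```

Reporting the interval makes clear when an apparent gap between two groups is within measurement noise, and when the dataset is simply too small to conclude anything.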

These companies must also strive to be as transparent as possible about these biases, for example by publishing reports based on real production data rather than synthetic or test data.

Public benchmarking tools such as the NIST FRVT (Face Recognition Vendor Test) also produce bias analyses, which these companies can use to communicate about bias and to reduce it in their systems.

Based on these observations, companies can understand where biases are most likely to occur in the customer journey and work to find a solution – often by training the algorithms with more complete datasets to produce fairer results. This lays the foundation for rigorous bias treatment and increases the value of the algorithm and its user journey. 

Core element 2: Beware of hidden variables and rushed conclusions 

The bias of an AI system is often hidden in multiple correlated variables. Let’s take the example of facial recognition between biometrics and identity documents (“face matching”). This step is key in the user’s identity verification. 

A first analysis shows that recognition performs worse for people with dark skin than for the average person. Under these conditions, it is tempting to conclude that the system penalises people with dark skin by design.

However, by pushing the analysis further, we observe that the proportion of people with dark skin is higher in African countries than in the rest of the world. Moreover, these African countries use, on average, identity documents of lower quality than those observed in the rest of the world. 

This decrease in document quality explains most of the relatively poor performance of facial recognition. Indeed, if we measure the performance of facial recognition for people with dark skin, restricting ourselves to European countries that use higher-quality documents, we find that the bias practically disappears. 

In statistical language, we say that the variables “document quality” and “country of origin” are confounders with respect to the variable “skin colour.”
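The confounding effect described above can be illustrated with a small simulation. In the hypothetical data below, the error rate depends only on document quality, never on group membership, yet a naive per-group comparison suggests a large gap; stratifying on high-quality documents makes the gap disappear. All numbers are invented for illustration.

```python
import random

random.seed(0)

# Error probability depends ONLY on document quality, not on the group.
ERR = {"low": 0.08, "high": 0.01}
# Hypothetical mix: group X mostly holds low-quality documents, group Y mostly high.
P_LOW = {"X": 0.8, "Y": 0.1}

records = []
for group, p_low in P_LOW.items():
    for _ in range(50_000):
        quality = "low" if random.random() < p_low else "high"
        error = random.random() < ERR[quality]
        records.append((group, quality, error))

def error_rate(rows):
    return sum(e for _, _, e in rows) / len(rows)

for group in ("X", "Y"):
    naive = error_rate([r for r in records if r[0] == group])
    stratified = error_rate([r for r in records if r[0] == group and r[1] == "high"])
    print(f"{group}: naive={naive:.3f}, stratified on high quality={stratified:.3f}")
```

The naive comparison attributes the gap to the group, when the true driver is the correlated document-quality variable, exactly the trap the facial-recognition example describes.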

We provide this example not to convince that algorithms are not biased (they are) but to emphasise that bias measurement is complex and prone to hasty but incorrect conclusions. 

Therefore, it is crucial to conduct a comprehensive bias analysis and study all the hidden variables that may influence the bias. 

Core element 3: Establish robust training methodologies 

The training phase of an AI model offers the best opportunity to reduce its biases. It is indeed difficult to compensate for this bias afterward without resorting to ad-hoc methods that are not robust. 

The datasets used for learning are the main levers through which we can influence learning. By correcting the imbalances in the datasets, we can significantly influence the behaviour of the model. 

Let’s take an example. Some online services may be used more frequently by a person of a given gender. If we train a model on a uniform sample of the production data, this model will probably behave more robustly on the majority gender, to the detriment of the minority gender, which will see the model behave more randomly. 

We can correct this bias by sampling the data of each gender equally. This will probably result in a relative reduction in performance for the majority gender but to the benefit of the minority gender. For a critical service (such as an application acceptance service for higher education), this balancing of the data makes perfect sense and is easy to implement. 
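A minimal sketch of this balancing step, assuming records carry a gender attribute, is to downsample each group to the size of the smallest one before training. The data shape here is hypothetical.

```python
import random

def balanced_sample(records, group_key, per_group=None, seed=0):
    """Sample an equal number of records per group (downsampling larger groups)."""
    rng = random.Random(seed)
    by_group = {}
    for r in records:
        by_group.setdefault(group_key(r), []).append(r)
    per_group = per_group or min(len(rows) for rows in by_group.values())
    sample = []
    for rows in by_group.values():
        sample.extend(rng.sample(rows, per_group))
    return sample

# Hypothetical production data skewed 90/10 towards one gender.
data = [{"gender": "A"}] * 9000 + [{"gender": "B"}] * 1000
train = balanced_sample(data, lambda r: r["gender"])
# Each gender now contributes the same number of records to training.
```

Downsampling is the simplest option; reweighting or augmenting the minority group are common alternatives when discarding majority-group data is too costly.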

Online identity verification is often associated with critical services. This verification, which often involves biometrics, requires the design of robust training methods that reduce bias as much as possible across the variables to which biometrics are sensitive, namely age, gender, ethnicity, and country of origin.

Finally, collaboration with regulators, such as the Information Commissioner’s Office (ICO), allows us to step back and think strategically about reducing biases in models.

Core element 4: Customise the solution to fit the specific use case 

There is no single measure of bias. In its glossary on model fairness, Google identifies at least three different definitions for fairness, each of which is valid in its own way but leads to very different model behaviours. 

How, for example, should one choose between “forced” demographic parity and equal opportunity, which takes into account the variables specific to each group?

There is no single answer to this question. Each use case requires its own analysis of the application domain. In the case of identity verification, for example, Onfido uses the “normalised rejection rate”, which measures the system’s rejection rate for each group and compares it to that of the overall population. A rate greater than 1 corresponds to over-rejection of the group, while a rate less than 1 corresponds to under-rejection.

In an ideal world, this normalised rejection rate would be 1 for all groups. In practice, this is not the case for at least two reasons: first, because the datasets necessary to achieve this objective are not necessarily available; and second, because certain confounding variables are not within Onfido’s control (this is the case, for example, with the quality of identity documents mentioned in the example above). 
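As described above, the normalised rejection rate is each group's rejection rate divided by the overall rejection rate. A minimal sketch of the computation, on invented outcome data, might look like this:

```python
def normalised_rejection_rates(outcomes):
    """outcomes: list of (group, rejected) pairs.
    Returns each group's rejection rate divided by the overall rejection rate:
    > 1 means the group is over-rejected, < 1 means under-rejected."""
    overall = sum(rejected for _, rejected in outcomes) / len(outcomes)
    rates = {}
    for group in {g for g, _ in outcomes}:
        rows = [rejected for g, rejected in outcomes if g == group]
        rates[group] = (sum(rows) / len(rows)) / overall
    return rates

# Hypothetical verification outcomes: (group, was_rejected).
outcomes = ([("A", True)] * 50 + [("A", False)] * 950
            + [("B", True)] * 80 + [("B", False)] * 920)
print(normalised_rejection_rates(outcomes))
```

In this toy sample, group A is rejected less often than the population average (rate below 1) and group B more often (rate above 1), which is the signal the metric is designed to surface.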

Aiming for perfection impedes progress 

While complete bias elimination may not be feasible, continual measurement and transparent communication about the system’s limitations are essential. 

Research on bias is widely accessible, with numerous publications on the topic available. Major companies such as Google and Meta continue to contribute significantly to this knowledge by publishing in-depth technical articles, accessible overviews, and training materials, as well as dedicated datasets for bias analysis. Last year, for instance, Meta released its Conversational Dataset, focused on bias analysis in models.

Despite the challenges it presents, embracing AI innovation remains crucial for enhancing digital services, provided biases are effectively managed. By implementing robust bias mitigation measures, companies can enhance customer experiences, foster technological adaptability, and build trust with the communities they serve.

Author

  • Olivier Koch

Olivier Koch heads up the machine learning and MLOps teams at Onfido, where they use computer vision and deep learning for online identity verification. Prior to this, Olivier led the machine learning team for recommendation at Criteo and the computer vision team at Thales Optronics in France. He graduated with a PhD from MIT in 2010 under the supervision of Prof. Seth Teller.
