Technologies are neutral; the problems tend to lie in how they are deployed and used.
One of the big promises of AI is that it will help deliver better healthcare outcomes and lower the costs of doing so. In fact, this was one of the central planks of the UK Prime Minister’s “blueprint to turbocharge AI”. It is being used in hospitals up and down the country to deliver better, faster, and smarter care: spotting pain levels for people who can’t speak, diagnosing breast cancer earlier and more quickly, and getting people discharged faster. This is already helping deliver the government’s mission to build an NHS fit for the future.
Unfortunately, technology cannot simply be dropped into a business process without considering the human factors around how it is to be used. A recent study indicates that “radiologists and other physicians may rely too much on artificial intelligence (AI) when it points out a specific area of interest in an X-ray”.
The confidence with which AI systems deliver diagnoses may blind even expert humans to their mistakes, whether false negatives (failing to highlight something bad on the X-ray) or false positives (highlighting something that is not actually a problem, leading to unnecessarily invasive therapy or surgery).
The reason for this effect is well-known from other fields such as self-driving cars, or earlier, from the deployment of “autopilot” systems in airliners. There is a certain level of cognitive load involved for humans in switching between tasks and modes of operation. In self-driving cars, the issue manifests when you are being chauffeured serenely along the road by AI, and suddenly something happens that the AI can’t handle. This is called a “disengagement”: the AI ceases to operate the vehicle, and the human driver needs to take over.
A disengagement can happen pretty quickly, and if you are not paying attention, bad things can happen. You need to Observe whatever the problem was that the AI couldn’t handle on its own, Orient yourself to the situation, Decide what to do about it, and Act upon your decision. This is called the OODA loop, and trying to run through it cold when you were not expecting to is a recipe for disaster.
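To make the cost of running the loop cold a little more concrete, here is a toy sketch in Python. It is not drawn from any real autopilot or vehicle code; the takeover_time function, the step names, and every timing in it are invented purely for illustration. The only point it encodes is that an operator who was not already following along has to pay for the Observe and Orient steps before they can even begin to Decide and Act.

```python
# Toy illustration only: all step names and timings are made up.
OODA_STEP_SECONDS = {
    "observe": 1.5,  # notice what the automation could not handle
    "orient": 3.0,   # rebuild situational awareness from scratch
    "decide": 2.0,   # choose a course of action
    "act": 1.0,      # carry it out
}

def takeover_time(already_engaged: bool) -> float:
    """Rough time to regain control after a disengagement."""
    # An operator who was already monitoring closely has largely pre-paid
    # the Observe/Orient cost; a distracted one runs the loop cold.
    steps = ["decide", "act"] if already_engaged else list(OODA_STEP_SECONDS)
    return sum(OODA_STEP_SECONDS[step] for step in steps)

print(f"Engaged operator:    {takeover_time(True):.1f} s to respond")
print(f"Distracted operator: {takeover_time(False):.1f} s to respond")
```

The exact numbers are meaningless, but the shape of the problem is not: the less engaged the human, the longer and messier the hand-off.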
A tragic example of how hard it can be, even for trained experts, to grasp a situation quickly and react to it correctly is the case of Air France flight 447. An inconsistency between airspeed sensor readings caused the autopilot to disengage mid-cruise, and misunderstandings and miscommunication among the crew ended up causing the aircraft to stall and fall out of the sky, killing everyone aboard.
The same mechanism applies to a radiologist reviewing X-rays. Most of them, indeed almost all of them, are routine: either the scan is obviously clear, with nothing unexpected, or the AI has flagged the obvious issue. After reviewing a run of images like this, attention to the unexpected will inevitably wane, and radiologists may miss issues that the AI did not flag.
This is exactly what a recent study of the effects of AI assistance on diagnostic performance found. Reviewers were more likely to align their diagnostic decision with the AI’s advice, and spent less time considering it, when the AI provided local explanations (the kind that point out a specific area of interest in the image). When the AI advice was correct, average diagnostic accuracy among reviewers was 92.8% with local explanations and 85.3% with global explanations. When the AI advice was incorrect, accuracy fell to 23.6% with local explanations and 26.1% with global explanations. “When provided local explanations, both radiologists and non-radiologists in the study tended to trust the AI diagnosis more quickly, regardless of the accuracy of AI advice,” Dr. Yi said.
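To see why that 23.6% figure matters in aggregate, here is a back-of-the-envelope calculation using the accuracy figures quoted above. The 90% “AI advice is correct” rate is an assumed value, chosen purely for illustration; the study does not report such a figure.

```python
# Back-of-the-envelope illustration using the accuracy figures quoted above.
# The 90% AI-correct rate is an assumption for illustration only, not a
# figure from the study.
acc_when_ai_correct = {"local": 0.928, "global": 0.853}
acc_when_ai_wrong   = {"local": 0.236, "global": 0.261}

p_ai_correct = 0.90  # assumed share of cases where the AI advice is right

for style in ("local", "global"):
    expected = (p_ai_correct * acc_when_ai_correct[style]
                + (1 - p_ai_correct) * acc_when_ai_wrong[style])
    print(f"{style} explanations: expected reviewer accuracy ~ {expected:.1%}")
```

Even with a quite reliable AI, the roughly one-in-four chance that a reviewer catches an incorrect suggestion pulls the expected overall accuracy well below the headline 92.8%, and the gap widens quickly as the AI’s own error rate grows.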
This finding needs to be taken to heart when designing AI-assisted systems, not only for life-or-death situations like the ones described above but also for more routine, lower-stakes use cases. What is required is whole-system design, with Cognitive Systems Engineering as a structural part of the service’s design, to ensure that the effects on the humans involved are taken into account.
In the clinical example, this is particularly important if the full promised benefits of such systems are to be delivered. The expectation is that a “doctor in a box” capable of performing such diagnostic tasks can be deployed in remote areas where a trained radiologist may not be available; but it is precisely in those remote areas that any failure of the AI diagnostics would be hardest to catch and remedy. In well-connected areas, such systems may also be deployed as a cost-cutting measure, again risking the health of lower-income patients in particular if the diagnoses are incorrect.
A general rule of thumb is never to promise more than the system can actually deliver. A self-driving car that only works sometimes is actually more dangerous because of the false confidence it induces. Drivers may feel safe focusing on other tasks, and then struggle to take control when a disengagement does occur. In the same way, clinicians may struggle to identify errors in the automated diagnostics.
What cognitive blind spots might your service design be introducing? If there is one lesson to take from all this, it is: don’t rush to deploy the technology, no matter how exciting and promising it may seem. Instead, design the full service to take advantage of useful technology, whether it is faster diagnosis, a self-driving car, or whatever else will make your users’ lives better, while avoiding negative outcomes from the inevitable exceptions and corner cases.