Artificial intelligence may be as effective as health professionals at diagnosing disease, according to research published in the international journal The Lancet Digital Health.
Artificial intelligence (AI) appears to detect diseases from medical imaging with a level of accuracy similar to that of healthcare professionals, the international systematic review and meta-analysis found.
However, the true diagnostic power of deep learning, the AI technique that uses algorithms, big data and computing power to emulate human learning and intelligence, remains uncertain because few studies directly compare humans and machines or validate AI's performance in real clinical environments.
“With deep learning, computers can examine thousands of medical images to identify patterns of disease. This offers enormous potential for improving the accuracy and speed of diagnosis,” said lead researcher Professor Alastair Denniston of University Hospitals Birmingham NHS Foundation Trust, UK.
The researchers conducted a systematic review and meta-analysis of all studies, published between January 2012 and June 2019, that compared the performance of deep learning models and health professionals in detecting disease from medical imaging. They also evaluated study design, reporting and clinical value.
In total, 82 articles were included in the systematic review, 69 of which contained enough data to calculate performance accurately.
Analysis of data from the 14 studies that compared the performance of deep learning with that of humans found that AI correctly detected disease in 87% of cases, compared with 86% for healthcare professionals.
AI algorithms were also as effective as healthcare professionals at correctly ruling out disease in patients who did not have it, doing so in 93% of cases compared with 91%.
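In diagnostic research, the detection rate is usually called sensitivity and the rule-out rate specificity. The short Python sketch below shows how each is calculated from a simple tally of test results. The class name, function names and patient counts are hypothetical, picked only so the output matches the 87% and 93% figures. They are not data from the review, and the sketch does not reproduce how the researchers pooled results across studies.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Hypothetical tally of a test's calls against the true diagnosis."""
    true_positive: int   # patients with the disease whom the test flags
    false_negative: int  # patients with the disease whom the test misses
    true_negative: int   # patients without the disease whom the test clears
    false_positive: int  # patients without the disease whom the test wrongly flags


def sensitivity(c: ConfusionCounts) -> float:
    """Share of patients who have the disease that the test detects."""
    return c.true_positive / (c.true_positive + c.false_negative)


def specificity(c: ConfusionCounts) -> float:
    """Share of patients without the disease that the test correctly rules out."""
    return c.true_negative / (c.true_negative + c.false_positive)


# Illustrative counts only: 100 patients with the disease, 100 without.
example = ConfusionCounts(true_positive=87, false_negative=13,
                          true_negative=93, false_positive=7)

print(f"sensitivity = {sensitivity(example):.0%}")  # sensitivity = 87%
print(f"specificity = {specificity(example):.0%}")  # specificity = 93%
```

A test can score well on one measure and poorly on the other. That is why both figures are reported side by side for AI and for clinicians.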
“Within that handful of high-quality studies, we found that deep learning could indeed detect diseases ranging from cancers to eye diseases as accurately as health professionals. But it’s important to note that AI did not substantially outperform human diagnosis,” Professor Denniston said.
More than 30 AI algorithms for healthcare have already been approved by the US Food and Drug Administration.
Despite strong public interest and market forces driving the rapid development of these technologies, concerns have been raised about whether study designs are biased in favour of machine learning and how applicable they are to real-world practice.
The researchers found that AI was often assessed in isolation, in a way that did not reflect clinical practice, and that few studies were carried out in real clinical environments.
“There is an inherent tension between the desire to use new, potentially life-saving diagnostics and the imperative to develop high-quality evidence in a way that can benefit patients and health systems in clinical practice,” said Dr Xiaoxuan Liu of the University of Birmingham, UK.
“A key lesson from our work is that in AI – as with any other part of healthcare – good study design matters. Without it, you can easily introduce bias which skews your results. These biases can lead to exaggerated claims of good performance for AI tools which do not translate into the real world.
“Good design and reporting of these studies is a key part of ensuring that the AI interventions that come through to patients are safe and effective.”
Dr Livia Faes of Moorfields Eye Hospital, London, said evidence of how AI could change patient outcomes was still needed.
“So far, there are hardly any such trials where diagnostic decisions made by an AI algorithm are acted on to see what then happens to outcomes which really matter to patients, like timely treatment, time to discharge from hospital, or even survival rates.”