AI Beat Clinicians at Figuring Out Difficult Diagnoses

— GPT-4 may help with diagnoses that have been missed by clinicians, study author says

by Michael DePeau-Wilson, Enterprise & Investigative Writer, 番茄社区, August 14, 2023

A generative artificial intelligence (AI) program diagnosed elderly patients with extensive medical histories and long hospital stays more accurately than clinicians, suggesting the technology could help identify missed diagnoses, according to a new study.

An analysis of medical histories for six patients over the age of 65 with delayed diagnoses revealed that GPT-4 (Generative Pre-trained Transformer 4, made by OpenAI) accurately diagnosed four out of six patients, according to Yat-Fung Shea, MBBS, of the Department of Medicine at Queen Mary Hospital and University of Hong Kong, and coauthors.

By comparison, clinicians accurately diagnosed only two out of six of those same patients, according to a research letter published in JAMA Network Open.

When differential diagnoses were included, AI's accuracy improved to five out of six patient diagnoses, compared with three out of six correct patient diagnoses made by clinicians.

Differential diagnoses were also generated using a medical diagnostic decision tool known as Isabel DDx Companion. This tool accurately diagnosed none of the patients in the initial attempt, and two out of six patients when provided differential diagnosis information.
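For readers who want the reported counts side by side, the short Python sketch below converts them to percentages. The counts come from the figures above; the labels "primary" and "with differentials" and the helper names are illustrative choices, not terms from the research letter.

```python
# Correct diagnoses out of six patients, as reported in this article.
# The dictionary keys are illustrative labels, not the study's terminology.
TOTAL_PATIENTS = 6
correct = {
    "GPT-4": {"primary": 4, "with differentials": 5},
    "Clinicians": {"primary": 2, "with differentials": 3},
    "Isabel DDx Companion": {"primary": 0, "with differentials": 2},
}


def accuracy_pct(n_correct, total=TOTAL_PATIENTS):
    """Return accuracy as a percentage rounded to one decimal place."""
    return round(100 * n_correct / total, 1)


def summarize(counts):
    """Map each diagnostician to its accuracy percentages."""
    return {
        name: {label: accuracy_pct(n) for label, n in row.items()}
        for name, row in counts.items()
    }


summary = summarize(correct)
for name, row in summary.items():
    print(f"{name}: {row['primary']}% primary, "
          f"{row['with differentials']}% with differentials")
```

With only six patients, a single case moves any of these figures by about 17 percentage points, so the percentages are best read as a restatement of the counts rather than as precise estimates.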

"GPT-4 may be able to provide potential diagnoses which have been missed by clinicians," Shea told 番茄社区 in an email. "If a doctor encounters elderly patients, who have been admitted into hospital for work-up for at least a month but [are] still without a definite diagnosis, he/she can consider using GPT-4 to analyze the medical histories."

"GPT-4 may help clinicians to analyze clinical situations with diagnostic difficulties, especially in alerting clinicians to possible underlying malignancies or side effects of drugs," he added.

The AI program was able to successfully diagnose patients as a result of the extensive medical histories available for each of them, Shea said, including radiological and pharmacological information.

Shea noted that the researchers chose to work with older patients because they often have multiple comorbidities, which can require prolonged efforts to reach a correct diagnosis. With GPT-4, clinicians could potentially identify diagnoses they might otherwise have missed, which would help shorten the time to initial diagnosis in this population.

The AI program successfully diagnosed patients with a range of conditions, including polymyalgia rheumatica (patient 2), Mycobacterium tuberculosis-related hemophagocytic lymphohistiocytosis (patient 3), metronidazole-induced encephalopathy (patient 5), and lymphoma (patient 6).

Still, GPT-4 had trouble with certain aspects of diagnosing patients, including multifocal infections. The AI program failed to pinpoint the source of a recurrent infection in one patient, and it did not suggest the use of clinically relevant testing for infections in most of the patients in the study.

Shea noted that GPT-4 should be seen as a tool that can increase a clinician's confidence in a diagnosis or even offer clinicians suggestions similar to those of a specialist. This would be especially beneficial in lower-income countries that lack the wide availability of specialists to assist with consulting on older patients.

"Our results showed that GPT-4 has the potential to improve clinician responses," Shea said. "GPT-4 may alert clinicians [of] certain overlooked things in the clinical history, e.g. potential side effects of drugs or abnormal findings on imaging. These may be relevant especially when certain subspecialties are not immediately available for consultation."

Shea also noted that the study was limited by a very small sample size. The analysis was conducted using the medical histories for six patients (two women and four men) in a single hospital unit -- the Division of Geriatrics in the Department of Medicine at Queen Mary Hospital. All of the patients had delayed definitive diagnosis for longer than 1 month in 2022. Their histories were entered into GPT-4 chronologically starting at admission, at 1 week after admission, and before final diagnosis. The data were entered into the AI program on April 16, 2023.

The authors also cautioned that the AI program is susceptible to regurgitating wrong information based on incorrect medical histories.

They concluded that use of generative AI in diagnosing patients showed promise, especially when extensive medical histories were available, but it also presented several new challenges for clinicians.

Michael DePeau-Wilson is a reporter on 番茄社区's enterprise & investigative team. He covers psychiatry, long covid, and infectious diseases, among other relevant U.S. clinical news.

Disclosures

The authors reported no conflicts of interest.

Primary Source

JAMA Network Open

Shea, Y-F "Use of GPT-4 to analyze medical records of patients with extensive investigations and delayed diagnosis" JAMA Netw Open 2023; DOI:10.1001/jamanetworkopen.2023.25000.