An artificial intelligence program created explanations of heart test results that were in most cases accurate, relevant, and easy for patients to understand, a new study finds.
The study addressed the echocardiogram, which uses sound waves to create pictures of blood flowing through the heart’s chambers and valves. Echocardiogram reports include machine-generated numerical measures of function, as well as comments from the interpreting cardiologist on the heart’s size, the pressure in its vessels, and tissue thickness, which can signal the presence of disease. In the form typically generated by doctors, the reports are difficult for patients to understand, often resulting in unnecessary worry, say the study authors.
To address the issue, NYU Langone Health has been testing the capabilities of a form of artificial intelligence (AI) that generates likely options for the next word in any sentence based on how people use words in context on the internet. A result of this next-word prediction is that such generative AI “chatbots” can reply to questions in simple language. However, AI programs, which work based on probabilities instead of actually thinking and may produce inaccurate summaries, are meant to assist, not replace, human providers.
In March 2023, NYU Langone requested from OpenAI, the company that created the ChatGPT chatbot, access to the company’s latest generative AI tool, GPT-4. NYU Langone licensed one of the first “private instances” of the tool, which freed clinicians to experiment with AI using real patient data while also adhering to privacy rules.
Coming out of that effort, the current study analyzed 100 doctor-written reports on a common type of echocardiogram test to see whether GPT-4 could efficiently generate human-friendly explanations of test results. Five board-certified echocardiographers evaluated AI-generated echo explanations on five-point scales for accuracy, relevance, and understandability, and either agreed or strongly agreed that 73 percent of the reports were suitable to send to patients without any changes.
All AI explanations were rated either “all true” (84 percent) or mostly correct (16 percent). In terms of relevance, 76 percent of explanations were judged to contain “all of the important information,” 15 percent “most of it,” 7 percent “about half,” and 2 percent “less than half.” None of the explanations with missing information were rated as “potentially dangerous,” the authors say.
“Our study, the first to evaluate GPT-4 in this way, shows that generative AI models can be effective in helping clinicians to explain echocardiogram results to patients,” said corresponding author Lior Jankelson, MD, PhD, an associate professor in the at NYU Grossman School of Medicine and an artificial intelligence leader for the department’s . “Fast, accurate explanations may lessen patient worry and reduce the sometimes-overwhelming volume of patient messages to clinicians.”
The federal mandate for the immediate release of test results to patients through the 21st Century Cures Act in 2016 has been linked to dramatic increases in the number of inquiries to clinicians, say the study authors. Patients receive raw test results, do not understand them, and grow anxious while they wait for clinicians to reach them with explanations, the researchers say.
Ideally, clinicians would advise patients about their echocardiogram results the instant they are released, but that is delayed as providers struggle to manually enter large amounts of related information into the electronic health record. “If dependable enough, AI tools could help clinicians explain results at the moment they are released,” said first study author Jacob Martin, MD, a cardiology fellow at NYU Langone. “Our plan moving forward is to measure the impact of explanations drafted by AI and refined by clinicians on patient anxiety, satisfaction, and clinician workload.”
The new study also found that 16 percent of the AI explanations contained inaccurate information. In one error, the AI echocardiogram report stated that “a small amount of fluid, known as a pleural effusion, is present in the space surrounding your right lung.” The tool mistakenly concluded that the effusion was small, an error known in the industry as an AI hallucination. The researchers emphasized that human oversight is important to refine drafts from AI, including correcting any inaccuracies before they reach patients.
To get the perspective of lay people on the clarity of AI explanations, the research team also surveyed participants without clinical backgrounds. In short, the reports were well received, said the authors. Nonclinical participants found 97 percent of AI-generated rewrites more understandable than the original reports, which reduced worry in many cases.
“This added analysis underscores the potential of AI to improve patient understanding and ease anxiety,” Dr. Martin added. “Our next step will be to integrate these refined tools into clinical practice to enhance patient care and reduce clinician workload.”
Along with Dr. Martin and Dr. Jankelson, NYU Langone study authors in the Leon H. Charney Division of Cardiology were Muhamed Saric, MD, PhD; Alan F. Vainrib, MD; Daniel Bamira, MD; Samuel Bernard, MD; Richard Ro, MD; Theodore Hill; and Larry A. Chinitz, MD. Additional NYU Langone study authors were Jonathan S. Austrian, MD, and , in the Medical Center Information Technology (MCIT); Hao Zang and Vidya Koesmahargyo in the in the ; and Mathew R. Williams, MD, in the .
Media Inquiries
Greg Williams
Phone: 212-404-3500
Gregory.Williams@NYULangone.org