JAMIA Journal Club Webinar - December 2024

Can GPT-3.5 generate and code discharge summaries?

Presenter

Matúš Falis

Research Fellow

Centre for Clinical Brain Sciences at the University of Edinburgh

Matúš Falis is a Research Fellow in the Centre for Clinical Brain Sciences at the University of Edinburgh and an Associate Natural Language Processing Data Analyst at DataLoch. His current research focuses on identifying response and adverse reactions to antidepressants in patients with treatment-resistant depression using de-identified general practitioner free-text data curated by DataLoch. This is part of a larger effort towards the discovery of biomarkers for antidepressants in the AMBER (Antidepressant Medications: Biology, Exposure & Response) project.

Prior to his current roles, he pursued a PhD in the UKRI Centre for Doctoral Training in Biomedical AI at the School of Informatics at the University of Edinburgh. His thesis focused on addressing concept sparsity in the automatic coding of discharge summaries with the International Classification of Diseases, with the aid of medical ontologies. He was also involved in industry research as an intern with AstraZeneca (2022) and as a full-time research engineer with Canon Medical Research Europe (2017-2019).

When not busy with science, he enjoys swing dancing, playing the piano, and board games. If he never needed to work again, he would channel the extra time and energy into making tea and asking unconventional questions.

Registration

Please note: the registration process recently changed. Upon completion of registering through the AMIA store, you will receive three different emails:

Receipt of purchase
Zoom calendar invite to the LIVE session (sent via Zoom)
Instructions on how to access the course and attend the live webinar (sent via the AMIA Meetings Team)

Learning Outcomes

After this talk, attendees should be able to understand the strengths and weaknesses of GPT-3.5 in generating synthetic discharge summaries for document coding with the International Classification of Diseases, 10th Revision (ICD-10); its value as a data generator for smaller artificial neural network models; and the level of its ability to code real discharge summaries produced by GPT-3.5. Attendees can further apply this understanding, along with our suggestions for future directions, in their own research into the generation or automatic coding of discharge summaries.

Statement of Purpose

Medical document coding is the process of assigning labels from a structured label space – a terminology or an ontology, e.g., the International Classification of Diseases (ICD) – to medical documents (e.g., discharge summaries) in order to summarize the concepts relevant to a patient's journey (e.g., conditions or procedures) as structured data. This process (currently performed by human coders) is laborious, costly, and error-prone. In recent years, efforts have been made towards automating medical document coding with artificial neural network models. These early neural approaches under-utilize the rich information represented within the ontology (structure, concept descriptions, and connections between ontologies), most notably by treating individual predictions as independent outputs and evaluating predictions as flat. Furthermore, the label spaces within this task are large (on the order of thousands of labels) and follow a big-head long-tail label distribution, giving rise to few-shot and zero-shot scenarios.
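To make the few-shot and zero-shot scenarios concrete, the sketch below groups labels by how often they appear in a training set. It is an illustration only: the function name, the `few_shot_max` threshold, and the toy documents are assumptions and are not taken from the study.

```python
from collections import Counter

def bucket_labels(train_docs, label_space, few_shot_max=5):
    """Group labels by training frequency (threshold is an illustrative assumption)."""
    # Count each code at most once per document.
    counts = Counter(code for codes in train_docs for code in set(codes))
    buckets = {"zero_shot": [], "few_shot": [], "frequent": []}
    for code in label_space:
        n = counts[code]
        if n == 0:
            buckets["zero_shot"].append(code)   # never seen in training
        elif n <= few_shot_max:
            buckets["few_shot"].append(code)    # seen only a handful of times
        else:
            buckets["frequent"].append(code)    # the "big head" of the distribution
    return buckets

# Toy data: each document is the list of ICD-10 codes assigned to it.
train_docs = [["I10", "E11.9"], ["I10"], ["I10", "J45.909"], ["I10", "E11.9"]]
label_space = ["I10", "E11.9", "J45.909", "G35"]
buckets = bucket_labels(train_docs, label_space, few_shot_max=1)
```

In this toy set, one code dominates while another never appears in training at all, which is the pattern that leaves neural coders with little or no training signal for rare labels.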

In this talk we will investigate the usefulness of a general-domain Large Language Model, GPT-3.5, in the context of ICD coding. The main focus will be on GPT's ability to generate discharge summaries, their quality (according to the opinions of clinical professionals), and what they can be used for. We will comment on GPT's ability to perform clinical document coding compared to specialized neural network models. We will also briefly touch upon the ideas of ontology-driven hierarchical evaluation (for assessing the correctness of a model's prediction with respect to the structure of the label space) and data augmentation (in order to address the data sparsity issue) in automatic ICD coding, developed as part of my thesis.
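As a rough illustration of why hierarchical evaluation differs from flat evaluation, the sketch below gives partial credit when a predicted code shares an ancestor with the gold code. It is a generic set-based variant and not necessarily the method presented in the talk; treating ICD-10 code prefixes (from the three-character category onwards) as ancestors is a simplifying assumption, and the function names are hypothetical.

```python
def with_ancestors(codes):
    """Expand codes with their prefix ancestors (simplified stand-in for the ICD-10 hierarchy)."""
    expanded = set()
    for code in codes:
        stem = code.replace(".", "")
        # Every prefix from the 3-character category up to the full code.
        for length in range(3, len(stem) + 1):
            expanded.add(stem[:length])
    return expanded

def hierarchical_prf(gold, pred):
    """Set-based precision, recall, and F1 over ancestor-expanded code sets."""
    g, p = with_ancestors(gold), with_ancestors(pred)
    overlap = len(g & p)
    precision = overlap / len(p) if p else 0.0
    recall = overlap / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = {"E11.21"}   # gold code for one document
pred = {"E11.9"}    # predicted code: same category, different leaf
flat_matches = len(gold & pred)          # flat evaluation sees no overlap
precision, recall, f1 = hierarchical_prf(gold, pred)
```

Flat evaluation scores this prediction as entirely wrong, whereas the ancestor-expanded comparison credits the shared category `E11`.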

Format

35-minute presentation by article author(s) considering salient features of the published study and its potential impact on practice.

25-minute discussion of questions submitted by listeners via the webinar tools and moderated by JAMIA Student Editorial Board members.

CME Credit

The American Medical Informatics Association is accredited by the Accreditation Council for Continuing Medical Education (ACCME) to provide continuing medical education for physicians.

The American Medical Informatics Association designates this live activity for a maximum of 1.0 AMA PRA Category 1™ credits. Physicians should claim only the credit commensurate with the extent of their participation in the activity.

CNE Credit

The American Medical Informatics Association is accredited as a provider of nursing continuing professional development by the American Nurses Credentialing Center's Commission on Accreditation.

Approved Contact Hours: 1.0 total
Nurse Planner: Jenna Thate, PhD, RN, CNE