Deep learning-based natural language processing for detecting medical symptoms and histories in emergency patient triage

Cited by: 6

Authors:

Lee, Siryeol ^{[1]}

Lee, Juncheol ^{[2]}

Park, Juntae ^{[3]}

Park, Jiwoo ^{[4]}

Kim, Dohoon ^{[5]}

Lee, Joohyun ^{[1,3,6]}

Oh, Jaehoon ^{[2,7]}

Affiliations:

[1] Hanyang Univ, Dept Appl Artificial Intelligence, ERICA, Ansan, South Korea

[2] Hanyang Univ, Coll Med, Dept Emergency Med, Seoul, South Korea

[3] Hanyang Univ, Sch Elect Engn, ERICA, Ansan, South Korea

[4] Hanyang Univ Hosp, Dept Emergency Med, Seoul, South Korea

[5] Hanyang Univ, Dept Translat Med Biomed Sci & Engn, Seoul, South Korea

[6] Hanyang Univ, Sch Elect Engn, 55 Hanyangdaehak Ro, Ansan 15588, South Korea

[7] Hanyang Univ, Coll Med, Dept Emergency Med, 222-1 Wangsimni Ro, Seoul 04763, South Korea

Source:

AMERICAN JOURNAL OF EMERGENCY MEDICINE | 2024 / Vol. 77

Funding:

National Research Foundation, Singapore;

Keywords:

Natural language processing; Electronic health record; Large language models; eXplainable artificial intelligence; Turing test; ACUITY; SCALE;

DOI:

10.1016/j.ajem.2023.11.063

Chinese Library Classification (CLC) number:

R4 [Clinical Medicine];

Discipline classification code:

1002 ; 100602 ;

Abstract:

Objective: The manual recording of electronic health records (EHRs) by clinicians in the emergency department (ED) is time-consuming and challenging. In light of recent advancements in large language models (LLMs) such as GPT and BERT, this study aimed to design and validate LLMs for automatic clinical diagnoses. The models were designed to identify 12 medical symptoms and 2 patient histories from simulated clinician-patient conversations within 6 primary symptom scenarios in emergency triage rooms.

Materials and method: We developed classification models by fine-tuning BERT, a transformer-based pre-trained model. We subsequently analyzed these models using eXplainable artificial intelligence (XAI) and the Shapley additive explanation (SHAP) method. A Turing test was conducted to ascertain the reliability of the XAI results by comparing them to the outcomes of tasks performed and explained by medical workers. An emergency medicine specialist assessed the results of both XAI and the medical workers.

Results: We fine-tuned four pre-trained LLMs and compared their classification performance. The KLUE-RoBERTa-based model demonstrated the highest performance (F1-score: 0.965, AUROC: 0.893) on human-transcribed script data. The XAI results using SHAP showed an average Jaccard similarity of 0.722 when compared with the explanations of medical workers for 15 samples. The Turing test results revealed a small 6% gap, with XAI and medical workers receiving mean scores of 3.327 and 3.52, respectively.

Conclusion: This paper highlights the potential of LLMs for automatic EHR recording in Korean EDs. The KLUE-RoBERTa-based model demonstrated superior classification performance. Furthermore, XAI using SHAP provided reliable explanations for model outputs. The reliability of these explanations was confirmed by a Turing test.

(c) 2023 Elsevier Inc. All rights reserved.
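The abstract reports an average Jaccard similarity between SHAP explanations and those of medical workers, and a roughly 6% gap between two mean Turing-test scores. The following is a minimal sketch of how such figures can be computed, not the authors' code. The function names and the token lists are hypothetical and invented for illustration; only the two mean scores (3.52 and 3.327) come from the abstract, and the choice of the higher score as the baseline for the gap is an assumption.

```python
def jaccard_similarity(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two token collections."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Convention assumed here: two empty sets count as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


def relative_gap(higher, lower):
    """Gap between two mean scores, as a fraction of the higher score."""
    return (higher - lower) / higher


# Hypothetical tokens, invented for illustration only (not from the study).
model_tokens = ["chest", "pain", "since", "yesterday"]
clinician_tokens = ["chest", "pain", "yesterday", "severe"]

# 3 shared tokens out of 5 distinct tokens.
print(jaccard_similarity(model_tokens, clinician_tokens))  # 0.6

# Mean scores taken from the abstract; the baseline is an assumption.
print(round(relative_gap(3.52, 3.327) * 100, 1))  # 5.5
```

Under this assumed baseline the gap is about 5.5%, which is consistent with the abstract's rounded figure of 6%.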

Pages: 29-38

Page count: 10
