Deep learning-based natural language processing for detecting medical symptoms and histories in emergency patient triage

Cited by: 6
Authors
Lee, Siryeol [1]
Lee, Juncheol [2]
Park, Juntae [3]
Park, Jiwoo [4]
Kim, Dohoon [5]
Lee, Joohyun [1, 3, 6]
Oh, Jaehoon [2, 7]
Affiliations
[1] Hanyang Univ, Dept Appl Artificial Intelligence, ERICA, Ansan, South Korea
[2] Hanyang Univ, Coll Med, Dept Emergency Med, Seoul, South Korea
[3] Hanyang Univ, Sch Elect Engn, ERICA, Ansan, South Korea
[4] Hanyang Univ Hosp, Dept Emergency Med, Seoul, South Korea
[5] Hanyang Univ, Dept Translat Med Biomed Sci & Engn, Seoul, South Korea
[6] Hanyang Univ, Sch Elect Engn, 55 Hanyangdaehak Ro, Ansan 15588, South Korea
[7] Hanyang Univ, Coll Med, Dept Emergency Med, 222-1 Wangsimni Ro, Seoul 04763, South Korea
Funding
National Research Foundation, Singapore
Keywords
Natural language processing; Electronic health record; Large language models; eXplainable artificial intelligence; Turing test; ACUITY; SCALE
DOI
10.1016/j.ajem.2023.11.063
Chinese Library Classification (CLC) number
R4 [Clinical Medicine]
Subject classification codes
1002; 100602
Abstract
Objective: The manual recording of electronic health records (EHRs) by clinicians in the emergency department (ED) is time-consuming and challenging. In light of recent advancements in large language models (LLMs) such as GPT and BERT, this study aimed to design and validate LLMs for automatic clinical diagnoses. The models were designed to identify 12 medical symptoms and 2 patient histories from simulated clinician-patient conversations within 6 primary symptom scenarios in emergency triage rooms.
Materials and method: We developed classification models by fine-tuning BERT, a transformer-based pre-trained model. We subsequently analyzed these models using eXplainable artificial intelligence (XAI) and the Shapley additive explanation (SHAP) method. A Turing test was conducted to ascertain the reliability of the XAI results by comparing them to the outcomes of tasks performed and explained by medical workers. An emergency medicine specialist assessed the results of both XAI and the medical workers.
Results: We fine-tuned four pre-trained LLMs and compared their classification performance. The KLUE-RoBERTa-based model demonstrated the highest performance (F1-score: 0.965, AUROC: 0.893) on human-transcribed script data. The XAI results using SHAP showed an average Jaccard similarity of 0.722 when compared with the explanations of medical workers for 15 samples. The Turing test results revealed a small 6% gap, with XAI and medical workers receiving mean scores of 3.327 and 3.52, respectively.
Conclusion: This paper highlights the potential of LLMs for automatic EHR recording in Korean EDs. The KLUE-RoBERTa-based model demonstrated superior classification performance. Furthermore, XAI using SHAP provided reliable explanations for model outputs. The reliability of these explanations was confirmed by a Turing test.
(c) 2023 Elsevier Inc. All rights reserved.
Pages: 29-38
Page count: 10