Improving clinical documentation: automatic inference of ICD-10 codes from patient notes using BERT model

Citations: 2
Authors
Al-Bashabsheh, Emran [1]
Alaiad, Ahmad [1]
Al-Ayyoub, Mahmoud [1]
Beni-Yonis, Othman [1]
Zitar, Raed Abu [2]
Abualigah, Laith [3, 4, 5, 6, 7, 8]
Affiliations
[1] Jordan Univ Sci & Technol, Dept Comp Informat Syst, Irbid 22110, Jordan
[2] Sorbonne Univ Abu Dhabi, Sorbonne Ctr Artificial Intelligence, 38044, Abu Dhabi, U Arab Emirates
[3] Al al Bayt Univ, Prince Hussein Bin Abdullah Fac Informat Technol, Comp Sci Dept, Mafraq 25113, Jordan
[4] Yuan Ze Univ, Coll Engn, Taoyuan, Taiwan
[5] Al Ahliyya Amman Univ, Hourani Ctr Appl Sci Res, Amman 19328, Jordan
[6] Middle East Univ, Fac Informat Technol, Amman 11831, Jordan
[7] Appl Sci Private Univ, Appl Sci Res Ctr, Amman 11931, Jordan
[8] Univ Sains Malaysia, Sch Comp Sci, George Town 11800, Pulau Pinang, Malaysia
Keywords
ICD-10; Deep learning; BERT; Long short-term memory; Convolutional neural network
DOI
10.1007/s11227-023-05160-z
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Electronic health records contain a vast amount of textual health data written by physicians as patient clinical notes. The World Health Organization released the International Classification of Diseases version 10 (ICD-10) system to monitor and analyze clinical notes. ICD-10 is a system that physicians and other healthcare providers use to classify and code all diagnoses and symptoms recorded in conjunction with hospital care, so that the data can be easily stored, retrieved, and analyzed for decision-making. To automate this coding task, this paper introduces a system that classifies clinical notes into ICD-10 codes. The paper examines 7541 clinical notes collected from a health institute in Jordan and annotated by ICD-10 coders. In addition, the research uses another external dataset to augment this dataset. The research presents several approaches, including a Baseline model and a Pipeline model. The Baseline model represents the text with Word2vec embeddings, and its structure combines a long short-term memory network, a convolutional neural network, and two fully connected layers. The Pipeline model adopts a transformer model, Bidirectional Encoder Representations from Transformers (BERT), pre-trained on a similar health domain, and is built from two BERT models. The first (general) BERT model classifies the category codes, which are the first three characters of an ICD-10 code. The second (special) BERT model takes the outputs of the first model as input and classifies the clinical notes into full ICD-10 codes. Both the Baseline and Pipeline models apply the focal loss function to mitigate class imbalance. The Pipeline model shows strong performance, with an F1 score, recall, precision, and accuracy of 92.5%, 84.9%, 91.8%, and 84.97%, respectively.
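The abstract says both models use the focal loss to handle imbalanced classes. The following is a minimal sketch of the standard focal loss for a single example, not the authors' implementation. The values `gamma=2.0` and `alpha=1.0` and the example probability vectors are assumptions chosen for illustration, since the abstract does not give the settings used in the paper.

```python
import math


def cross_entropy(probs, target):
    """Standard cross-entropy for one example: -log(p_t)."""
    p_t = min(max(probs[target], 1e-12), 1.0)  # clamp to avoid log(0)
    return -math.log(p_t)


def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Focal loss for one example: -alpha * (1 - p_t)**gamma * log(p_t).

    gamma and alpha are illustrative defaults (assumed, not from the paper).
    With gamma = 0 and alpha = 1 this reduces to cross-entropy.
    """
    p_t = min(max(probs[target], 1e-12), 1.0)
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical predicted class probabilities; the true class is index 0.
easy = [0.9, 0.05, 0.05]  # confidently correct prediction
hard = [0.2, 0.5, 0.3]    # misclassified example

for name, probs in (("easy", easy), ("hard", hard)):
    print(name, focal_loss(probs, 0), cross_entropy(probs, 0))
```

The factor `(1 - p_t)**gamma` shrinks the loss of well-classified examples far more than that of misclassified ones, so training focuses on hard cases, which often belong to rare classes.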
Pages: 12766-12790
Number of pages: 25