Automated risk assessment of newly detected atrial fibrillation poststroke from electronic health record data using machine learning and natural language processing

Cited by: 3
Authors
Sung, Sheng-Feng [1, 2]
Sung, Kuan-Lin [3]
Pan, Ru-Chiou [4]
Lee, Pei-Ju [5, 6]
Hu, Ya-Han [7]
Affiliations
[1] Ditmanson Med Fdn, Dept Internal Med, Div Neurol, Chiayi Christian Hosp, Chiayi, Taiwan
[2] Min Hwei Jr Coll Hlth Care Management, Dept Nursing, Tainan, Taiwan
[3] Natl Taiwan Univ, Sch Med, Taipei, Taiwan
[4] Ditmanson Med Fdn, Clin Data Ctr, Chiayi Christian Hosp, Dept Med Res, Chiayi, Taiwan
[5] Natl Chung Cheng Univ, Dept Informat Management, Minxiong Township, Chiayi County, Taiwan
[6] Natl Chung Cheng Univ, Inst Healthcare Informat Management, Minxiong Township, Chiayi County, Taiwan
[7] Natl Cent Univ, Dept Informat Management, Taoyuan, Taiwan
Source
FRONTIERS IN CARDIOVASCULAR MEDICINE, 2022, Volume 9
Keywords
atrial fibrillation; electronic health records; ischemic stroke; natural language processing; prediction; TRANSIENT ISCHEMIC ATTACK; TEXT CLASSIFICATION; FEATURE-SELECTION; VASCULAR EVENTS; STROKE CARE; SCORE; VALIDATION; RECURRENCE; PREDICTION; TAIWAN;
DOI
10.3389/fcvm.2022.941237
Chinese Library Classification
R5 [Internal Medicine]
Discipline classification code
1002; 100201
Abstract
Background: Timely detection of atrial fibrillation (AF) after stroke is highly clinically relevant, aiding decisions on the optimal strategies for secondary prevention of stroke. In the context of limited medical resources, it is crucial to prioritize extended heart rhythm monitoring by stratifying patients according to their risk of newly detected AF (NDAF). This study aimed to develop an electronic health record (EHR)-based machine learning model to assess the risk of NDAF at an early stage after stroke.
Methods: Linked data between a hospital stroke registry and a deidentified research-based database including EHRs and administrative claims data were used. Demographic features, physiological measurements, routine laboratory results, and clinical free text were extracted from EHRs. The extreme gradient boosting algorithm was used to build the prediction model. The prediction performance was evaluated by the C-index and was compared to that of the AS5F and CHASE-LESS scores.
Results: The study population consisted of a training set of 4,064 patients and a temporal test set of 1,492 patients. During a median follow-up of 10.2 months, the incidence rate of NDAF was 87.0 per 1,000 person-years in the test set. On the test set, the model based on both structured and unstructured data achieved a C-index of 0.840, which was significantly higher than those of the AS5F (0.779, p = 0.023) and CHASE-LESS (0.768, p = 0.005) scores.
Conclusions: It is feasible to build a machine learning model to assess the risk of NDAF based on EHR data available at the time of hospital admission. Including information derived from clinical free text can significantly improve model performance, and the resulting model may outperform risk scores developed using traditional statistical methods. Further studies are needed to assess the clinical usefulness of the prediction model.
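The abstract reports discrimination as a C-index. As a rough illustration of what that metric measures, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is not the authors' code: the function name `harrell_c_index` and the toy data are hypothetical, and pairs with tied follow-up times are simply skipped, which is a simplifying assumption.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's C-index (illustrative sketch, hypothetical helper).

    times: follow-up time per patient
    events: 1 if the event (e.g. NDAF) was observed, 0 if censored
    risk_scores: model output, higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            # A censored patient cannot be the earlier member of a pair.
            continue
        for j in range(n):
            # Pair is comparable when patient i had the event strictly
            # before patient j's follow-up ended (assumption: time ties skipped).
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (made up for illustration, not from the study).
toy_times = [2.0, 4.0, 6.0, 8.0]
toy_events = [1, 1, 0, 1]
toy_scores = [0.9, 0.7, 0.8, 0.1]
toy_c = harrell_c_index(toy_times, toy_events, toy_scores)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported 0.840 means the model ordered roughly 84% of comparable patient pairs correctly.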
Pages: 11
Related papers
50 items in total
  • [41] Extraction of Geriatric Syndromes From Electronic Health Record Clinical Notes: Assessment of Statistical Natural Language Processing Methods
    Chen, Tao
    Dredze, Mark
    Weiner, Jonathan P.
    Hernandez, Leilani
    Kimura, Joe
    Kharrazi, Hadi
    JMIR MEDICAL INFORMATICS, 2019, 7 (01)
  • [42] Automated Research Review Support Using Machine Learning, Large Language Models, and Natural Language Processing
    Pendyala, Vishnu S.
    Kamdar, Karnavee
    Mulchandani, Kapil
    ELECTRONICS, 2025, 14 (02)
  • [43] Machine learning prediction of atrial fibrillation in cardiovascular patients using cardiac magnetic resonance and electronic health information
    Dykstra, Steven
    Satriano, Alessandro
    Cornhill, Aidan K.
    Lei, Lucy Y.
    Labib, Dina
    Mikami, Yoko
    Flewitt, Jacqueline
    Rivest, Sandra
    Sandonato, Rosa
    Feuchter, Patricia
    Howarth, Andrew G.
    Lydell, Carmen P.
    Fine, Nowell M.
    Exner, Derek V.
    Morillo, Carlos A.
    Wilton, Stephen B.
    Gavrilova, Marina L.
    White, James A.
    FRONTIERS IN CARDIOVASCULAR MEDICINE, 2022, 9
  • [44] An automated machine learning-based model predicts postoperative mortality using readily-extractable preoperative electronic health record data
    Hill, Brian L.
    Brown, Robert
    Gabel, Eilon
    Rakocz, Nadav
    Lee, Christine
    Cannesson, Maxime
    Baldi, Pierre
    Loohuis, Loes Olde
    Johnson, Ruth
    Jew, Brandon
    Maoz, Uri
    Mahajan, Aman
    Sankararaman, Sriram
    Hofer, Ira
    Halperin, Eran
    BRITISH JOURNAL OF ANAESTHESIA, 2019, 123 (06) : 877 - 886
  • [45] Statistical inference for natural language processing algorithms with a demonstration using type 2 diabetes prediction from electronic health record notes
    Egleston, Brian L.
    Bai, Tian
    Bleicher, Richard J.
    Taylor, Stanford J.
    Lutz, Michael H.
    Vucetic, Slobodan
    BIOMETRICS, 2021, 77 (03) : 1089 - 1100
  • [46] Assessing stroke severity using electronic health record data: a machine learning approach
    Kogan, Emily
    Twyman, Kathryn
    Heap, Jesse
    Milentijevic, Dejan
    Lin, Jennifer H.
    Alberts, Mark
    BMC MEDICAL INFORMATICS AND DECISION MAKING, 2020, 20 (01)
  • [47] Preoperative Prediction of Postoperative Infections Using Machine Learning and Electronic Health Record Data
    Zhuang, Yaxu
    Dyas, Adam
    Meguid, Robert A.
    Henderson, William G.
    Bronsert, Michael
    Madsen, Helen
    Colborn, Kathryn L.
    ANNALS OF SURGERY, 2024, 279 (04) : 720 - 726
  • [48] Reconciling Allergy Information in the Electronic Health Record After a Drug Challenge Using Natural Language Processing
    Lo, Ying-Chih
    Varghese, Sheril
    Blackley, Suzanne
    Seger, Diane L.
    Blumenthal, Kimberly G.
    Goss, Foster R.
    Zhou, Li
    FRONTIERS IN ALLERGY, 2022, 3
  • [49] Automated derivation of diagnostic criteria for lung cancer using natural language processing on electronic health records: a pilot study
    Houston, Andrew
    Williams, Sophie
    Ricketts, William
    Gutteridge, Charles
    Tackaberry, Chris
    Conibear, John
    BMC MEDICAL INFORMATICS AND DECISION MAKING, 2024, 24 (01)
  • [50] Development and Validation of an Electronic Health Record-Based Machine Learning Model to Estimate Delirium Risk in Newly Hospitalized Patients Without Known Cognitive Impairment
    Wong, Andrew
    Young, Albert T.
    Liang, April S.
    Gonzales, Ralph
    Douglas, Vanja C.
    Hadley, Dexter
    JAMA NETWORK OPEN, 2018, 1 (04) : e181018